Options

File, Grammar, and Rule Options

Rather than have the programmer specify a bunch of command-line arguments to the parser generator, an options section within the grammar itself serves this purpose. This solution is preferable because it associates the required options with the grammar rather than with the ANTLR invocation. The section is preceded by the options keyword and contains a series of option/value assignments surrounded by curly braces, such as:

    options {
        k = 2;
        tokenVocabulary = IDL;
        defaultErrorHandler = false;
    }

The options section for an entire (.g) file, if specified, immediately follows the (optional) file header:

    header { package X; }
    options { language="FOO"; }

The options section for a grammar, if specified, must immediately follow the ';' of the class specifier:

    class MyParser extends Parser;
    options { k=2; }

The options section for a rule, if specified, must immediately follow the rule name:

    myrule[args] returns [retval]
        options { defaultErrorHandler=false; }
        :   // body of rule...
        ;

The option names are not keywords in ANTLR, but rather are entries in a symbol table examined by ANTLR. The scope of option names is limited to the options section; identifiers within your grammar may overlap with these symbols.

The only ANTLR options not specified in the options section are things that do not vary with the grammar, but rather with the invocation of ANTLR itself. The best example is debugging information. Typically, the programmer will want a makefile to change an ANTLR flag indicating a debug or release build.

Options supported in ANTLR

Key for the type column: F=file, G=grammar, R=rule, L=lexer, S=subrule.
language: Setting the generated language

ANTLR supports multiple, installable code generators. Any code generator conforming to the ANTLR specification may be invoked via the language option. The default language is "Java". The language option is specified at the file level, for example:

    header { package zparse; }
    options { language="Java"; }
    ... classes follow ...

k: Setting the lookahead depth

You may set the lookahead depth for any grammar (parser, lexer, or tree-walker) by using the k= option:

    class MyLexer extends Lexer;
    options { k=3; }
    ...

Setting the lookahead depth changes the maximum number of tokens that will be examined to select alternative productions and to test for exit conditions of the EBNF constructs (...)?, (...)+, and (...)*. The lookahead analysis is linear approximate (as opposed to full LL(k)). This is a bit involved to explain in detail, but consider this example with k=2:

    r : ( A B | B A )
      | A A
      ;

Full LL(k) analysis would resolve the ambiguity and produce a lookahead test for the first alternate like:

    if ( (LA(1)==A && LA(2)==B) || (LA(1)==B && LA(2)==A) )

However, linear approximate analysis would logically OR the lookahead sets at each depth, resulting in a test like:

    if ( (LA(1)==A || LA(1)==B) && (LA(2)==A || LA(2)==B) )

This test also succeeds on the sequence {A,A}, so it is ambiguous with the second alternate. Because of this, setting the lookahead depth very high tends to yield diminishing returns in most cases, because the lookahead sets at large depths will include almost everything.

importVocab: Initial Grammar Vocabulary

[See the documentation on vocabularies for more information]

To specify an initial vocabulary (tokens, literals, and token types), use the importVocab grammar option.

    class MyParser extends Parser;
    options { importVocab=V; }

ANTLR will look for VTokenTypes.txt in the current directory and preload the token manager for MyParser with the enclosed information.

This option is useful, for example, if you create an external lexer and want to connect it to an ANTLR parser.
Conversely, you may create an external parser and wish to use the token set with an ANTLR lexer. Finally, you may find it more convenient to place your grammars in separate files, especially if you have multiple tree-walkers that do not add any literals to the token set.

The vocabulary file has an identifier on the first line that names the token vocabulary; it is followed by lines of the form ID=value or "literal"=value. For example:

    ANTLR // vocabulary name
    "header"=3
    ACTION=4
    COLON=5
    SEMI=6
    ...

A file of this form is automatically generated by ANTLR for each grammar.

Note: you must take care to run ANTLR on the vocabulary-generating grammar files before you run ANTLR on the vocabulary-consuming grammar files.

exportVocab: Naming Export Vocabulary

[See the documentation on vocabularies for more information]

The vocabulary of a grammar is the union of the set of tokens provided by an importVocab option and the set of tokens and literals defined in the grammar. ANTLR exports a vocabulary for each grammar whose default name is the same as the grammar. So, the following grammar yields a vocabulary called P:

    class P extends Parser;
    a : A;

ANTLR generates files PTokenTypes.txt and PTokenTypes.java.

You can specify the name of the exported vocabulary with the exportVocab option. The following grammar generates a vocabulary called V, not P.

    class P extends Parser;
    options { exportVocab=V; }
    a : A;

All grammars in the same file with the same vocabulary name contribute to the same vocabulary (and resulting files). If the grammars were in separate files, on the other hand, they would all overwrite the same file. For example, the following parser and lexer grammars both may contribute literals and tokens to the MyTokens vocabulary.

    class MyParser extends Parser;
    options { exportVocab=MyTokens; }
    ...

    class MyLexer extends Lexer;
    options { exportVocab=MyTokens; }
    ...
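To make the vocabulary file format concrete (a name on the first line, then ID=value or "literal"=value lines), here is a minimal Java sketch of a reader for it. This is not ANTLR's own loader: the class VocabFileSketch and its methods are hypothetical names chosen for illustration, and the sketch assumes that a literal never contains the two characters "//".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ANTLR: reads text in the
// "name, then ID=value or "literal"=value" layout described above.
class VocabFileSketch {

    // Drops a trailing "// ..." comment and surrounding whitespace.
    // Assumption: no literal contains the characters "//".
    static String stripComment(String line) {
        int i = line.indexOf("//");
        return (i >= 0 ? line.substring(0, i) : line).trim();
    }

    // Returns the vocabulary name found on the first non-empty line.
    static String vocabName(String text) {
        for (String line : text.split("\n")) {
            String s = stripComment(line);
            if (!s.isEmpty()) return s;
        }
        throw new IllegalArgumentException("empty vocabulary text");
    }

    // Maps each ID or "literal" (quotes kept) to its integer token type.
    static Map<String, Integer> tokenTypes(String text) {
        Map<String, Integer> types = new LinkedHashMap<String, Integer>();
        boolean seenName = false;
        for (String line : text.split("\n")) {
            String s = stripComment(line);
            if (s.isEmpty()) continue;
            if (!seenName) { seenName = true; continue; } // skip the name line
            // The value is the number after the last '=', so a literal
            // that itself contains '=' is still split correctly.
            int eq = s.lastIndexOf('=');
            if (eq < 0) throw new IllegalArgumentException("bad line: " + s);
            types.put(s.substring(0, eq).trim(),
                      Integer.valueOf(s.substring(eq + 1).trim()));
        }
        return types;
    }

    public static void main(String[] args) {
        String text = "ANTLR // vocabulary name\n\"header\"=3\nACTION=4\nCOLON=5\nSEMI=6\n";
        System.out.println(vocabName(text));
        System.out.println(tokenTypes(text));
    }
}
```

A reader like this is the kind of glue you might write when connecting an external lexer or parser to an ANTLR-generated vocabulary; the file contents used in main are the sample lines from this section.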
testLiterals: Generate literal-testing code

By default, ANTLR will generate code in all lexers to test each token against the literals table (the table generated for literal strings), and change the token type if it matches the table. However, you may suppress this code generation in the lexer by using a grammar option:

    class L extends Lexer;
    options { testLiterals=false; }
    ...

If you turn this option off for a lexer, you may re-enable it for specific rules. This is useful, for example, if all literals are keywords, which are special cases of ID:

    ID
    options { testLiterals=true; }
        : LETTER (LETTER | DIGIT)*
        ;

defaultErrorHandler: Controlling default exception-handling

By default, ANTLR will generate default exception-handling code for a parser or tree-parser rule. The generated code will catch any parser exceptions, synchronize to the follow set of the rule, and return. This is a simple and often useful error-handling scheme, but it is not very sophisticated. Eventually, you will want to install your own exception handlers. ANTLR will automatically turn off generation of default exception handling for any rule where an exception handler is specified. You may also explicitly control generation of default exception handling on a per-grammar or per-rule basis. For example, this will turn off default error-handling for the entire grammar, but turn it back on for rule "r":

    class P extends Parser;
    options { defaultErrorHandler=false; }

    r
    options { defaultErrorHandler=true; }
        : A B C
        ;

Note that the default error-handling for lexers is different. ANTLR will always generate exception handling for the synthesized nextToken rule, but will not generate default exception-handling for any other rule. You may add your own exception handling to other lexer rules; just make sure to catch ScannerException.
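The "catch, synchronize to the follow set, and return" behavior described above for parser rules can be pictured with a small hand-written simulation. This is a sketch under stated assumptions, not the code ANTLR generates: the class DefaultHandlerSketch, the exception MismatchException, the token type numbers, and the follow set { SEMI, EOF } for rule r are all hypothetical.

```java
// Hypothetical simulation of a default error handler for  r : A B C ;
class DefaultHandlerSketch {
    // Assumed token type numbers, for illustration only.
    static final int EOF = 1, A = 4, B = 5, C = 6, SEMI = 7;

    // Stand-in for a parser exception; not ANTLR's own class.
    static class MismatchException extends Exception {
        MismatchException(String msg) { super(msg); }
    }

    private final int[] tokens;
    private int pos = 0;
    int errors = 0;

    DefaultHandlerSketch(int... tokens) { this.tokens = tokens; }

    // One token of lookahead; EOF once the input is exhausted.
    int LA1() { return pos < tokens.length ? tokens[pos] : EOF; }

    void consume() { if (pos < tokens.length) pos++; }

    void match(int type) throws MismatchException {
        if (LA1() != type) {
            throw new MismatchException("expected " + type + ", found " + LA1());
        }
        consume();
    }

    // Rule r, assumed to be followed by SEMI or end of input.
    void r() {
        try {
            match(A);
            match(B);
            match(C);
        } catch (MismatchException e) {
            errors++;
            // Synchronize: skip tokens until one in the follow set appears.
            while (LA1() != SEMI && LA1() != EOF) consume();
        }
    }

    public static void main(String[] args) {
        DefaultHandlerSketch p = new DefaultHandlerSketch(A, C, C, SEMI);
        p.r();
        System.out.println("errors=" + p.errors + ", next token type=" + p.LA1());
    }
}
```

After the mismatch on the second token, the handler skips forward to SEMI, so whatever rule invoked r can continue from a known point. This is the sense in which the scheme is simple but not very sophisticated: it discards input rather than trying to repair it.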
codeGenMakeSwitchThreshold: controlling code generation

ANTLR will optimize lookahead tests by generating a switch statement instead of a series of if/else tests for rules containing a sufficiently large number of alternates whose lookahead is strictly LL(1). The option codeGenMakeSwitchThreshold controls this test. You may want to change this to control optimization of the parser. You may also want to disable it entirely for debugging purposes, by setting it to a large number:

    class P extends Parser;
    options { codeGenMakeSwitchThreshold=999; }
    ...

codeGenBitsetTestThreshold: controlling code generation

ANTLR will optimize lookahead tests by generating a bitset test instead of an if statement, for very complex lookahead sets. The option codeGenBitsetTestThreshold controls this test. You may want to change this to control optimization of the parser:

    class P extends Parser;
    // make bitset if test involves five or more terms
    options { codeGenBitsetTestThreshold=5; }
    ...

You may also want to disable it entirely for debugging purposes, by setting it to a large number:

    class P extends Parser;
    options { codeGenBitsetTestThreshold=999; }
    ...

buildAST: Automatic AST construction

In a Parser, you can tell ANTLR to generate code to construct ASTs corresponding to the structure of the recognized syntax. The option, if set to true, will cause ANTLR to generate AST-building code. With this option set, you can then use all of the AST-building syntax and support methods.

In a Tree-Parser, this option turns on "transform mode", which means an output AST will be generated that is a transformation of the input AST. In a tree-walker, the default action of buildAST is to generate a copy of the portion of the input AST that is walked. Tree-transformation is almost identical to building an AST in a Parser, except that the input is an AST, not a stream of tokens.
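Returning to the codeGenBitsetTestThreshold option described above, the following Java sketch contrasts the two shapes a lookahead test can take: a chain of comparisons and a single set-membership test. It is only an illustration of the trade-off. The class BitsetTestSketch and the token type numbers are hypothetical, and java.util.BitSet stands in for whatever set class the generated parser actually uses.

```java
import java.util.BitSet;

// Hypothetical illustration: two equivalent lookahead tests.
class BitsetTestSketch {
    // Assumed token type numbers, for illustration only.
    static final int A = 4, B = 5, C = 6, D = 7, E = 8, F = 9;

    // Lookahead test written as a chain of comparisons (five terms).
    static boolean ifChain(int la1) {
        return la1 == A || la1 == B || la1 == C || la1 == D || la1 == E;
    }

    // The same lookahead set, precomputed once as a bitset.
    static final BitSet SET = new BitSet();
    static {
        for (int t : new int[] {A, B, C, D, E}) SET.set(t);
    }

    // Lookahead test written as a single membership check.
    static boolean bitsetTest(int la1) {
        return la1 >= 0 && SET.get(la1); // BitSet.get rejects negative indexes
    }

    public static void main(String[] args) {
        for (int t = A; t <= F; t++) {
            System.out.println(t + ": " + ifChain(t) + " " + bitsetTest(t));
        }
    }
}
```

Both methods accept exactly the same token types. The comparison chain is easier to read when stepping through a parser in a debugger, which is one reason to raise the threshold while debugging; the membership test stays the same size however many terms the lookahead set has.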
ASTLabelType: Setting label type

When you must define your own AST node type, your actions within the grammar will require lots of downcasting from AST (the default type of any user-defined label) to your tree node type; e.g.,

    decl : d:ID {MyAST t=(MyAST)#d;}
         ;

This makes your code a pain to type in and hard to read. To avoid this, use the grammar option ASTLabelType to have ANTLR automatically do casts and define labels of the appropriate type.

    class ExprParser extends Parser;
    options {
        buildAST=true;
        ASTLabelType = "MyAST";
    }
    expr : a:term ;

The type of #a within an action is MyAST, not AST.

charVocabulary: Setting the lexer character vocabulary

ANTLR processes Unicode. Because of this, ANTLR cannot make any assumptions about the character set in use, else it would wind up generating huge lexers. Instead ANTLR assumes that the character literals, string literals, and character ranges used in the lexer constitute the entire character set of interest. For example, in this lexer:

    class L extends Lexer;
    A : 'a';
    B : 'b';
    DIGIT : '0' .. '9';

The implied character set is { 'a', 'b', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }. This can produce unexpected results if you assume that the normal ASCII character set is always used. For example, in:

    class L extends Lexer;
    A : 'a';
    B : 'b';
    DIGIT : '0' .. '9';
    STRING: '"' (~'"')* '"';

The lexer rule STRING will only match strings containing 'a', 'b' and the digits, which is usually not what you want. To control the character set used by the lexer, use the charVocabulary option. This example will use a general eight-bit character set.

    class L extends Lexer;
    options { charVocabulary = '\3'..'\377'; }
    ...

This example uses the eight-bit character set in conjunction with some values from the extended Unicode character set:

    class L extends Lexer;
    options { charVocabulary = '\3'..'\377' | '\u1000'..'\u1fff'; }
    ...

warnWhenFollowAmbig

[Warning: you should know what you are doing before you use this option.
I deliberately made it a pain to shut warnings off (rather than a single character operator) so you would not just start turning off all the warnings. I thought for a long time before implementing this exact mechanism. I recommend a comment in front of any use of this option that explains why it is ok to hush the warning.]

This subrule option is true by default and controls the generation of nondeterminism (ambiguity) warnings when comparing the FOLLOW lookahead sets for any subrule with an empty alternative and any closure subrule such as (...)+ and (...)*. For example, the following simple rule has a nondeterministic subrule, which arises from a language ambiguity: you could attach an ELSE clause to the most recent IF or to an outer IF, because the construct can nest.

    stat : "if" expr "then" stat ("else" stat)?
         | ID ASSIGN expr SEMI
         ;

Because the language is ambiguous, the context-free grammar must be ambiguous and the resulting parser nondeterministic (in theory). However, being the practical language folks that we are, we all know you can trivially solve this problem by having ANTLR resolve conflicts by consuming input as soon as possible; I have yet to see a case where this was the wrong thing to do, by the way. This option, when set to false, merely informs ANTLR that it has made the correct assumption and can shut off an ambiguity warning related to this subrule and an empty alternative or exit path. Here is a version of the rule that does not yield a warning message:
    stat : "if" expr "then" stat
           ( // standard if-then-else ambig
             options { warnWhenFollowAmbig=false; }
           : "else" stat
           )?
         | ID ASSIGN expr SEMI
         ;

One important note: This option does not affect non-empty alternatives. For example, you will still get a warning for the following subrule between alts 1 and 3 (upon lookahead A):

    ( options { warnWhenFollowAmbig=false; }
    : A
    | B
    | A
    )

Further, this option is insensitive to lookahead. Only completely empty alternatives count as candidate alternatives for hushing warnings. So, at k=2, just because ANTLR can see past alternatives with single tokens, you still can get warnings.

Command Line Options