If we detect recursion on more than one alt, decision is non-LL(*),
but try to isolate it to only those states whose closure operations
detect recursion.
If a closure operation finds that we tried to invoke the same
rule too many times (stack would grow beyond a threshold), it
marks the state has aborted and notifies the DecisionProbe.
The NFA->DFA algorithm may terminate leaving some states
without a path to an accept state, implying that upon certain
input, the decision is not deterministic--no decision about
predicting a unique alternative can be made.
Parse a rule we add artificially that is a list of the other lexer
rules like this: "Tokens : ID | INT | SEMI ;" nextToken() will invoke
this to set the current token.
Goal that picks up all the ANTLR grammars in a project and moves those that
are required for generation of the compilable sources into the location
that we use to compile them, such as target/generated-sources/antlr3 ...
The basic output templates without AST or templates stuff; this will be
the templates loaded for the language such as Java.stg *and* the Dbg
stuff if turned on.
Create a new node from an existing node does nothing for BaseTree
as there are no fields other than the children list, which cannot
be copied as the children are not considered part of this node.
A stripped-down version of org.antlr.misc.BitSet that is just
good enough to handle runtime requirements such as FOLLOW sets
for automatic error recovery.
BitSet() -
Constructor for class org.antlr.runtime.BitSet
A blank listener that does nothing; useful for real classes so
they don't have to have lots of blank methods and are less
sensitive to updates to debug interface.
From (A)+ build
|---| (Transition 2 from A.right points at alt 1)
v | (follow of loop is Transition 1)
o->o-A-o->o
Meaning that the last NFAState in A points back to A's left Transition NFAState
and we add a new begin/end NFAState.
From (A)* build
|---|
v |
o->o-A-o--o (Transition 2 from block end points at alt 1; follow is Transition 1)
| ^
o---------| (optional branch is 2nd alt of optional block containing A+)
Meaning that the last (end) NFAState in A points back to A's
left side NFAState and we add 3 new NFAStates (the
optional branch is built just like an optional subrule).
For reference to rule r, build
o-e->(r) o
where (r) is the start of rule r and the trailing o is not linked
to from rule ref state directly (it's done thru the transition(0)
RuleClosureTransition.
Checks to see if the list of outputFiles all exist, and have
last-modified timestamps which are later than the last-modified
timestamp of all the grammar files involved in build the output
(imports must be checked).
You can generate a switch rather than if-then-else for a DFA state
if there are no semantic predicates and the number of edge label
values is small enough; e.g., don't generate a switch for a state
containing an edge label such as 20..52330 (the resulting byte codes
would overflow the method 65k limit probably).
When walking ahead with cyclic DFA or for syntactic predicates,
we need to record the state of the input stream (char index,
line, etc...) so that we can rewind the state after scanning ahead.
For all NFA states (configurations) merged in d,
compute the epsilon closure; that is, find all NFA states reachable
from the NFA states in d via purely epsilon transitions.
The most common stream of tokens is one where every token is buffered up
and tokens are prefiltered for a certain channel (the parser will only
see these tokens and cannot change the filter channel number during the
parse).
A proxy debug event listener that forwards events over a socket to
a debugger (or any other listener) using a simple text-based protocol;
one event per line.
Is there a non-syn-pred predicate visible from s that is not in
the rule enclosing s? This accounts for most predicate situations
and lets ANTLR do a simple LL(1)+pred computation.
How long in ms did it take to build DFAs for this grammar?
If this grammar is a combined grammar, it only records time for
the parser grammar component.
Return the interval with elements from this not in other;
other must not be totally enclosed (properly contained)
within this, which would result in two disjoint intervals
instead of the single one returned by this method.
figure out if this state eventually reaches an accept state and
modify the instance variable 'reduced' to indicate if we find
at least one state that cannot reach an accept state.
The unique edge transition class number; every time we see a new
set of edges emanating from a state, we number it so we can reuse
if it's every seen again for another state.
Map an edge transition table to a unique set number; ordered so
we can push into the output template as an ordered list of sets
and then ref them from within the transition[][] table.
Rules in tree grammar that use -> rewrites and are spitting out
templates via output=template and then use rewrite=true must only
use -> on alts that are simple nodes or trees or single rule refs
that match either nodes or trees.
Are two IntervalSets equal? Because all intervals are sorted
and disjoint, equals is a simple linear walk over both lists
to make sure they are the same.
Before generating code, we examine all actions that can have
$x.y and $y stuff in them because some code generation depends on
Rule.referencedPredefinedRuleAttributes.
From this node, add a d--a-->t transition for all
labels 'a' where t is a DFA node created
from the set of NFA states reachable from any NFA
state in DFA state d.
Rule ref nodes, token refs, set, and NOT set refs need to track their
location in the generated NFA so that local FOLLOW sets can be
computed during code gen for automatic error recovery.
What index is this node in the child list? Range: 0..n-1
If your node type doesn't handle this, it's ok but the tree rewrites
in tree parsers need this functionality.
Walk each NFA configuration in this DFA state looking for a conflict
where (s|i|ctx) and (s|j|ctx) exist, indicating that state s with
context conflicting ctx predicts alts i and j.
What GrammarAST node (derived from the grammar) is this DFA
associated with? It will point to the start of a block or
the loop back of a (...)+ block etc...
Given a grammar type, what should be the default action scope?
If I say @members in a COMBINED grammar, for example, the
default scope should be "parser".
Get the set of Rules that need to have manual delegations
like "void rule() { importedGrammar.rule(); }"
If this grammar is master, get list of all rule definitions from all
delegate grammars.
Return a list of File objects that name files ANTLR will read
to process T.g; This can be .tokens files if the grammar uses the tokenVocab option
as well as any imported grammar files.
What error message should be generated for the various
exception types?
Not very object-oriented code, but I like having all error message
generation within one method rather than spread among all of the
exception classes.
For gated productions, we need an OR'd list of all predicates for the
target of an edge so we can gate the edge based upon the predicates
associated with taking that path (if any).
If a rule has no user-defined return values and nobody references
it's start/stop (predefined attributes), then there is no need to
define a struct; otherwise for now we assume a struct.
Interpret any ANTLR grammar:
java Interp file.g tokens-to-ignore start-rule input-file
java Interp C.g 'WS COMMENT' program t.c
where the WS and COMMENT are the names of tokens you want to have
the parser ignore.
A set of integers that relies on ranges being common to do
"run-length-encoded" like compression (if you view an IntSet like
a BitSet with runs of 0s and 1s).
A generic set of ints that has an efficient implementation, BitSet,
which is a compressed bitset and is useful for ints that
are small, for example less than 500 or so, and w/o many ranges.
Indicates whether ANTLR has supplied, or will supply, a list of all the things
that the input grammar depends upon and all the things that will be generated
when that grammar is successfully analyzed.
Indicates whether ANTLR has generated or will generate a version of the
recognizer that gathers statistics about its execution, which it prints when
it terminates.
Is the DFA reduced? I.e., does every state have a path to an accept
state? If not, don't delete as we need to generate an error indicating
which paths are "dead ends".
Indicates whether ANTLR has generated or will generate a report of various
elements of the grammar analysis, once it it has finished analyzing a grammar
file.
Indicates whether ANTLR will be verbose when analyzing grammar files, such as
displaying the names of the files it is generating and similar information.
Similar to LeftRecursionMessage except this is used for announcing
cycles found by walking rules without decisions; the other msg is
invoked when a decision DFA construction finds a problem in closure.
A list of CharStreamState objects that tracks the stream state
values line, charPositionInLine, and p that can change as you
move through the input stream.
Anything at this value or larger can be considered a simple atom int
for easy comparison during analysis only; faux labels are not used
during parse time for real token types or char values.
Reset the file and line information; useful when the grammar
has been generated so that errors are shown relative to the
original file like the old C preprocessor used to do.
We have labels like EPSILON that are below 0; it's hard to
store them in an array with negative index so use this
constant as an index shift when accessing arrays based upon
token type.
Given a pattern like (ASSIGN %lhs:ID %rhs:.) with optional labels
on the various nodes and '.' (dot) as the node/subtree wildcard,
return true if the pattern matches and fill the labels Map with
the labels pointing at the appropriate nodes.
This probe tells you a lot about a decision and is useful even
when there is no error such as when a syntactic nondeterminism
is solved via semantic predicates.
As this state is constructed (i.e., as NFA states are added), we
can easily check for non-epsilon transitions because the only
transition that could be a valid label is transition(0).
Lexers can normally match any char in it's vocabulary after matching
a token, so do the easy thing and just kill a character and hope
it all works out.
Given an NFA state number, how many times has the NFA-to-DFA
conversion pushed that state on the stack? In other words,
the NFA state must be a rule invocation state and this method
tells you how many times you've been to this state.
If set to true, then after the tool has processed an input grammar file
it will report variaous statistics about the parser, such as information
on cyclic DFAs, which rules may use backtracking, and so on.
Report the list of predicates found for each alternative; copy
the list because this set gets altered later by the method
tryToResolveWithSemanticPredicates() while flagging NFA configurations
in d as resolved.
Currently the analysis reports issues between token definitions, but
we don't print out warnings in favor of just picking the first token
definition found in the grammar ala lex/flex.
If > 1 NFA configurations within this DFA state have identical
NFA state and context, but differ in their predicted
TODO update for new context suffix stuff 3-9-2005
alternative then a single input sequence predicts multiple alts.
After an arbitrairly long lookahead as with a cyclic DFA (or with
any backtrack), this informs the debugger that stream should be
rewound to the position associated with marker.
Indicate whether ANTLR should supply a list of all the things
that the input grammar depends upon and all the things that will be generated
when that gramamr is successfully analyzed.
Used by build tools to force the output files to always be
relative to the base output directory, even though the tool
had to set the output directory to an absolute path as it
cannot rely on the workign directory like command line invocation
can.
Indicate whether the tool should analyze the dependencies of the provided grammar
file list and ensure that the grammars with dependencies are built
after any of the other gramamrs in the list that they are dependent on.
Where are the bounds in the input token stream for this node and
all children? Each rule that creates AST nodes will call this
method right before returning.
Indicate whether ANTLR should be verbose when analyzing grammar files, such as
displaying the names of the files it is generating and similar information.
Rather than add a new instance variable to NFA and DFA just for
serializing machines, map old state numbers to new state numbers
by a State object -> Integer new state number HashMap.
Was a syntactic ambiguity resolved with predicates? Any DFA
state that predicts more than one alternative, must be resolved
with predicates or it should be reported to the user.
The code generator for ANTLR can usually be retargeted just by providing
a new X.stg file for language X, however, sometimes the files that must
be generated vary enough that some X-specific functionality is required.
Target() -
Constructor for class org.antlr.codegen.Target
A source of tokens must provide a sequence of tokens via nextToken()
and also must reveal it's source of characters; CommonToken's text is
computed from a CharStream; it only store indices into the char stream.
A generic message from the tool such as "file not found" type errors; there
is no reason to create a special object for each error unlike the grammar
errors, which may be rather complex.
Predicates are lists of AST nodes from the NFA created from the
grammar, but the same predicate could be cut/paste into multiple
places in the grammar.
Because the user is not required to use a token with an index stored
in it, we must provide a means for two token objects themselves to
indicate the start/end location.
Cut-n-paste from material I'm not using in the book anymore (edit later
to make sense):
Now, how are we going to test these tree patterns against every
subtree in our original tree? In what order should we visit nodes?
For this application, it turns out we need a simple ``apply once''
rule application strategy and a ``down then up'' tree traversal
strategy.
If this parameter is set to true, then ANTLR will report all sorts of things
about what it is doing such as the names of files and the version of ANTLR and so on.