org.antlr.runtime
Class Lexer

java.lang.Object
  extended by org.antlr.runtime.BaseRecognizer
      extended by org.antlr.runtime.Lexer
All Implemented Interfaces:
TokenSource

public abstract class Lexer
extends BaseRecognizer
implements TokenSource

A lexer is a recognizer that draws input symbols from a character stream. Lexer grammars result in a subclass of this object. A Lexer object uses simplified match() and error recovery mechanisms in the interest of speed.


Field Summary
protected  CharStream input
          Where is the lexer drawing characters from?
 
Fields inherited from class org.antlr.runtime.BaseRecognizer
DEFAULT_TOKEN_CHANNEL, HIDDEN, INITIAL_FOLLOW_STACK_SIZE, MEMO_RULE_FAILED, MEMO_RULE_UNKNOWN, NEXT_TOKEN_RULE_NAME, state
 
Constructor Summary
Lexer()
           
Lexer(CharStream input)
           
Lexer(CharStream input, RecognizerSharedState state)
           
 
Method Summary
 Token emit()
          The standard method called to automatically emit a token at the outermost lexical rule.
 void emit(Token token)
          Currently does not support multiple emits per nextToken invocation for efficiency reasons.
 java.lang.String getCharErrorDisplay(int c)
           
 int getCharIndex()
          What is the index of the current character of lookahead?
 int getCharPositionInLine()
           
 CharStream getCharStream()
           
 java.lang.String getErrorMessage(RecognitionException e, java.lang.String[] tokenNames)
          What error message should be generated for the various exception types? Not very object-oriented code, but I like having all error message generation within one method rather than spread among all of the exception classes.
 int getLine()
           
 java.lang.String getSourceName()
          Where are you getting tokens from? Normally the implementation will simply ask the lexer's input stream.
 java.lang.String getText()
          Return the text matched so far for the current token or any text override.
 void match(int c)
           
 void match(java.lang.String s)
           
 void matchAny()
           
 void matchRange(int a, int b)
           
abstract  void mTokens()
          This is the lexer entry point that sets instance var 'token'
 Token nextToken()
          Return a token from this source; i.e., match a token on the char stream.
 void recover(RecognitionException re)
          Lexers can normally match any char in its vocabulary after matching a token, so do the easy thing and just kill a character and hope it all works out.
 void reportError(RecognitionException e)
          Report a recognition problem.
 void reset()
          Reset the parser's state; subclasses must rewind the input stream.
 void setCharStream(CharStream input)
          Set the char stream and reset the lexer
 void setText(java.lang.String text)
          Set the complete text of this token; it wipes any previous changes to the text.
 void skip()
          Instruct the lexer to skip creating a token for current lexer rule and look for another token.
 void traceIn(java.lang.String ruleName, int ruleIndex)
           
 void traceOut(java.lang.String ruleName, int ruleIndex)
           
 
Methods inherited from class org.antlr.runtime.BaseRecognizer
alreadyParsedRule, beginResync, combineFollows, computeContextSensitiveRuleFOLLOW, computeErrorRecoverySet, consumeUntil, consumeUntil, displayRecognitionError, emitErrorMessage, endResync, failed, getBacktrackingLevel, getCurrentInputSymbol, getErrorHeader, getGrammarFileName, getMissingSymbol, getNumberOfSyntaxErrors, getRuleInvocationStack, getRuleInvocationStack, getRuleMemoization, getRuleMemoizationCacheSize, getTokenErrorDisplay, getTokenNames, match, matchAny, memoize, mismatchIsMissingToken, mismatchIsUnwantedToken, pushFollow, recover, recoverFromMismatchedSet, recoverFromMismatchedToken, setBacktrackingLevel, toStrings, traceIn, traceOut
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

input

protected CharStream input
Where is the lexer drawing characters from?

Constructor Detail

Lexer

public Lexer()

Lexer

public Lexer(CharStream input)

Lexer

public Lexer(CharStream input,
             RecognizerSharedState state)
Method Detail

reset

public void reset()
Description copied from class: BaseRecognizer
Reset the parser's state; subclasses must rewind the input stream.

Overrides:
reset in class BaseRecognizer

nextToken

public Token nextToken()
Return a token from this source; i.e., match a token on the char stream.

Specified by:
nextToken in interface TokenSource

skip

public void skip()
Instruct the lexer to skip creating a token for current lexer rule and look for another token. nextToken() knows to keep looking when a lexer rule finishes with token set to SKIP_TOKEN. Recall that if token==null at end of any token rule, it creates one for you and emits it.
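
The interaction between skip() and nextToken() described above can be sketched as follows. This is a minimal, self-contained illustration and not the runtime's actual code: SkipSketchLexer, Tok, matchOneRule() and tokenize() are hypothetical stand-ins, and the SKIP sentinel only plays the role of SKIP_TOKEN.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is not org.antlr.runtime.Lexer.
class SkipSketchLexer {
    // Minimal token type (hypothetical; the real runtime uses Token).
    static final class Tok {
        final String text;
        Tok(String text) { this.text = text; }
    }

    // Sentinels compared by identity; SKIP plays the role of SKIP_TOKEN.
    static final Tok SKIP = new Tok("<skip>");
    static final Tok EOF = new Tok("<EOF>");

    private final String input;
    private int pos = 0;
    private Tok token; // the "current token" variable that rules set

    SkipSketchLexer(String input) { this.input = input; }

    // A rule calls this to say "I matched, but do not create a token".
    void skip() { token = SKIP; }

    // Plays the role of mTokens(): match exactly one rule at the current position.
    private void matchOneRule() {
        int start = pos;
        char c = input.charAt(pos);
        if (Character.isWhitespace(c)) {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            skip(); // whitespace rule: matched, but nothing to hand to the parser
        } else if (Character.isLetterOrDigit(c)) {
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
        } else {
            pos++; // any other single character is its own token
        }
        // The rule set no token: create one automatically from the matched text.
        if (token == null) token = new Tok(input.substring(start, pos));
    }

    // Keep looking for another token whenever a rule finished with SKIP.
    Tok nextToken() {
        while (true) {
            token = null;
            if (pos >= input.length()) return EOF;
            matchOneRule();
            if (token != SKIP) return token;
        }
    }

    // Convenience helper: collect the text of every token up to end of input.
    static List<String> tokenize(String text) {
        SkipSketchLexer lexer = new SkipSketchLexer(text);
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t != EOF; t = lexer.nextToken()) out.add(t.text);
        return out;
    }
}
```

In this sketch, SkipSketchLexer.tokenize("a  b+c") yields the four tokens a, b, +, c; the run of whitespace is matched but never surfaces as a token.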


mTokens

public abstract void mTokens()
                      throws RecognitionException
This is the lexer entry point that sets instance var 'token'

Throws:
RecognitionException

setCharStream

public void setCharStream(CharStream input)
Set the char stream and reset the lexer


getCharStream

public CharStream getCharStream()

getSourceName

public java.lang.String getSourceName()
Description copied from interface: TokenSource
Where are you getting tokens from? Normally the implementation will simply ask the lexer's input stream.

Specified by:
getSourceName in interface TokenSource
Specified by:
getSourceName in class BaseRecognizer

emit

public void emit(Token token)
Currently does not support multiple emits per nextToken invocation for efficiency reasons. Subclass and override this method and nextToken (to push tokens into a list and pull from that list rather than a single variable as this implementation does).
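
The override suggested above, pushing tokens into a list and pulling from that list, might look roughly like this. The sketch is self-contained and does not extend the real Lexer: MultiEmitSketchLexer, matchLine() and the INDENT/DEDENT token names are assumptions chosen only to show one rule invocation emitting several tokens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-in for illustration only; this is not org.antlr.runtime.Lexer.
class MultiEmitSketchLexer {
    static final String EOF = "<EOF>";

    private final String[] lines;
    private int nextLine = 0;
    private int depth = 0; // current indentation depth, one space per level
    // Queue of pending tokens instead of a single 'token' variable.
    private final Deque<String> pending = new ArrayDeque<>();

    MultiEmitSketchLexer(String text) { this.lines = text.split("\n"); }

    // emit() appends to the queue, so one rule may emit several tokens.
    void emit(String token) { pending.addLast(token); }

    // Hypothetical rule: one input line may produce INDENT/DEDENT tokens plus its text.
    private void matchLine(String line) {
        int indent = 0;
        while (indent < line.length() && line.charAt(indent) == ' ') indent++;
        while (depth < indent) { emit("INDENT"); depth++; }
        while (depth > indent) { emit("DEDENT"); depth--; }
        emit(line.trim());
    }

    // nextToken() drains the queue before matching more input.
    String nextToken() {
        while (pending.isEmpty()) {
            if (nextLine < lines.length) {
                matchLine(lines[nextLine++]);
            } else {
                // End of input: close any open blocks, then signal EOF.
                while (depth > 0) { emit("DEDENT"); depth--; }
                emit(EOF);
            }
        }
        return pending.removeFirst();
    }

    // Convenience helper: collect every token up to (not including) EOF.
    static List<String> tokenize(String text) {
        MultiEmitSketchLexer lexer = new MultiEmitSketchLexer(text);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); !t.equals(EOF); t = lexer.nextToken()) out.add(t);
        return out;
    }
}
```

The queue costs an extra allocation and a check per call, which is presumably the efficiency concern behind the single-variable default.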


emit

public Token emit()
The standard method called to automatically emit a token at the outermost lexical rule. The token object should point into the char buffer start..stop. If there is a text override in 'text', use that to set the token's text. Override this method to emit custom Token objects. If you are building trees, then you should also override Parser or TreeParser.getMissingSymbol().


match

public void match(java.lang.String s)
           throws MismatchedTokenException
Throws:
MismatchedTokenException

matchAny

public void matchAny()

match

public void match(int c)
           throws MismatchedTokenException
Throws:
MismatchedTokenException

matchRange

public void matchRange(int a,
                       int b)
                throws MismatchedRangeException
Throws:
MismatchedRangeException

getLine

public int getLine()

getCharPositionInLine

public int getCharPositionInLine()

getCharIndex

public int getCharIndex()
What is the index of the current character of lookahead?


getText

public java.lang.String getText()
Return the text matched so far for the current token or any text override.


setText

public void setText(java.lang.String text)
Set the complete text of this token; it wipes any previous changes to the text.


reportError

public void reportError(RecognitionException e)
Description copied from class: BaseRecognizer
Report a recognition problem. This method sets errorRecovery to indicate that the parser is recovering, not parsing. Once in recovery mode, no errors are generated. To get out of recovery mode, the parser must successfully match a token (after a resync). So it will go:
1. error occurs
2. enter recovery mode, report error
3. consume until token found in resynch set
4. try to resume parsing
5. next match() will reset errorRecovery mode
If you override, make sure to update syntaxErrors if you care about that.

Overrides:
reportError in class BaseRecognizer
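
The recovery-mode bookkeeping described for reportError can be sketched in isolation. This is an illustration under assumptions, not the runtime's code: RecoveryModeSketch and matchSucceeded() are hypothetical names, and errors are plain strings rather than RecognitionException objects.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is not org.antlr.runtime.BaseRecognizer.
class RecoveryModeSketch {
    private boolean errorRecovery = false; // true while recovering, not parsing
    private int syntaxErrors = 0;
    final List<String> reported = new ArrayList<>();

    // Report a problem unless we are already recovering from an earlier one.
    void reportError(String message) {
        if (errorRecovery) return; // in recovery mode, no further errors are generated
        syntaxErrors++;            // an override should keep this counter up to date
        errorRecovery = true;      // enter recovery mode
        reported.add(message);
    }

    // Hypothetical hook: a successful match after resync leaves recovery mode.
    void matchSucceeded() { errorRecovery = false; }

    int getNumberOfSyntaxErrors() { return syntaxErrors; }
}
```

Two errors reported back to back count once; after a successful match, the next error is reported and counted again.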

getErrorMessage

public java.lang.String getErrorMessage(RecognitionException e,
                                        java.lang.String[] tokenNames)
Description copied from class: BaseRecognizer
What error message should be generated for the various exception types? Not very object-oriented code, but I like having all error message generation within one method rather than spread among all of the exception classes. This also makes it much easier for the exception handling because the exception classes do not have to have pointers back to this object to access utility routines and so on. Also, changing the message for an exception type would be difficult because you would have to subclass the exception, but then somehow get ANTLR to make those kinds of exception objects instead of the default. This looks weird, but trust me: it makes the most sense in terms of flexibility. For grammar debugging, you will want to override this to add more information such as the stack frame with getRuleInvocationStack(e, this.getClass().getName()) and, for no viable alts, the decision description and state, etc. Override this to change the message generated for one or more exception types.

Overrides:
getErrorMessage in class BaseRecognizer
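
The design argued for above, one method that dispatches on the exception type and that a subclass can override for selected types, can be sketched as follows. All names here (ErrorMessageSketch, RecognitionProblem, MismatchedChar, NoViableChar, charDisplay) are hypothetical stand-ins, and the message wording is an assumption rather than the runtime's exact output.

```java
// Stand-in for illustration only; these are not the org.antlr.runtime classes.
class ErrorMessageSketch {
    // The exception classes carry data only; they hold no pointer back to the recognizer.
    static class RecognitionProblem extends Exception {
        final int c; // offending character, or -1 for end of input
        RecognitionProblem(int c) { this.c = c; }
    }
    static class MismatchedChar extends RecognitionProblem {
        final int expecting;
        MismatchedChar(int c, int expecting) { super(c); this.expecting = expecting; }
    }
    static class NoViableChar extends RecognitionProblem {
        NoViableChar(int c) { super(c); }
    }

    // Utility routine that lives on the recognizer, not on the exceptions.
    String charDisplay(int c) {
        switch (c) {
            case -1:   return "<EOF>";
            case '\n': return "'\\n'";
            case '\t': return "'\\t'";
            case '\r': return "'\\r'";
            default:   return "'" + (char) c + "'";
        }
    }

    // All message generation sits in this one method.
    String getErrorMessage(RecognitionProblem e) {
        if (e instanceof MismatchedChar) {
            MismatchedChar m = (MismatchedChar) e;
            return "mismatched character " + charDisplay(m.c)
                    + " expecting " + charDisplay(m.expecting);
        }
        if (e instanceof NoViableChar) {
            return "no viable alternative at character " + charDisplay(e.c);
        }
        return "unknown recognition error at character " + charDisplay(e.c);
    }
}

// Override to change the message for one exception type and keep the rest.
class VerboseErrorMessageSketch extends ErrorMessageSketch {
    @Override
    String getErrorMessage(RecognitionProblem e) {
        if (e instanceof NoViableChar) {
            return "unexpected " + charDisplay(e.c) + " (code " + e.c + ")";
        }
        return super.getErrorMessage(e);
    }
}
```

Changing a message this way needs no new exception subclass, which is the flexibility the description refers to.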

getCharErrorDisplay

public java.lang.String getCharErrorDisplay(int c)

recover

public void recover(RecognitionException re)
Lexers can normally match any char in its vocabulary after matching a token, so do the easy thing and just kill a character and hope it all works out. You can instead use the rule invocation stack to do sophisticated error recovery if you are in a fragment rule.
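
The "kill a character" strategy can be sketched with a small hand-written lexer. This is a hedged illustration, not the runtime's implementation: RecoverSketchLexer, its two token rules and the error message text are assumptions, and recover() here takes no exception argument.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is not org.antlr.runtime.Lexer.
class RecoverSketchLexer {
    static final String EOF = "<EOF>";

    private final String input;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    RecoverSketchLexer(String input) { this.input = input; }

    // The easy recovery: drop one character and let matching resume after it.
    void recover() { pos++; }

    String nextToken() {
        while (pos < input.length()) {
            int start = pos;
            char c = input.charAt(pos);
            if (Character.isDigit(c)) {
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                return input.substring(start, pos);
            }
            if (Character.isLetter(c)) {
                while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
                return input.substring(start, pos);
            }
            // No rule can start with this character: report it, then recover.
            errors.add("no viable alternative at character '" + c + "' (index " + pos + ")");
            recover();
        }
        return EOF;
    }

    // Convenience helper: collect every token up to (not including) EOF.
    List<String> tokenizeAll() {
        List<String> out = new ArrayList<>();
        for (String t = nextToken(); !t.equals(EOF); t = nextToken()) out.add(t);
        return out;
    }
}
```

For the input "ab#12", this sketch records one error for '#' and still produces the tokens ab and 12.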


traceIn

public void traceIn(java.lang.String ruleName,
                    int ruleIndex)

traceOut

public void traceOut(java.lang.String ruleName,
                     int ruleIndex)


Copyright © 2011. All Rights Reserved.