Class Lexer
- java.lang.Object
-
- com.blackrook.expression.struct.Lexer
-
public class Lexer extends Object
Breaks up a stream of characters into lexicographical tokens. Spaces, newlines, tabs, comments, and breaks in the stream are added if desired; otherwise, they are stripped out.
String delimiter characters take precedence over regular delimiters. Raw String delimiter characters take precedence over regular string delimiters. Delimiter characters take parsing priority over other characters. Delimiter evaluation priority goes: Comment Delimiter, Delimiter. Identifier evaluation priority goes: Keyword, CaseInsensitiveKeyword, Identifier.
Other implementations of this class may manipulate the reader stack as well (such as ones that do in-language stream inclusion).
If the system property com.blackrook.expression.util.Lexer.debug is set to true, this does debugging output to System.out.
Lexer functions are NOT thread-safe.
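The identifier evaluation priority stated in the class description (Keyword, then CaseInsensitiveKeyword, then Identifier) can be pictured as a lookup order. The following is a minimal, self-contained sketch and not this library's code: the class IdentifierPrioritySketch, the enum Kind, and the sample keyword sets are all hypothetical, and the real lexer takes its keywords from a Lexer.Kernel.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in; not part of com.blackrook.expression.
class IdentifierPrioritySketch
{
    enum Kind { KEYWORD, CASE_INSENSITIVE_KEYWORD, IDENTIFIER }

    // Assumed sample data: exact-match keywords, and case-insensitive
    // keywords stored in lower case.
    static final Set<String> KEYWORDS = Set.of("if", "else");
    static final Set<String> CI_KEYWORDS = Set.of("select", "from");

    // Checks the lexeme in priority order:
    // Keyword, CaseInsensitiveKeyword, Identifier.
    static Kind classify(String lexeme)
    {
        if (KEYWORDS.contains(lexeme))
            return Kind.KEYWORD;
        if (CI_KEYWORDS.contains(lexeme.toLowerCase(Locale.ROOT)))
            return Kind.CASE_INSENSITIVE_KEYWORD;
        return Kind.IDENTIFIER;
    }
}
```

In this sketch, "if" is classified as a keyword, "SELECT" as a case-insensitive keyword, and "IF" falls through to identifier because the exact-match set is consulted first and is case-sensitive.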
- Author:
- Matthew Tropiano
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class - Description)
static class Lexer.Kernel - This is an info kernel that tells a Lexer how to interpret certain characters and identifiers.
static class Lexer.Parser - Abstract parser class.
static class Lexer.ReaderStack - This holds a series of Reader streams such that the stream on top is the current active stream.
static class Lexer.Token - Lexer token object.
-
Field Summary
Fields (Modifier and Type, Field - Description)
static boolean DEBUG
static char END_OF_LEXER - Lexer end-of-lexer char (no more characters to read).
static char END_OF_STREAM - Lexer end-of-stream char (end of the current stream).
static char NEWLINE - Lexer newline char.
-
Constructor Summary
Constructors (Constructor - Description)
Lexer(Lexer.Kernel kernel, Reader in) - Creates a new lexer around a reader.
Lexer(Lexer.Kernel kernel, String in) - Creates a new lexer around a String, that will be wrapped into a StringReader.
Lexer(Lexer.Kernel kernel, String name, Reader in) - Creates a new lexer around a reader.
Lexer(Lexer.Kernel kernel, String name, String in) - Creates a new lexer around a String, that will be wrapped into a StringReader.
-
Method Summary
Methods (Modifier and Type, Method - Description)
protected void clearCurrentLexeme() - Clears the current token lexeme buffer.
protected String getCurrentLexeme() - Gets the current token lexeme.
int getCurrentLineNumber() - Gets the lexer's current stream's line number.
Lexer.ReaderStack.Stream getCurrentStream() - Gets the current stream.
String getCurrentStreamName()
protected char getRawStringEnd(char c) - Gets the character that ends a raw String, using the starting character.
protected int getState()
protected char getStringEnd(char c) - Gets the character that ends a String, using the starting character.
protected boolean isCommentEndDelimiterStart(char c) - Checks if this is a (or the start of a) block-comment-ending delimiter character.
protected boolean isDelimiterStart(char c) - Checks if this is a (or the start of a) delimiter character.
protected boolean isDigit(char c) - Convenience method for Character.isDigit(char).
protected boolean isExponent(char c) - Checks if char is the exponent character in a number.
protected boolean isExponentSign(char c) - Checks if char is the exponent sign character in a number.
protected boolean isHexDigit(char c) - Returns true if this is a hex digit (0-9, A-F, a-f).
protected boolean isLetter(char c) - Convenience method for Character.isLetter(char).
protected boolean isLexerEnd(char c) - Checks if a char equals END_OF_LEXER.
protected boolean isNewline(char c) - Checks if a char equals NEWLINE.
protected boolean isPoint(char c) - Checks if a character is a decimal point (depends on locale/kernel).
protected boolean isRawStringStart(char c) - Checks if this is a character that starts a multiline String.
protected boolean isSpace(char c) - Checks if a char is a space.
protected boolean isStreamEnd(char c) - Checks if a char equals END_OF_STREAM.
protected boolean isStringEnd(char c) - Checks if this is a character that ends a String.
protected boolean isStringEscape(char c) - Checks if this is a character that is a String escape character.
protected boolean isStringStart(char c) - Checks if this is a character that starts a String.
protected boolean isTab(char c) - Checks if a char is a tab.
protected boolean isUnderscore(char c) - Convenience method for c == '_'.
protected boolean isWhitespace(char c) - Convenience method for Character.isWhitespace(char).
protected boolean modifyType(Lexer.Token token) - Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type.
Lexer.Token nextToken() - Gets the next token.
void pushStream(String name, Reader in) - Pushes a stream onto the encapsulated reader stack.
protected char readChar() - Reads a character from the stream.
protected void saveChar(char c) - Saves a character for the next token.
protected void setDelimBreak(char delimChar) - Sets if we are in a delimiter break.
protected void setMultilineStringStartAndEnd(char c) - Sets the end character for a string.
protected void setState(int state) - Sets the current state.
protected void setStringStartAndEnd(char c) - Sets the end character for a string.
-
-
-
Field Detail
-
DEBUG
public static boolean DEBUG
-
END_OF_LEXER
public static final char END_OF_LEXER
Lexer end-of-lexer char (no more characters to read).
- See Also:
- Constant Field Values
-
END_OF_STREAM
public static final char END_OF_STREAM
Lexer end-of-stream char.
- See Also:
- Constant Field Values
-
NEWLINE
public static final char NEWLINE
Lexer newline char.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
Lexer
public Lexer(Lexer.Kernel kernel, String in)
Creates a new lexer around a String, that will be wrapped into a StringReader. This will also assign this lexer a default name.
- Parameters:
kernel - the lexer kernel to use for defining how to parse the input text.
in - the string to read from.
-
Lexer
public Lexer(Lexer.Kernel kernel, String name, String in)
Creates a new lexer around a String, that will be wrapped into a StringReader.
- Parameters:
kernel - the lexer kernel to use for defining how to parse the input text.
name - the name of this lexer.
in - the string to read from.
-
Lexer
public Lexer(Lexer.Kernel kernel, Reader in)
Creates a new lexer around a reader. This will also assign this lexer a default name.
- Parameters:
kernel - the kernel to use for this lexer.
in - the reader to read from.
-
Lexer
public Lexer(Lexer.Kernel kernel, String name, Reader in)
Creates a new lexer around a reader.
- Parameters:
kernel - the kernel to use for this lexer.
name - the name of this lexer.
in - the reader to read from.
-
-
Method Detail
-
getCurrentStreamName
public String getCurrentStreamName()
- Returns:
- the lexer's current stream name.
-
getCurrentLineNumber
public int getCurrentLineNumber()
Gets the lexer's current stream's line number.
- Returns:
- the lexer's current stream's line number, or -1 if at Lexer end.
-
getCurrentStream
public Lexer.ReaderStack.Stream getCurrentStream()
Gets the current stream.
- Returns:
- the current stream.
-
pushStream
public void pushStream(String name, Reader in)
Pushes a stream onto the encapsulated reader stack.
- Parameters:
name - the name of the stream.
in - the reader to read from.
-
nextToken
public Lexer.Token nextToken() throws IOException
Gets the next token. If there are no tokens left to read, this will return null. This method is NOT thread-safe!
- Returns:
- the next token, or null if no more tokens to read.
- Throws:
IOException - if a token cannot be read by the underlying Reader.
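Because nextToken() returns null once no tokens remain, a caller typically loops until null. The sketch below shows that loop against a hypothetical one-method interface, TokenSource, so that it compiles without this library; TokenLoopSketch and drain are illustrative names, not part of the API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of com.blackrook.expression.
class TokenLoopSketch
{
    // Stand-in for the single behaviour relied on here:
    // nextToken() returns null when no tokens are left.
    interface TokenSource<T>
    {
        T nextToken() throws IOException;
    }

    // Collects every token until the source reports null.
    static <T> List<T> drain(TokenSource<T> source) throws IOException
    {
        List<T> out = new ArrayList<>();
        T token;
        while ((token = source.nextToken()) != null)
            out.add(token);
        return out;
    }
}
```

Given the signature documented here, a method reference such as lexer::nextToken should fit TokenSource<Lexer.Token>. Since the lexer is not thread-safe, such a loop should run on one thread per lexer instance.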
-
modifyType
protected boolean modifyType(Lexer.Token token)
Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type. By default, this handles space, tab, newline, delimiter, and identifier.
If this method is overridden, this should have
if (super.modifyType(token)) return true;
right at the beginning.
- Parameters:
token - the original token.
- Returns:
- true if the token's contents were changed, false if not.
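The override pattern described for modifyType (defer to the superclass first, then apply extra rules) might look like the following. Everything here is a simplified stand-in so the sketch is self-contained: Token, BaseLexer, BooleanAwareLexer, the string type tags, and the retype helper are hypothetical and do not mirror the real Lexer.Token or its type constants.

```java
// Hypothetical stand-ins; not part of com.blackrook.expression.
class ModifyTypeSketch
{
    // Simplified token: a mutable type tag plus its lexeme.
    static class Token
    {
        String type;
        final String lexeme;

        Token(String type, String lexeme)
        {
            this.type = type;
            this.lexeme = lexeme;
        }
    }

    // Stand-in for the base lexer; here it only retypes "+" as a delimiter.
    static class BaseLexer
    {
        protected boolean modifyType(Token token)
        {
            if (token.lexeme.equals("+"))
            {
                token.type = "DELIMITER";
                return true;
            }
            return false;
        }
    }

    // Subclass that adds a rule after deferring to the default handling.
    static class BooleanAwareLexer extends BaseLexer
    {
        @Override
        protected boolean modifyType(Token token)
        {
            if (super.modifyType(token))
                return true;
            if (token.lexeme.equals("true") || token.lexeme.equals("false"))
            {
                token.type = "BOOLEAN";
                return true;
            }
            return false;
        }
    }

    // Runs a lexeme through the subclass and reports the resulting type tag.
    static String retype(String lexeme)
    {
        Token token = new Token("IDENTIFIER", lexeme);
        new BooleanAwareLexer().modifyType(token);
        return token.type;
    }
}
```

Calling super first keeps the default handling intact, so the subclass only sees tokens that the base rules left unchanged.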
-
readChar
protected char readChar() throws IOException
Reads a character from the stream.
- Returns:
- the character read, or END_OF_LEXER if no more characters, or END_OF_STREAM if end of current stream.
- Throws:
IOException - if a character cannot be read.
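The relationship between the reader stack and the two sentinel characters can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: ReaderStackSketch is hypothetical, the sentinel values are chosen arbitrarily for illustration (the real ones are listed under Constant Field Values), and it is assumed that an exhausted stream is popped when END_OF_STREAM is reported.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in; not part of com.blackrook.expression.
class ReaderStackSketch
{
    // Assumed sentinel values, for illustration only.
    static final char END_OF_LEXER = '\uffff';
    static final char END_OF_STREAM = '\ufffe';

    // The reader on top of the stack is the current active stream.
    private final Deque<Reader> stack = new ArrayDeque<>();

    void pushStream(Reader in)
    {
        stack.push(in);
    }

    // Reads from the top stream; reports END_OF_STREAM when that stream is
    // exhausted (and pops it), or END_OF_LEXER when no streams remain.
    char readChar() throws IOException
    {
        if (stack.isEmpty())
            return END_OF_LEXER;
        int c = stack.peek().read();
        if (c < 0)
        {
            stack.pop().close();
            return END_OF_STREAM;
        }
        return (char)c;
    }

    // Reads everything, marking each stream end with '|'.
    static String readAll(ReaderStackSketch s) throws IOException
    {
        StringBuilder sb = new StringBuilder();
        char c;
        while ((c = s.readChar()) != END_OF_LEXER)
            sb.append(c == END_OF_STREAM ? '|' : c);
        return sb.toString();
    }
}
```

Pushing a reader over "ab" and then one over "X" makes readAll return "X|ab|": the most recently pushed stream is consumed first, which is the behaviour an in-language include would rely on.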
-
getState
protected int getState()
- Returns:
- the current state.
-
setState
protected void setState(int state)
Sets the current state.
- Parameters:
state - the new state.
-
setDelimBreak
protected void setDelimBreak(char delimChar)
Sets if we are in a delimiter break.
- Parameters:
delimChar - the delimiter character that starts the break.
-
saveChar
protected void saveChar(char c)
Saves a character for the next token.
- Parameters:
c - the character to save into the current token.
-
setStringStartAndEnd
protected void setStringStartAndEnd(char c)
Sets the end character for a string.
- Parameters:
c - the character to set.
-
setMultilineStringStartAndEnd
protected void setMultilineStringStartAndEnd(char c)
Sets the end character for a string.
- Parameters:
c - the character to set.
-
getCurrentLexeme
protected String getCurrentLexeme()
Gets the current token lexeme.
- Returns:
- the current contents of the token lexeme builder buffer.
-
clearCurrentLexeme
protected void clearCurrentLexeme()
Clears the current token lexeme buffer.
-
isUnderscore
protected boolean isUnderscore(char c)
Convenience method for c == '_'.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isLetter
protected boolean isLetter(char c)
Convenience method for Character.isLetter(char).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isDigit
protected boolean isDigit(char c)
Convenience method for Character.isDigit(char).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isHexDigit
protected boolean isHexDigit(char c)
Returns true if this is a hex digit (0-9, A-F, a-f).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
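The hex digit range given for isHexDigit (0-9, A-F, a-f) corresponds to a plain range check. The following self-contained sketch shows one way to write it; HexDigitSketch is a hypothetical name and the library's actual implementation may differ.

```java
// Hypothetical stand-in; not part of com.blackrook.expression.
class HexDigitSketch
{
    // True only for the ASCII ranges 0-9, A-F, and a-f.
    static boolean isHexDigit(char c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }
}
```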
-
isWhitespace
protected boolean isWhitespace(char c)
Convenience method for Character.isWhitespace(char).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isPoint
protected boolean isPoint(char c)
Checks if a character is a decimal point (depends on locale/kernel).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isExponent
protected boolean isExponent(char c)
Checks if char is the exponent character in a number.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isExponentSign
protected boolean isExponentSign(char c)
Checks if char is the exponent sign character in a number.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isSpace
protected boolean isSpace(char c)
Checks if a char is a space.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isTab
protected boolean isTab(char c)
Checks if a char is a tab.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isStringEscape
protected boolean isStringEscape(char c)
Checks if this is a character that is a String escape character.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isStringStart
protected boolean isStringStart(char c)
Checks if this is a character that starts a String.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isRawStringStart
protected boolean isRawStringStart(char c)
Checks if this is a character that starts a multiline String.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isStringEnd
protected boolean isStringEnd(char c)
Checks if this is a character that ends a String.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
getStringEnd
protected char getStringEnd(char c)
Gets the character that ends a String, using the starting character.
- Parameters:
c - the starting character.
- Returns:
- the corresponding end character, or the null character ('\0') if this does not end a string.
-
getRawStringEnd
protected char getRawStringEnd(char c)
Gets the character that ends a raw String, using the starting character.
- Parameters:
c - the starting character.
- Returns:
- the corresponding end character, or the null character ('\0') if this does not end a multi-line string.
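The start-to-end lookup described for getStringEnd and getRawStringEnd can be modelled as a map from starting character to ending character, with '\0' as the "no match" result. This is a minimal sketch under that assumption; StringDelimiterSketch and addStringDelimiter are hypothetical names, and in the real library the delimiter pairs presumably come from the Lexer.Kernel.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not part of com.blackrook.expression.
class StringDelimiterSketch
{
    // Maps each string-starting character to the character that ends it.
    private final Map<Character, Character> stringDelimiters = new HashMap<>();

    void addStringDelimiter(char start, char end)
    {
        stringDelimiters.put(start, end);
    }

    boolean isStringStart(char c)
    {
        return stringDelimiters.containsKey(c);
    }

    // Returns the matching end character, or '\0' if c starts no string.
    char getStringEnd(char c)
    {
        Character end = stringDelimiters.get(c);
        return end != null ? end : '\0';
    }
}
```

Keeping start and end as a pair allows asymmetric delimiters (for example a hypothetical '<' that is closed by '>') alongside the usual symmetric quote characters.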
-
isDelimiterStart
protected boolean isDelimiterStart(char c)
Checks if this is a (or the start of a) delimiter character.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isCommentEndDelimiterStart
protected boolean isCommentEndDelimiterStart(char c)
Checks if this is a (or the start of a) block-comment-ending delimiter character.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isStreamEnd
protected boolean isStreamEnd(char c)
Checks if a char equals END_OF_STREAM.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isLexerEnd
protected boolean isLexerEnd(char c)
Checks if a char equals END_OF_LEXER.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isNewline
protected boolean isNewline(char c)
Checks if a char equals NEWLINE.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
-