Class Lexer

java.lang.Object
com.blackrook.json.struct.Lexer

public class Lexer extends Object
Breaks up a stream of characters into lexical tokens. Spaces, newlines, tabs, comments, and breaks in the stream are included in the output if desired; otherwise they are stripped out.

String delimiter characters take precedence over regular delimiters. Raw String delimiter characters take precedence over regular string delimiters. Delimiter characters take parsing priority over other characters. Delimiter evaluation priority goes: Comment Delimiter, Delimiter. Identifier evaluation priority goes: Keyword, CaseInsensitiveKeyword, Identifier.
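
The identifier priority above (Keyword, then CaseInsensitiveKeyword, then Identifier) can be illustrated with a small self-contained sketch. The class IdentifierPriority, its Kind enum, and the two keyword tables are hypothetical stand-ins invented for this example; they are not part of com.blackrook.json, and the real kernel may store and match keywords differently.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not part of com.blackrook.json.
class IdentifierPriority {
    enum Kind { KEYWORD, CASE_INSENSITIVE_KEYWORD, IDENTIFIER }

    // Assumed example tables; a real kernel would supply its own.
    static final Set<String> KEYWORDS = Set.of("true", "false", "null");
    static final Set<String> CI_KEYWORDS = Set.of("select", "from");

    // Checks the exact-match keywords first, then the case-insensitive
    // keywords, and falls back to a plain identifier.
    static Kind classify(String lexeme) {
        if (KEYWORDS.contains(lexeme))
            return Kind.KEYWORD;
        if (CI_KEYWORDS.contains(lexeme.toLowerCase(Locale.ROOT)))
            return Kind.CASE_INSENSITIVE_KEYWORD;
        return Kind.IDENTIFIER;
    }

    public static void main(String[] args) {
        System.out.println(classify("null"));   // exact keyword match
        System.out.println(classify("SELECT")); // matches only when case is ignored
        System.out.println(classify("Null"));   // in neither table
    }
}
```

Because the exact-match table is consulted first, a lexeme that appears in both tables would be reported as a Keyword in this sketch.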

Subclasses may also manipulate the encapsulated reader stack (for example, to implement in-language stream inclusion).

If the system property com.blackrook.json.util.Lexer.debug is set to true, the lexer writes debugging output to System.out.
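
A system property like this is usually supplied on the JVM command line with a -D argument (for example, -Dcom.blackrook.json.util.Lexer.debug=true). The sketch below shows how a boolean property of this kind can be read with the standard Boolean.getBoolean call. The class DebugFlag is hypothetical; whether the library reads the property this way, or only once at class-load time, is an assumption.

```java
// Hypothetical helper, not part of com.blackrook.json.
class DebugFlag {
    // Property name taken from the class description above.
    static final String PROPERTY = "com.blackrook.json.util.Lexer.debug";

    // Boolean.getBoolean returns true only if the system property exists
    // and equals "true" (ignoring case).
    static boolean isEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("debug enabled: " + isEnabled());
    }
}
```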

Lexer functions are NOT thread-safe.

Author:
Matthew Tropiano
  • Field Details

    • DEBUG

      public static boolean DEBUG
    • END_OF_LEXER

      public static final char END_OF_LEXER
      Lexer end char, signaling that no characters remain in any stream.
    • END_OF_STREAM

      public static final char END_OF_STREAM
      Lexer end-of-stream char.
    • NEWLINE

      public static final char NEWLINE
      Lexer newline char.
  • Constructor Details

    • Lexer

      public Lexer(Lexer.Kernel kernel, String in)
      Creates a new lexer around a String, that will be wrapped into a StringReader. This will also assign this lexer a default name.
      Parameters:
      kernel - the lexer kernel to use for defining how to parse the input text.
      in - the string to read from.
    • Lexer

      public Lexer(Lexer.Kernel kernel, String name, String in)
      Creates a new lexer around a String, that will be wrapped into a StringReader.
      Parameters:
      kernel - the lexer kernel to use for defining how to parse the input text.
      name - the name of this lexer.
      in - the string to read from.
    • Lexer

      public Lexer(Lexer.Kernel kernel, Reader in)
      Creates a new lexer around a reader. This will also assign this lexer a default name.
      Parameters:
      kernel - the kernel to use for this lexer.
      in - the reader to read from.
    • Lexer

      public Lexer(Lexer.Kernel kernel, String name, Reader in)
      Creates a new lexer around a reader.
      Parameters:
      kernel - the kernel to use for this lexer.
      name - the name of this lexer.
      in - the reader to read from.
  • Method Details

    • getCurrentStreamName

      public String getCurrentStreamName()
      Returns:
      the lexer's current stream name.
    • getCurrentLineNumber

      public int getCurrentLineNumber()
      Gets the lexer's current stream's line number.
      Returns:
      the lexer's current stream's line number, or -1 if at Lexer end.
    • getCurrentStream

      public Lexer.ReaderStack.Stream getCurrentStream()
      Gets the current stream.
      Returns:
      the current stream.
    • pushStream

      public void pushStream(String name, Reader in)
      Pushes a stream onto the encapsulated reader stack.
      Parameters:
      name - the name of the stream.
      in - the reader to read from.
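
      The idea of a reader stack can be sketched as follows: reads come from the most recently pushed reader, and when that reader is exhausted the one beneath it resumes. NamedReaderStack and all of its members are hypothetical names for this illustration; this is not the library's Lexer.ReaderStack, and unlike the real lexer it does not report END_OF_STREAM between streams.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a stack of named readers.
class NamedReaderStack {
    private static final class Entry {
        final String name;
        final Reader reader;
        Entry(String name, Reader reader) { this.name = name; this.reader = reader; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    // Makes the given reader the current stream.
    void push(String name, Reader in) {
        stack.push(new Entry(name, in));
    }

    // Name of the current stream, or null if every stream is exhausted.
    String currentName() {
        Entry top = stack.peek();
        return top == null ? null : top.name;
    }

    // Reads from the top reader; pops exhausted readers and continues with
    // the one beneath. Returns -1 only when the whole stack is empty.
    int read() {
        try {
            while (!stack.isEmpty()) {
                int c = stack.peek().reader.read();
                if (c >= 0)
                    return c;
                stack.pop().reader.close();
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every remaining character from the stack into a String.
    static String drain(NamedReaderStack s) {
        StringBuilder sb = new StringBuilder();
        for (int c; (c = s.read()) >= 0; )
            sb.append((char) c);
        return sb.toString();
    }

    public static void main(String[] args) {
        NamedReaderStack s = new NamedReaderStack();
        s.push("main", new StringReader("AB"));
        s.read();                                   // consumes 'A' from "main"
        s.push("included", new StringReader("xy")); // inclusion happens mid-stream
        System.out.println(drain(s));               // prints "xyB"
    }
}
```

      The included stream is read to completion before the remainder of the outer stream, which is the behavior an in-language include directive typically needs.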
    • nextToken

      public Lexer.Token nextToken() throws IOException
      Gets the next token. If there are no tokens left to read, this will return null. This method is NOT thread-safe!
      Returns:
      the next token, or null if no more tokens to read.
      Throws:
      IOException - if a token cannot be read by the underlying Reader.
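
      Since nextToken() returns null when input is exhausted, callers typically pull tokens in a loop until null appears. The sketch below shows that loop against a stand-in token source so that it runs without the library. PullLoopSketch and WordSource are hypothetical, and WordSource simply splits on whitespace rather than using a Lexer.Kernel.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "read until null" pattern.
class PullLoopSketch {
    // Stand-in with the same contract as nextToken(): null means no more tokens.
    static final class WordSource {
        private final Reader in;

        WordSource(Reader in) { this.in = in; }

        String nextToken() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) >= 0) {
                if (Character.isWhitespace((char) c)) {
                    if (sb.length() > 0)
                        break;          // whitespace ends the current word
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    // Collects every token, stopping at the first null.
    static List<String> readAll(String text) {
        List<String> out = new ArrayList<>();
        try (Reader r = new StringReader(text)) {
            WordSource source = new WordSource(r);
            String token;
            while ((token = source.nextToken()) != null)
                out.add(token);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll("  a  bb\n")); // prints [a, bb]
    }
}
```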
    • modifyType

      protected boolean modifyType(Lexer.Token token)
      Called when the lexer is about to create a token whose lexeme may require the token to be assigned a different type.

      By default, this handles space, tab, newline, delimiter, and identifier.

      An overriding method should begin with

       if (super.modifyType(token))
           return true;

      so that the default handling is applied first.
      Parameters:
      token - the original token.
      Returns:
      true if the token's contents were changed, false if not.
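
      The override pattern can be sketched with stand-in classes. Token, BaseLexer, and DollarLexer below are hypothetical and far simpler than Lexer.Token and Lexer; only the shape of the override (defer to super first, then apply extra rules) is meant to carry over.

```java
// Hypothetical stand-ins illustrating the modifyType override pattern.
class ModifyTypeSketch {
    static class Token {
        final String lexeme;
        String type = "UNKNOWN";
        Token(String lexeme) { this.lexeme = lexeme; }
    }

    static class BaseLexer {
        // Simplified default handling: only recognizes a single space.
        protected boolean modifyType(Token token) {
            if (token.lexeme.equals(" ")) {
                token.type = "SPACE";
                return true;
            }
            return false;
        }
    }

    static class DollarLexer extends BaseLexer {
        @Override
        protected boolean modifyType(Token token) {
            // Let the default rules run first, as the documentation advises.
            if (super.modifyType(token))
                return true;
            // Extra rule added by this subclass.
            if (token.lexeme.startsWith("$")) {
                token.type = "VARIABLE";
                return true;
            }
            return false;
        }
    }

    // Convenience for trying the subclass on a single lexeme.
    static String typeOf(String lexeme) {
        Token t = new Token(lexeme);
        new DollarLexer().modifyType(t);
        return t.type;
    }

    public static void main(String[] args) {
        System.out.println(typeOf(" "));     // prints SPACE
        System.out.println(typeOf("$name")); // prints VARIABLE
        System.out.println(typeOf("name"));  // prints UNKNOWN
    }
}
```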
    • readChar

      protected char readChar() throws IOException
      Reads a character from the stream.
      Returns:
      the character read, or END_OF_LEXER if no more characters, or END_OF_STREAM if end of current stream.
      Throws:
      IOException - if a character cannot be read.
    • getState

      protected int getState()
      Returns:
      the current state.
    • setState

      protected void setState(int state)
      Sets the current state.
      Parameters:
      state - the new state.
    • setDelimBreak

      protected void setDelimBreak(char delimChar)
      Marks the lexer as being in a delimiter break.
      Parameters:
      delimChar - the delimiter character that starts the break.
    • saveChar

      protected void saveChar(char c)
      Saves a character into the lexeme of the token being built.
      Parameters:
      c - the character to save into the current token.
    • setStringStartAndEnd

      protected void setStringStartAndEnd(char c)
      Sets the end character for a string.
      Parameters:
      c - the character to set.
    • setMultilineStringStartAndEnd

      protected void setMultilineStringStartAndEnd(char c)
      Sets the end character for a multiline (raw) string.
      Parameters:
      c - the character to set.
    • getCurrentLexeme

      protected String getCurrentLexeme()
      Gets the current token lexeme.
      Returns:
      the current contents of the token lexeme builder buffer.
    • clearCurrentLexeme

      protected void clearCurrentLexeme()
      Clears the current token lexeme buffer.
    • isUnderscore

      protected boolean isUnderscore(char c)
      Convenience method for c == '_'.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isLetter

      protected boolean isLetter(char c)
      Convenience method for Character.isLetter(char).
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isDigit

      protected boolean isDigit(char c)
      Convenience method for Character.isDigit(char).
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isHexDigit

      protected boolean isHexDigit(char c)
      Returns true if this is a hex digit (0-9, A-F, a-f).
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
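
      A check matching the stated ranges (0-9, A-F, a-f) could look like the sketch below. HexDigitSketch is a hypothetical name, and this is one plausible implementation rather than the library's actual code.

```java
// Hypothetical illustration of an ASCII-only hex digit test.
class HexDigitSketch {
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }

    public static void main(String[] args) {
        System.out.println(isHexDigit('f')); // prints true
        System.out.println(isHexDigit('g')); // prints false
    }
}
```

      An explicit range test like this accepts only ASCII characters, whereas Character.digit(c, 16) >= 0 also accepts non-ASCII Unicode digits.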
    • isWhitespace

      protected boolean isWhitespace(char c)
      Convenience method for Character.isWhitespace(char).
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isPoint

      protected boolean isPoint(char c)
      Checks if a character is a decimal point (depends on locale/kernel).
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isExponent

      protected boolean isExponent(char c)
      Checks if char is the exponent character in a number.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isExponentSign

      protected boolean isExponentSign(char c)
      Checks if char is the exponent sign character in a number.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isSpace

      protected boolean isSpace(char c)
      Checks if a char is a space.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isTab

      protected boolean isTab(char c)
      Checks if a char is a tab.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isStringEscape

      protected boolean isStringEscape(char c)
      Checks if this is a character that is a String escape character.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isStringStart

      protected boolean isStringStart(char c)
      Checks if this is a character that starts a String.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isRawStringStart

      protected boolean isRawStringStart(char c)
      Checks if this is a character that starts a raw (multiline) String.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • isStringEnd

      protected boolean isStringEnd(char c)
      Checks if this is a character that ends a String.
      Parameters:
      c - the character to test.
      Returns:
      true if so, false if not.
    • getStringEnd

      protected char getStringEnd(char c)
      Gets the character that ends a String, using the starting character.
      Parameters:
      c - the starting character.
      Returns:
      the corresponding end character, or the null character ('\0') if c does not start a String.
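
      The start-to-end lookup described here can be modeled as a map from starting characters to ending characters, with '\0' as the "no match" result. StringDelimiterSketch and its addStringDelimiter method are hypothetical names for this illustration; how the real kernel registers string delimiters is not shown in this documentation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a start-character to end-character lookup.
class StringDelimiterSketch {
    private final Map<Character, Character> ends = new HashMap<>();

    // Registers a string delimiter pair (assumed registration method).
    void addStringDelimiter(char start, char end) {
        ends.put(start, end);
    }

    boolean isStringStart(char c) {
        return ends.containsKey(c);
    }

    // Returns the end character for the given start character,
    // or '\0' if the character does not start a string.
    char getStringEnd(char start) {
        Character end = ends.get(start);
        return end == null ? '\0' : end;
    }

    public static void main(String[] args) {
        StringDelimiterSketch d = new StringDelimiterSketch();
        d.addStringDelimiter('"', '"');
        d.addStringDelimiter('<', '>');
        System.out.println(d.getStringEnd('<'));         // prints >
        System.out.println(d.getStringEnd('x') == '\0'); // prints true
    }
}
```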
    • getRawStringEnd

      protected char getRawStringEnd(char c)
      Gets the character that ends a raw String, using the starting character.
      Parameters:
      c - the starting character.
      Returns:
      the corresponding end character, or the null character ('\0') if c does not start a raw String.
    • isDelimiterStart

      protected boolean isDelimiterStart(char c)
      Checks if this is a (or the start of a) delimiter character.
      Parameters:
      c - the character input.
      Returns:
      true if so, false if not.
    • isCommentEndDelimiterStart

      protected boolean isCommentEndDelimiterStart(char c)
      Checks if this is a (or the start of a) block-comment-ending delimiter character.
      Parameters:
      c - the character input.
      Returns:
      true if so, false if not.
    • isStreamEnd

      protected boolean isStreamEnd(char c)
      Checks if a char equals END_OF_STREAM.
      Parameters:
      c - the character input.
      Returns:
      true if so, false if not.
    • isLexerEnd

      protected boolean isLexerEnd(char c)
      Checks if a char equals END_OF_LEXER.
      Parameters:
      c - the character input.
      Returns:
      true if so, false if not.
    • isNewline

      protected boolean isNewline(char c)
      Checks if a char equals NEWLINE.
      Parameters:
      c - the character input.
      Returns:
      true if so, false if not.