Class Lexer

  • Direct Known Subclasses:
    PreprocessorLexer

    public class Lexer
    extends Object
    Breaks up a stream of characters into lexical tokens. Spaces, newlines, tabs, comments, and breaks in the stream are included as tokens if desired; otherwise, they are stripped out.

    String delimiter characters take precedence over regular delimiters. Raw String delimiter characters take precedence over regular string delimiters. Delimiter characters take parsing priority over other characters. Delimiters are evaluated in this order of priority: Comment Delimiter, Delimiter. Identifiers are evaluated in this order of priority: Keyword, CaseInsensitiveKeyword, Identifier.
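    The identifier priority above can be pictured with a small standalone sketch. It does not use the Lexer API: the class IdentifierPriority, its methods, and the numeric type codes are all hypothetical, and the assumption is simply that an exact keyword match is tried first, then a case-insensitive one, then the plain identifier fallback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in; NOT part of the Lexer API.
class IdentifierPriority {
    // Hypothetical type code for a plain identifier.
    static final int TYPE_IDENTIFIER = 0;

    // Exact-match keywords.
    private final Map<String, Integer> keywords = new HashMap<>();
    // Keywords matched without regard to case.
    private final Map<String, Integer> caseInsensitiveKeywords =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void addKeyword(String word, int type) {
        keywords.put(word, type);
    }

    void addCaseInsensitiveKeyword(String word, int type) {
        caseInsensitiveKeywords.put(word, type);
    }

    // Applies the order: Keyword, CaseInsensitiveKeyword, Identifier.
    int classify(String lexeme) {
        Integer type = keywords.get(lexeme);
        if (type != null)
            return type;
        type = caseInsensitiveKeywords.get(lexeme);
        if (type != null)
            return type;
        return TYPE_IDENTIFIER;
    }
}
```

    With "if" registered as an exact keyword and "IF" as a case-insensitive one, the lexeme "if" resolves to the exact keyword, while "If" falls through to the case-insensitive entry.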

    Subclasses may also manipulate the reader stack (for example, ones that perform in-language stream inclusion).

    If the system property com.blackrook.archetext.util.Lexer.debug is set to true, the lexer prints debugging output to System.out.
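    A minimal sketch of setting that property from code follows. The helper class LexerDebugFlag is hypothetical, and when the library actually reads the property is not stated here; if it is read once at class initialization, the property would have to be set before Lexer is first used, or passed on the command line as -Dcom.blackrook.archetext.util.Lexer.debug=true.

```java
// Hypothetical helper; only the property name comes from the documentation.
class LexerDebugFlag {
    static final String DEBUG_PROPERTY = "com.blackrook.archetext.util.Lexer.debug";

    // Sets the system property to "true" for the running JVM.
    static void enable() {
        System.setProperty(DEBUG_PROPERTY, "true");
    }

    // Boolean.getBoolean returns true only if the property exists and equals "true".
    static boolean isEnabled() {
        return Boolean.getBoolean(DEBUG_PROPERTY);
    }

    public static void main(String[] args) {
        enable();
        System.out.println(DEBUG_PROPERTY + " = " + isEnabled());
    }
}
```

    Since the DEBUG field is declared public static and not final, assigning it directly may also be possible, though this page does not describe that usage.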

    Lexer functions are NOT thread-safe.

    Author:
    Matthew Tropiano
    • Field Detail

      • DEBUG

        public static boolean DEBUG
      • END_OF_LEXER

        public static final char END_OF_LEXER
        Lexer end char, returned when no more characters are available from any stream.
        See Also:
        Constant Field Values
      • END_OF_STREAM

        public static final char END_OF_STREAM
        Lexer end-of-stream char.
        See Also:
        Constant Field Values
    • Constructor Detail

      • Lexer

        public Lexer​(Lexer.Kernel kernel,
                     String in)
        Creates a new lexer around a String, that will be wrapped into a StringReader. This will also assign this lexer a default name.
        Parameters:
        kernel - the lexer kernel to use for defining how to parse the input text.
        in - the string to read from.
      • Lexer

        public Lexer​(Lexer.Kernel kernel,
                     String name,
                     String in)
        Creates a new lexer around a String, which is wrapped in a StringReader.
        Parameters:
        kernel - the lexer kernel to use for defining how to parse the input text.
        name - the name of this lexer.
        in - the string to read from.
      • Lexer

        public Lexer​(Lexer.Kernel kernel,
                     Reader in)
        Creates a new lexer around a reader. This will also assign this lexer a default name.
        Parameters:
        kernel - the kernel to use for this lexer.
        in - the reader to read from.
      • Lexer

        public Lexer​(Lexer.Kernel kernel,
                     String name,
                     Reader in)
        Creates a new lexer around a reader.
        Parameters:
        kernel - the kernel to use for this lexer.
        name - the name of this lexer.
        in - the reader to read from.
    • Method Detail

      • getCurrentStreamName

        public String getCurrentStreamName()
        Returns:
        the lexer's current stream name.
      • getCurrentLineNumber

        public int getCurrentLineNumber()
        Gets the lexer's current stream's line number.
        Returns:
        the lexer's current stream's line number, or -1 if at Lexer end.
      • getCurrentStream

        public Lexer.ReaderStack.Stream getCurrentStream()
        Gets the current stream.
        Returns:
        the current stream.
      • pushStream

        public void pushStream​(String name,
                               Reader in)
        Pushes a stream onto the encapsulated reader stack.
        Parameters:
        name - the name of the stream.
        in - the reader to read from.
      • nextToken

        public Lexer.Token nextToken()
                              throws IOException
        Gets the next token. If there are no tokens left to read, this will return null. This method is NOT thread-safe!
        Returns:
        the next token, or null if no more tokens to read.
        Throws:
        IOException - if a token cannot be read by the underlying Reader.
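        The "returns null when exhausted" contract suggests a loop-until-null reading idiom. The sketch below shows that idiom on a stand-in: WhitespaceTokenSource is a hypothetical class that only splits on whitespace and returns plain strings, not the library's Lexer or Lexer.Token.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer; splits on whitespace only.
class WhitespaceTokenSource {
    private final Reader in;

    WhitespaceTokenSource(Reader in) {
        this.in = in;
    }

    // Mirrors the documented contract: returns null when nothing is left.
    String nextToken() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace((char) c)) {
                if (sb.length() > 0)
                    break; // token complete
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    // Drains a source using the loop-until-null idiom.
    static List<String> readAll(String text) throws IOException {
        WhitespaceTokenSource source = new WhitespaceTokenSource(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = source.nextToken()) != null) {
            tokens.add(token);
        }
        return tokens;
    }
}
```

        Because the documented method is not thread-safe, a loop like this would normally run on a single thread per lexer instance.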
      • modifyType

        protected boolean modifyType​(Lexer.Token token)
        Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type.

        By default, this handles space, tab, newline, delimiter, and identifier.

        If this method is overridden, the override should begin with

         if (super.modifyType(token)) 
             return true;
         
        so that the default handling runs first.
        Parameters:
        token - the original token.
        Returns:
        true if the token's contents were changed, false if not.
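        The override pattern described above can be sketched in miniature. MiniToken, MiniLexer, ArrowLexer, and the type codes are hypothetical stand-ins, not the library's classes; the base class here recognizes only a single space, whereas the real default handles space, tab, newline, delimiter, and identifier.

```java
// Hypothetical miniature of the pattern; these are NOT the library's classes.
class MiniToken {
    String lexeme;
    int type;

    MiniToken(String lexeme, int type) {
        this.lexeme = lexeme;
        this.type = type;
    }
}

class MiniLexer {
    static final int TYPE_UNKNOWN = 0;
    static final int TYPE_SPACE = 1;

    // Stand-in for the default behavior: recognizes only a single space.
    protected boolean modifyType(MiniToken token) {
        if (" ".equals(token.lexeme)) {
            token.type = TYPE_SPACE;
            return true;
        }
        return false;
    }
}

class ArrowLexer extends MiniLexer {
    static final int TYPE_ARROW = 100;

    @Override
    protected boolean modifyType(MiniToken token) {
        // Let the default handling run first, as the documentation advises.
        if (super.modifyType(token))
            return true;
        // Only then apply the subclass-specific rule.
        if ("->".equals(token.lexeme)) {
            token.type = TYPE_ARROW;
            return true;
        }
        return false;
    }
}
```

        Calling the superclass first means a subclass rule never overrides a token the default logic has already classified.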
      • readChar

        protected char readChar()
                         throws IOException
        Reads a character from the stream.
        Returns:
        the character read, or END_OF_LEXER if no more characters, or END_OF_STREAM if end of current stream.
        Throws:
        IOException - if a character cannot be read.
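        The two sentinel return values are easiest to see against a stack of readers. The following is a sketch under assumptions, not the library's implementation: MiniReaderStack is hypothetical, the sentinel values are placeholders (the real ones are listed under Constant Field Values), and popping an exhausted reader at the moment END_OF_STREAM is returned is a guess at the behavior.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the encapsulated reader stack.
class MiniReaderStack {
    // Assumed placeholder values, not taken from the library.
    static final char END_OF_LEXER = '\uffff';
    static final char END_OF_STREAM = '\ufffe';

    private final Deque<Reader> stack = new ArrayDeque<>();

    // Makes the given reader the current stream.
    void pushStream(Reader in) {
        stack.push(in);
    }

    // Reads from the current stream, reporting stream end and lexer end separately.
    char readChar() throws IOException {
        if (stack.isEmpty())
            return END_OF_LEXER;
        int c = stack.peek().read();
        if (c == -1) {
            // Current stream is exhausted: drop it and resume the one beneath.
            stack.pop().close();
            return END_OF_STREAM;
        }
        return (char) c;
    }

    // Reads until END_OF_LEXER, marking each stream end with '|'.
    static String drain(MiniReaderStack s) throws IOException {
        StringBuilder sb = new StringBuilder();
        char c;
        while ((c = s.readChar()) != END_OF_LEXER)
            sb.append(c == END_OF_STREAM ? '|' : c);
        return sb.toString();
    }
}
```

        Pushing a second reader partway through the first behaves like an include: the pushed stream is read to its end before the remainder of the original stream resumes.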
      • getState

        protected int getState()
        Returns:
        the current state.
      • setState

        protected void setState​(int state)
        Sets the current state.
        Parameters:
        state - the new state.
      • setDelimBreak

        protected void setDelimBreak​(char delimChar)
        Sets if we are in a delimiter break.
        Parameters:
        delimChar - the delimiter character that starts the break.
      • saveChar

        protected void saveChar​(char c)
        Saves a character into the lexeme buffer of the token being built.
        Parameters:
        c - the character to save into the current token.
      • setStringStartAndEnd

        protected void setStringStartAndEnd​(char c)
        Sets the end character for a string.
        Parameters:
        c - the character to set.
      • setMultilineStringStartAndEnd

        protected void setMultilineStringStartAndEnd​(char c)
        Sets the end character for a multiline string.
        Parameters:
        c - the character to set.
      • getCurrentLexeme

        protected String getCurrentLexeme()
        Gets the current token lexeme.
        Returns:
        the current contents of the token lexeme builder buffer.
      • clearCurrentLexeme

        protected void clearCurrentLexeme()
        Clears the current token lexeme buffer.
      • isUnderscore

        protected boolean isUnderscore​(char c)
        Convenience method for c == '_'.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isLetter

        protected boolean isLetter​(char c)
        Convenience method for Character.isLetter(char).
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isDigit

        protected boolean isDigit​(char c)
        Convenience method for Character.isDigit(char).
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isHexDigit

        protected boolean isHexDigit​(char c)
        Returns true if this is a hex digit (0-9, A-F, a-f).
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
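        The documented character ranges translate directly into a range check. This is a standalone sketch in a hypothetical class HexDigits, written from the description above rather than from the library source.

```java
// Hypothetical standalone version of the documented check.
class HexDigits {
    // True only for the ASCII ranges 0-9, A-F, and a-f.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }
}
```

        An explicit range check differs from Character.digit(c, 16) != -1, which also accepts non-ASCII digits and fullwidth Latin letters.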
      • isWhitespace

        protected boolean isWhitespace​(char c)
        Convenience method for Character.isWhitespace(char).
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isPoint

        protected boolean isPoint​(char c)
        Checks if a character is a decimal point (depends on locale/kernel).
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isExponent

        protected boolean isExponent​(char c)
        Checks if char is the exponent character in a number.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isExponentSign

        protected boolean isExponentSign​(char c)
        Checks if char is the exponent sign character in a number.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isSpace

        protected boolean isSpace​(char c)
        Checks if a char is a space.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isTab

        protected boolean isTab​(char c)
        Checks if a char is a tab.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isStringEscape

        protected boolean isStringEscape​(char c)
        Checks if this is a character that is a String escape character.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isStringStart

        protected boolean isStringStart​(char c)
        Checks if this is a character that starts a String.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isRawStringStart

        protected boolean isRawStringStart​(char c)
        Checks if this is a character that starts a raw (multiline) String.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • isStringEnd

        protected boolean isStringEnd​(char c)
        Checks if this is a character that ends a String.
        Parameters:
        c - the character to test.
        Returns:
        true if so, false if not.
      • getStringEnd

        protected char getStringEnd​(char c)
        Gets the character that ends a String, using the starting character.
        Parameters:
        c - the starting character.
        Returns:
        the corresponding end character, or the null character ('\0') if c does not start a string.
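        One plausible way to back this lookup is a map from start character to end character. The sketch below assumes that design; StringDelimiters and addStringDelimiters are hypothetical names, and only the '\0' fallback is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table of string delimiters; not the library's kernel.
class StringDelimiters {
    private final Map<Character, Character> startToEnd = new HashMap<>();

    // Registers a start character and the character that closes it.
    void addStringDelimiters(char start, char end) {
        startToEnd.put(start, end);
    }

    // True if c opens a string.
    boolean isStringStart(char c) {
        return startToEnd.containsKey(c);
    }

    // Returns the end character for a start character, or '\0' if there is none.
    char getStringEnd(char c) {
        Character end = startToEnd.get(c);
        return end != null ? end : '\0';
    }
}
```

        Keeping start and end separate allows asymmetric pairs such as '<' and '>' alongside symmetric ones such as double quotes.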
      • getRawStringEnd

        protected char getRawStringEnd​(char c)
        Gets the character that ends a raw String, using the starting character.
        Parameters:
        c - the starting character.
        Returns:
        the corresponding end character, or the null character ('\0') if c does not start a raw string.
      • isDelimiterStart

        protected boolean isDelimiterStart​(char c)
        Checks if this is a (or the start of a) delimiter character.
        Parameters:
        c - the character input.
        Returns:
        true if so, false if not.
      • isCommentEndDelimiterStart

        protected boolean isCommentEndDelimiterStart​(char c)
        Checks if this is a (or the start of a) block-comment-ending delimiter character.
        Parameters:
        c - the character input.
        Returns:
        true if so, false if not.
      • isStreamEnd

        protected boolean isStreamEnd​(char c)
        Checks if a char equals END_OF_STREAM.
        Parameters:
        c - the character input.
        Returns:
        true if so, false if not.
      • isLexerEnd

        protected boolean isLexerEnd​(char c)
        Checks if a char equals END_OF_LEXER.
        Parameters:
        c - the character input.
        Returns:
        true if so, false if not.
      • isNewline

        protected boolean isNewline​(char c)
        Checks if a char is the newline character.
        Parameters:
        c - the character input.
        Returns:
        true if so, false if not.