Mod 2.2 (lexical analyzer) Flashcards
tasks performed by the lexical analyzer
- identification of lexemes
- identifying and removing extra whitespace and stripping out comments
- correlating error messages with the source program
- expanding the macros in the program
what is a token
a terminal symbol in the grammar for the source language
it is an abstract symbol representing a kind of lexical unit
what is a pattern
a rule that describes the set of lexemes that can represent a particular token in the source program
a pattern is a description of the form that the lexemes of a token may take
what is a lexeme
it is a sequence of characters in the source program that is matched by the pattern for a token
it is a sequence of characters and is identified by the lexical analyzer as an instance of that token
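The three terms can be seen side by side in a minimal sketch. The token names and regular expressions below are hypothetical choices made only for illustration: each entry pairs a token with its pattern, and every substring that a pattern matches is a lexeme.

```python
import re

# Hypothetical token names and patterns, chosen only for illustration.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+"),            # token NUMBER, pattern: one or more digits
    ("ID", r"[A-Za-z_]\w*"),       # token ID, pattern: letter or _ then word characters
    ("OP", r"[+\-*/=]"),           # token OP, pattern: a single operator character
    ("SKIP", r"\s+"),              # whitespace: matched, then thrown away
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(source):
    """Return (token, lexeme) pairs for source; whitespace is discarded."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            # m.lastgroup is the token, m.group() is the lexeme that matched its pattern
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs
```

For the input `count = count + 1`, the lexeme `count` appears twice and both occurrences are instances of the same token, ID.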
what is input buffering
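the lexical analyzer is the only phase of the compiler that reads the source program. since it reads the program character by character, the speed of this operation matters, so the input is read in blocks into a buffer instead of one character at a time.
it often has to look one or more characters beyond the end of the next lexeme before it can be sure it has the right lexeme (e.g. after reading < it must check the next character to tell < from <=)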
Buffer pairs
the buffer is divided into two halves of N characters each and is scanned using two pointers
- beginning of lexeme pointer
- forward pointer
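initially both point to the first character of the next lexeme to be found
the forward pointer then scans ahead until a match for a pattern is found
when the forward pointer is about to move past the halfway mark, the right half is filled with new characters; when it is about to move past the right end, the left half is refilled and the forward pointer wraps around to the beginning of the buffer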
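The buffer-pair scheme can be sketched as follows. This is a minimal illustration, not a reference implementation: the class `BufferPair`, its methods, and the helper `words` are hypothetical names, `None` stands in for "no character", and the sketch assumes no lexeme is longer than one half (N characters), since a longer lexeme would be overwritten by a reload.

```python
import io


class BufferPair:
    """Buffer of two halves of n characters with lexeme_begin and forward pointers."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        self.buf = [None] * (2 * n)   # None marks a slot with no character
        self.lexeme_begin = 0
        self.forward = 0
        self._load(0)                 # fill the left half before scanning starts

    def _load(self, start):
        """Read up to n characters into the half that begins at index start."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else None

    def peek(self):
        """Character under the forward pointer, or None at end of input."""
        return self.buf[self.forward]

    def advance(self):
        """Move forward by one, reloading a half when a boundary is crossed."""
        if self.buf[self.forward] is None:
            return                            # end of input, nothing to consume
        self.forward += 1
        if self.forward == self.n:            # crossed the halfway mark
            self._load(self.n)                # refill the right half
        elif self.forward == 2 * self.n:      # ran off the right end
            self._load(0)                     # refill the left half
            self.forward = 0                  # wrap around to the beginning

    def take_lexeme(self):
        """Return the characters from lexeme_begin up to forward, wrapping if needed."""
        chars = []
        i = self.lexeme_begin
        while i != self.forward:
            chars.append(self.buf[i])
            i = (i + 1) % (2 * self.n)
        self.lexeme_begin = self.forward
        return "".join(chars)


def words(text, n=4):
    """Split text on spaces; assumes every word has at most n characters."""
    buf = BufferPair(io.StringIO(text), n)
    out = []
    while True:
        while buf.peek() == " ":              # skip blanks between lexemes
            buf.advance()
        buf.lexeme_begin = buf.forward        # both pointers at the next lexeme
        if buf.peek() is None:
            return out
        while buf.peek() not in (None, " "):  # scan until the lexeme ends
            buf.advance()
        out.append(buf.take_lexeme())

# words("int sum = a + b1") gives ['int', 'sum', '=', 'a', '+', 'b1']
```

Note that `advance` performs a boundary test on every single move of the forward pointer, in addition to whatever test the caller makes on the character itself.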
Sentinels
each time the forward pointer advances, two tests are needed: one to check whether it has reached the end of a buffer half, and one to determine which character was read
we can combine these two tests into a single test if we extend each buffer half to hold an “eof” sentinel character at its end
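A minimal sketch of the sentinel idea follows. The names `SentinelBuffer`, `next_char`, and `read_all` are hypothetical, and the sketch assumes the sentinel character `"\0"` never occurs in real source text. Each half holds n characters plus one extra slot that always contains the sentinel, so the common path makes only one comparison per character.

```python
import io

EOF = "\0"  # sentinel character; assumed never to appear in the source text


class SentinelBuffer:
    """Two halves of n characters, each followed by one sentinel slot."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        # layout: [left half: 0..n-1][EOF at n][right half: n+1..2n][EOF at 2n+1]
        self.buf = [EOF] * (2 * n + 2)
        self.forward = 0
        self._load(0)

    def _load(self, start):
        """Fill the half starting at start; unused slots get the sentinel."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF
        # the slot at start + n is never written, so it keeps holding EOF

    def next_char(self):
        """Return the next input character, or None at the real end of input."""
        ch = self.buf[self.forward]
        if ch != EOF:                         # the single test on the common path
            self.forward += 1
            return ch
        # rare path: a sentinel was read, so work out which kind it is
        if self.forward == self.n:            # end of the left half
            self._load(self.n + 1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:    # end of the right half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                           # sentinel inside a half: end of input


def read_all(text, n=4):
    """Read text back one character at a time through a SentinelBuffer."""
    buf = SentinelBuffer(io.StringIO(text), n)
    return "".join(iter(buf.next_char, None))
```

The position checks run only when a sentinel is actually read, which happens once per n characters plus once at the end of the input.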