CHAPTER 2 Flashcards
What is a formal language?
is a set of strings constructed from an alphabet following specific rules (a grammar).
What is a regular expression?
is a pattern describing a set of strings using characters and operators.
- What is meant by an ambiguous lexical definition?
When a single input can be tokenized in multiple valid ways.
- Typical examples of ambiguities and how to resolve them:
Keyword vs. Identifier: “if” could be a keyword or part of a variable name “ifx”.
Solution: Use longest match (prefer “if” over “ifx”).
Floating-point vs. Integer: 3.14 could be INT (3) followed by . and 14, or FLOAT (3.14).
Solution: Define FLOAT → [0-9]+.[0-9]+ with higher priority than INT.
Operators vs. Multi-character symbols:»_space;= vs.»_space; and =.
Solution: Define»_space;= before»_space; in lexical rules.
What is a lexical action?
An action executed when a token is matched by the lexer.
Example of constructing an NFA for a lexical definition:
Regex: (a|b)*ab
NFA Construction:
Start with an ε-transition for a|b.
Use multiple paths for a and b.
Ensure at least one ab is at the end.
scss
Copy
Edit
(ε) → (a) → (b) → (final)
(ε) → (b) → (a) → (final)
How is rule priority handled in the implementation?
Longest match: The lexer prefers the longest possible token match.
EOF (End of File): The lexer returns an EOF token when the input is exhausted.
Whitespace: Often skipped unless meaningful (e.g., indentation in Python).
Errors: Unrecognized characters trigger a lexical error.
What are lexical states? When are they useful?
Lexical states allow the lexer to behave differently depending on the current context.
Useful for:
Handling multi-line comments (/* … /) – Switch to a comment state inside / and return to normal when */ is found.
Processing string literals – Inside “…”, different rules apply.
Distinguishing language modes – Example: inside a template string in JavaScript.