Chapter 2 Flashcards by Jane Austin

What is the role of the lexical analyzer?

The lexical analyzer reads the sequence of characters in the source program and produces tokens for the parser to use.

The stream of tokens is sent to the parser for …

syntax analysis

The lexical analyzer also interacts with the …

symbol table

The lexical analyzer can also perform … secondary tasks:

– stripping out blanks, tabs, and newlines
– stripping out comments
– keeping track of line numbers
– expanding macros (in some lexical analyzers)
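
The first three secondary tasks can be sketched in a few lines. This is only an illustration: the helper name `strip_and_number` and the comment syntax (`#` to end of line) are assumptions, and the sketch ignores the case of a `#` inside a string literal.

```python
import re

# Hypothetical helper; the "#" comment syntax is an assumption for this sketch.
def strip_and_number(source):
    """Return (line_number, text) pairs with comments and surrounding blanks removed."""
    cleaned = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        text = re.sub(r"#.*", "", line).strip()  # drop the comment, then blanks and tabs
        if text:  # skip lines that are empty after stripping
            cleaned.append((line_number, text))
    return cleaned

result = strip_and_number("x = 1  # set x\n\n\ty = 2\n")
```

Here `result` keeps the original line numbers (1 and 3), so later phases could still report errors against the right source line.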

… is a rule describing the set of lexemes that represent a token.

pattern

Patterns are usually specified using …

regular expressions
For example, the pattern [a-zA-Z]* matches any sequence of zero or more letters.

What is a lexeme?

A lexeme is a sequence of characters in the source code that matches the pattern for a specific token.

x, distance, count ➔ IDENT. Identify the token and the lexemes.

Token: IDENT (identifier)
- IDENT represents a category of tokens in the programming language.
- It identifies variables, functions, or other named entities in the code.
Lexemes:
- x
- distance
- count
Each lexeme names a variable or some other entity in the code.

begin: give lexemes for this token.

begin, Begin, BEGIN, beGin
- the word "begin" written in any mix of lowercase and uppercase letters (this assumes a case-insensitive language)
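
As a small illustration, one case-insensitive pattern can accept all of these lexemes. The case-insensitivity is an assumption about the language being scanned, and `is_begin_lexeme` is a hypothetical helper name.

```python
import re

# Assumption: the language is case-insensitive, so one pattern covers every spelling.
BEGIN = re.compile(r"begin", re.IGNORECASE)

def is_begin_lexeme(text):
    """Return True if text is a lexeme of the begin keyword token."""
    return BEGIN.fullmatch(text) is not None
```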

List the common kinds of tokens in programming languages.

– keywords
– operators
– identifiers
– constants
– literals
– punctuation symbols

What are token attributes?

Attributes of tokens are additional pieces of information associated with lexemes that match a particular pattern and are classified as a specific token type.

… provide details about the specific lexeme and its context in the source code.

attributes

Break down the tokens and their attributes: x = y + 2

<id, pointer to symbol-table entry for x>

<assign_op, >

<id, pointer to symbol-table entry for y>

<plus_op, >

<num, integer value 2>
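
A minimal sketch of how such token/attribute pairs might be produced. The token names follow the card; the regular expressions, the `tokenize` function, and the list-based symbol table (where an index stands in for the pointer) are assumptions made for illustration.

```python
import re

# Token names follow the card; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("num", r"[0-9]+"),
    ("id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("assign_op", r"="),
    ("plus_op", r"\+"),
    ("skip", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table); characters matching no pattern are silently skipped."""
    symbol_table = []  # each entry is a lexeme; its index plays the role of the pointer
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue  # blanks and tabs produce no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append((kind, symbol_table.index(lexeme)))
        elif kind == "num":
            tokens.append((kind, int(lexeme)))  # attribute is the integer value
        else:
            tokens.append((kind, None))  # operators carry no attribute
    return tokens, symbol_table

tokens, symbol_table = tokenize("x = y + 2")
```

For `x = y + 2`, the two `id` tokens carry indices 0 and 1 into `symbol_table`, and the `num` token carries the integer 2.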

… contains information about the token such as the lexeme, the line number in which it was first seen …

Symbol table entry

Break down the tokens and their attributes: E = M * C ** 2

<id, pointer to symbol-table entry for E>
<assign_op, >
<id, pointer to symbol-table entry for M>
<mult_op, >
<id, pointer to symbol-table entry for C>
<exp_op, >
<num, integer value 2>
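
One detail this example exposes is that `**` must be recognized as a single exp_op rather than two mult_op tokens. The sketch below, using Python's `re` module, shows one way to get that behavior; the pattern layout and the helper `operator_tokens` are assumptions, not part of the cards.

```python
import re

# In a Python alternation the first alternative that matches wins,
# so the two-character operator is listed before the one-character one.
OPERATORS = re.compile(r"(?P<exp_op>\*\*)|(?P<mult_op>\*)|(?P<assign_op>=)")

# Same alternatives in the wrong order, kept only to show the difference.
WRONG_ORDER = re.compile(r"(?P<mult_op>\*)|(?P<exp_op>\*\*)|(?P<assign_op>=)")

def operator_tokens(source, pattern=OPERATORS):
    """Return the operator token names found in source, in order."""
    return [match.lastgroup for match in pattern.finditer(source)]

good = operator_tokens("E = M * C ** 2")
bad = operator_tokens("E = M * C ** 2", WRONG_ORDER)
```

With the longer operator first, `good` contains one exp_op; with the order reversed, `bad` splits `**` into two mult_op tokens.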

If the programmer mistypes while as wihle, the lexical analyzer cannot detect the error. Why?

wihle matches the pattern for an identifier, so a valid token is produced, but with an unintended meaning.
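
A small sketch of why this happens. The keyword set and the identifier pattern are illustrative assumptions, and `classify` is a hypothetical helper.

```python
import re

KEYWORDS = {"while", "if", "else"}  # illustrative subset, not a full language
IDENT = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

def classify(word):
    """Classify a word as KEYWORD, IDENT, or ERROR using only lexical rules."""
    if IDENT.fullmatch(word) is None:
        return "ERROR"
    return "KEYWORD" if word in KEYWORDS else "IDENT"
```

`classify("wihle")` returns `"IDENT"`: lexically it is a perfectly valid identifier, so any error can only surface later, during syntax analysis.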

What are the types of lexical errors?

Illegal characters: the source code contains a character that does not belong to any specified token pattern. For example, the source contains a ‘?’ symbol but no pattern is defined to recognize it as a valid token.
Exceeding length: an identifier or numeric constant exceeds the maximum length allowed by the language specification.
Unterminated strings or comments: a string or comment in the source code is not properly terminated.

How can lexical errors be handled?

– panic mode recovery
– deleting extraneous characters
– inserting missing characters
– replacing an incorrect character with a correct character
– transposing two adjacent characters

Explain panic mode recovery.

In panic mode recovery, the lexical analyzer attempts to recover by skipping (deleting) successive characters from the remaining input until it finds a well-formed token.
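
Panic mode recovery can be sketched as a scanning loop that drops one character at a time whenever no token pattern matches at the current position. The token pattern and the function name `scan_with_panic_mode` are assumptions for illustration only.

```python
import re

# Assumed token patterns: identifiers, integers, and a few one-character operators.
TOKEN = re.compile(r"[a-zA-Z][a-zA-Z0-9]*|[0-9]+|[=+*]")

def scan_with_panic_mode(source):
    """Return (lexemes, skipped_characters) for the given source text."""
    lexemes, skipped = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # whitespace separates tokens and is not an error
            continue
        match = TOKEN.match(source, pos)
        if match:
            lexemes.append(match.group())
            pos = match.end()
        else:
            skipped.append(source[pos])  # panic mode: delete this character and retry
            pos += 1
    return lexemes, skipped

lexemes, skipped = scan_with_panic_mode("x = ?@y + 2")
```

For the input above, the characters `?` and `@` are skipped and scanning resumes at `y`. A real lexical analyzer would also report the skipped characters as errors.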

… are commonly used to specify the patterns of tokens in programming languages.

Regular expressions

Define a regular expression for identifiers, where an identifier starts with a letter (uppercase or lowercase) followed by zero or more letters or digits.

letter → [a-zA-Z]
digit → [0-9]
identifier → letter (letter | digit)*
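
The same definition can be written with Python's `re` module to try it on sample strings. The names `LETTER`, `DIGIT`, `IDENTIFIER`, and `is_identifier` are chosen here for illustration.

```python
import re

LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
# identifier → letter (letter | digit)*
IDENTIFIER = re.compile(rf"{LETTER}({LETTER}|{DIGIT})*")

def is_identifier(text):
    """Return True if the whole text matches the identifier definition."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under this definition `count1` and `x` are identifiers, while `1count`, `a_b`, and the empty string are not.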

Regular expressions (RE) are defined using an alphabet of terminal symbols and three fundamental operations: …

alternation, concatenation, and repetition.

Explain alternation (|).

Alternation specifies a choice between two or more patterns.
It is denoted by the vertical bar |.
Example: (a|b) matches either the character ‘a’ or the character ‘b’.

Explain the concatenation operation.

Concatenation combines two patterns sequentially: one pattern followed by another.
It is denoted by simply juxtaposing two regular expressions.
Example: (ab) matches the character ‘a’ followed by the character ‘b’.

Explain the repetition operation (*).

- The repetition operation specifies that a pattern can occur zero or more times consecutively.
- It is denoted by appending an asterisk * to the regular expression.
- Example: (a)* matches zero or more occurrences of the character 'a'.
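
The three fundamental operations can be tried out with Python's `re` module, whose syntax for `|`, juxtaposition, and `*` matches the notation used on these cards. The helper `accepts` is a hypothetical name.

```python
import re

def accepts(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

alternation = [accepts(r"a|b", s) for s in ["a", "b", "ab"]]
concatenation = [accepts(r"ab", s) for s in ["ab", "a", "ba"]]
repetition = [accepts(r"a*", s) for s in ["", "a", "aaa", "ab"]]
```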

Explain positive closure (+).

- It denotes one or more instances of the pattern.
- The positive closure r+ is equivalent to rr*, which means one occurrence of r followed by zero or more occurrences of r.

Explain zero or one instance (?).

- It indicates "zero or one occurrence" of the pattern.
- r? is equivalent to r | λ, where λ represents the empty string.
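
Both equivalences (r+ with rr*, and r? with r | λ) can be spot-checked on sample strings. This is a check on a handful of inputs, not a proof; `same_language` is a hypothetical helper, and the empty alternative in `(a|)` stands in for λ.

```python
import re

def same_language(pattern_a, pattern_b, samples):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        (re.fullmatch(pattern_a, s) is None) == (re.fullmatch(pattern_b, s) is None)
        for s in samples
    )

SAMPLES = ["", "a", "aa", "aaa", "b", "ab"]
plus_ok = same_language(r"a+", r"aa*", SAMPLES)      # r+ versus rr*
optional_ok = same_language(r"a?", r"(a|)", SAMPLES)  # r? versus r | λ
```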

Explain character classes.

- They allow you to specify a range or list of characters within square brackets [ ].
- [abc] represents the set of characters 'a', 'b', or 'c'. It is equivalent to the alternation a|b|c.
- [a-z] represents any lowercase letter from 'a' to 'z'. It is equivalent to an alternation over all of those letters.

Explain the any-character operator: the dot . (period).

- It matches any single character except a newline (\n).
- Example: in the regular expression a.b, the dot . matches any single character, so a.b matches strings such as:
- "aab"
- "a1b"
- "a@b"
- In each case the dot . stands for the single character between 'a' and 'b'.
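
A quick check of this behavior with Python's `re` module, where the dot excludes the newline by default. The candidate strings are chosen for illustration.

```python
import re

PATTERN = re.compile(r"a.b")
CANDIDATES = ["aab", "a1b", "a@b", "ab", "a\nb"]

# Keep only the candidates that a.b matches in full.
matched = [s for s in CANDIDATES if PATTERN.fullmatch(s)]
```

`"ab"` is rejected because the dot must consume exactly one character, and `"a\nb"` is rejected because the dot does not match a newline.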

chapter 2 Flashcards

(29 cards)