Chapter 2 Flashcards

1
Q

What is the role of the lexical analyzer?

A

The lexical analyzer reads a sequence of characters from the source program and produces tokens to be used by the parser.

2
Q

The stream of tokens is sent to the parser for …

A

syntax analysis

3
Q

True or false: the lexical analyzer also interacts with the symbol table.

A

True

4
Q

The lexical analyzer can also perform … secondary tasks:

A

– stripping out blanks, tabs, new lines
– stripping out comments
– keeping track of line numbers
– expanding macros (in some lexical analyzers)
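
A minimal Python sketch of the first three secondary tasks. It assumes a hypothetical language in which '#' starts a comment that runs to the end of the line and lexemes are separated by whitespace; the function name is illustrative only.

```python
def scan_with_line_numbers(source):
    """Return (line_number, lexeme) pairs, skipping whitespace and comments."""
    result = []
    # enumerate(..., start=1) keeps track of line numbers
    for line_number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]   # strip the comment, if any (assumed '#' syntax)
        for lexeme in code.split():    # split() discards blanks and tabs
            result.append((line_number, lexeme))
    return result
```

Keeping the line number next to each lexeme is what lets later phases report errors at the right place.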

5
Q

… is a rule describing the set of lexemes
that represent a token.

A

pattern

6
Q

Patterns are usually specified using …

A

Regular expressions.
For example, the pattern [a-zA-Z]* matches any string of zero or more letters.

7
Q

What is a lexeme?

A

A lexeme is a sequence of characters in the source code that matches the pattern for a specific token.

8
Q

x, distance, count ➔ IDENT. Identify the token and the lexemes.

A
  1. Token: IDENT (identifier)
    - IDENT names a category of lexemes in the programming language.
    - It covers the names of variables, functions, or other entities in the code.
  2. Lexemes:
    - x
    - distance
    - count
    Each lexeme is the actual character sequence that names a variable or some other entity in the code.
9
Q

begin: give lexemes for this token.

A

begin, Begin, BEGIN, beGin
- the word “begin” in any mix of lowercase and capital letters (in a case-insensitive language)
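
A small Python sketch of one pattern covering all of these lexemes. It assumes the language is case-insensitive; the names BEGIN_PATTERN and is_begin are hypothetical.

```python
import re

# Assumption: keywords are case-insensitive, so one pattern covers every spelling.
BEGIN_PATTERN = re.compile(r"begin", re.IGNORECASE)

def is_begin(lexeme):
    """Return True when the whole lexeme matches the pattern for the begin token."""
    return BEGIN_PATTERN.fullmatch(lexeme) is not None
```

In a case-sensitive language only the exact spelling would be a lexeme of this token.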

10
Q

List common programming-language tokens.

A

– keywords
– operators
– identifiers
– constants
– literals
– punctuation symbols

11
Q

What are attributes?

A

Attributes of tokens are additional pieces of information associated with lexemes that match a particular pattern and are classified as a specific token type.

12
Q

… provide details about the specific lexeme and its context in the source code.

A

attributes

13
Q

Break down the tokens and their attributes: x = y + 2

A

<id, pointer to symbol-table entry for x>

<assign_op, >

<id, pointer to symbol-table entry for y>

<plus_op, >

<num, integer value 2>
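
The breakdown above can be reproduced with a short Python sketch. The token names follow the card; the symbol table is simplified to a list whose index stands in for the "pointer", and TOKEN_SPEC, MASTER, and tokenize are hypothetical names.

```python
import re

# One named group per token; "skip" covers blanks and tabs.
TOKEN_SPEC = [
    ("num", r"[0-9]+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign_op", r"="),
    ("plus_op", r"\+"),
    ("skip", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table); each token is a (name, attribute) pair."""
    symbol_table = []  # simplified: a list index plays the role of the pointer
    tokens = []
    # Note: finditer silently skips characters that match no pattern.
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append((kind, symbol_table.index(lexeme)))
        elif kind == "num":
            tokens.append((kind, int(lexeme)))  # attribute is the integer value
        else:
            tokens.append((kind, None))         # operators carry no attribute
    return tokens, symbol_table
```

Calling tokenize("x = y + 2") yields five pairs in the same order as the card, with x and y stored once each in the symbol table.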

14
Q

… contains information about the token such as the lexeme, the line number in which it was first seen …

A

Symbol table entry

15
Q

Break down the tokens and their attributes: E = M * C ** 2

A

<id, pointer to symbol-table entry for E>
<assign_op, >
<id, pointer to symbol-table entry for M>
<mult_op, >
<id, pointer to symbol-table entry for C>
<exp_op, >
<num, integer value 2>
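
One detail in this example is that ** must be recognized as a single exp_op rather than two mult_op tokens. A hedged Python sketch of that choice, looking at the operators only (OPERATOR and operator_tokens are hypothetical names):

```python
import re

# '**' is listed before '*' so that the longer operator is tried first.
OPERATOR = re.compile(r"(?P<exp_op>\*\*)|(?P<mult_op>\*)|(?P<assign_op>=)")

def operator_tokens(source):
    """Return the operator token names found in source, in order."""
    return [match.lastgroup for match in OPERATOR.finditer(source)]
```

Python's re module tries alternatives from left to right, so the ordering is what gives the longest match here. Generated scanners usually apply a longest-match rule automatically.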

16
Q

If the programmer mistypes while as wihle,
the lexical analyzer cannot detect the error. Why?

A
  • wihle still matches the pattern for identifiers, so a valid token is produced, but with an unintended meaning.
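
A short Python sketch of why the misspelling goes unnoticed. The keyword set and the identifier pattern are assumptions made for illustration.

```python
import re

KEYWORDS = {"while", "if", "else"}            # hypothetical keyword set
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # assumed identifier pattern

def classify(lexeme):
    """Classify a lexeme as 'keyword', 'id', or 'error'."""
    if lexeme in KEYWORDS:
        return "keyword"
    if IDENT.fullmatch(lexeme):
        return "id"
    return "error"
```

Here classify("wihle") returns "id", so the problem only surfaces later, during syntax analysis.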
17
Q

What are the types of lexical errors?

A
  • Illegal characters: the source code contains a character that does not belong to any specified token pattern, for example a ‘?’ symbol when no pattern is defined to recognize it as a valid token.
  • Exceeding length: an identifier or numeric constant exceeds the maximum length allowed by the language specification.
  • Unterminated strings or comments: a string or comment in the source code is not properly terminated.
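
A rough Python sketch that detects the three error types on a single line of input. The legal character set, the length limit of 8, and the double-quoted string syntax are all assumptions chosen for illustration, not rules of any particular language.

```python
import re
import string

MAX_IDENT_LENGTH = 8  # hypothetical limit
LEGAL = set(string.ascii_letters + string.digits + " \t+-*/=()")  # assumed alphabet

def find_lexical_errors(line):
    """Return a list of error messages for one line of source text."""
    errors = []
    # Remove well-formed string literals so their contents are not checked.
    rest = re.sub(r'"[^"\n]*"', " ", line)
    if '"' in rest:
        errors.append("unterminated string")
        rest = rest.split('"', 1)[0]  # ignore everything after the stray quote
    for ch in rest:
        if ch not in LEGAL:
            errors.append(f"illegal character {ch!r}")
    for name in re.findall(r"[A-Za-z][A-Za-z0-9]*", rest):
        if len(name) > MAX_IDENT_LENGTH:
            errors.append(f"identifier too long: {name}")
    return errors
```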
18
Q

How can lexical errors be handled?

A
  • Panic mode recovery: deleting successive characters until a well-formed token is found
  • Single-character repairs:
    – deleting an extraneous character
    – inserting a missing character
    – replacing an incorrect character by a correct character
    – transposing two adjacent characters
19
Q

Explain panic mode recovery.

A

In panic mode recovery, the lexical analyzer attempts to recover by skipping (deleting) successive characters from the remaining input until it finds a well-formed token.
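
A minimal Python sketch of panic mode recovery, assuming a tiny hypothetical token set (identifiers, integers, and the operators =, +, *). Characters that cannot start any token are skipped one at a time until a token matches again.

```python
import re

# Assumed token patterns for illustration only.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[=+*]")

def tokenize_with_panic_mode(source):
    """Return (lexemes, skipped_characters)."""
    lexemes, skipped = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match:
            lexemes.append(match.group())
            pos = match.end()
        else:
            skipped.append(source[pos])  # panic mode: delete the character and retry
            pos += 1
    return lexemes, skipped
```

A real scanner would also report each skipped character as an error, ideally with its line number.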

20
Q

… are commonly used to specify the patterns of tokens in programming languages.

A

Regular expressions

21
Q

Define a regular expression for identifiers, where an identifier starts with a letter (uppercase or lowercase) followed by zero or more letters or digits.

A

letter → [a-zA-Z]
digit → [0-9]
identifier → letter (letter | digit)*
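
The definition above translates almost directly into Python's re syntax. A small sketch (the names LETTER, DIGIT, IDENTIFIER, and is_identifier are illustrative):

```python
import re

LETTER = "[a-zA-Z]"
DIGIT = "[0-9]"
# identifier -> letter (letter | digit)*
IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")

def is_identifier(text):
    """Return True when the whole text is an identifier under this definition."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under this definition "count1" is an identifier, while "1count" and "a_b" are not.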

22
Q

Regular expressions (RE) are defined using an alphabet of terminal symbols and three fundamental operations: …

A

alternation, concatenation, and repetition.
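
A quick Python sketch of the three operations, using the re module's notation. The helper name matches and the constant names are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True when the whole text matches the regular expression."""
    return re.fullmatch(pattern, text) is not None

ALTERNATION = r"a|b"     # either 'a' or 'b'
CONCATENATION = r"ab"    # 'a' followed by 'b'
REPETITION = r"a*"       # zero or more occurrences of 'a'
```

For instance, matches(REPETITION, "") is True because zero occurrences are allowed, while matches(ALTERNATION, "ab") is False because alternation picks exactly one alternative.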

23
Q

Explain alternation (|).

A
  • allows us to specify alternatives between different patterns.
  • denoted by the vertical bar | and represents a choice between two or more patterns.
  • For example : (a|b) matches either the character ‘a’ or the character ‘b’.
24
Q

Explain the concatenation operation.

A
  • combines two patterns sequentially.
    It represents the sequential occurrence of one pattern followed by another.
  • denoted by simply juxtaposing two regular expressions.
  • Example : (ab) matches the sequence of characters ‘a’ followed by ‘b’.
25
Q

Explain the repetition operation (*).

A
  • The repetition operation specifies that a pattern can occur zero or more times consecutively.
  • denoted by appending an asterisk * to the regular expression.
  • Example: (a)* matches zero or more occurrences of the character ‘a’.
26
Q

Explain positive closure (+).

A
  • It denotes one or more instances of the pattern.
  • The positive closure r+ is equivalent to rr*, which means one occurrence of r followed by zero or more occurrences of r.
27
Q

Explain zero or one instance (?).

A
  • indicates “zero or one occurrence” of the pattern.
  • r? is equivalent to r | λ, where λ represents the empty string.
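
Both shorthand equivalences (r+ with rr*, and r? with r | λ) can be spot-checked in Python. This is a check on a handful of sample strings, not a proof; Python has no λ symbol, so the empty string is written as an empty alternative. The names same_language and SAMPLES are hypothetical.

```python
import re

def same_language(pattern_a, pattern_b, samples):
    """Return True when both patterns accept exactly the same sample strings."""
    return all(
        (re.fullmatch(pattern_a, s) is None) == (re.fullmatch(pattern_b, s) is None)
        for s in samples
    )

SAMPLES = ["", "a", "aa", "aaa", "b", "ab"]
```

With these samples, same_language(r"a+", r"aa*", SAMPLES) and same_language(r"a?", r"a|", SAMPLES) both hold, while a+ and a* differ on the empty string.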
28
Q

Explain character classes.

A
  • They allow you to specify a range or list of characters within square brackets [ ].
  • [abc] represents the set of characters ‘a’, ‘b’, or ‘c’. (Alternation)
  • [a-z] represents any lowercase letter from ‘a’ to ‘z’. (Alternation)
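
A small Python sketch showing that a character class behaves like an alternation of its members. The alternation a|b|…|z is built explicitly from string.ascii_lowercase; the function names are illustrative.

```python
import re
import string

# Spell out [a-z] as the alternation a|b|...|z.
LOWERCASE_ALTERNATION = "|".join(string.ascii_lowercase)

def in_class(ch):
    """True when ch matches the character class [a-z]."""
    return re.fullmatch(r"[a-z]", ch) is not None

def in_alternation(ch):
    """True when ch matches the explicit alternation a|b|...|z."""
    return re.fullmatch(LOWERCASE_ALTERNATION, ch) is not None
```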
29
Q

Explain the any-character operator: the dot . (period).

A
  • matches any single character except a newline (\n).
  • Example: in the regular expression a.b, the dot . matches any single character, so a.b matches strings such as “aab”, “a1b”, or “a@b”, where the dot stands for the one character that appears between ‘a’ and ‘b’.
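
The same behavior can be observed in Python's re module, where the dot excludes the newline unless the re.DOTALL flag is given. A brief sketch (the function name and its parameter are hypothetical):

```python
import re

def matches_a_dot_b(text, dot_matches_newline=False):
    """Return True when the whole text matches the pattern a.b."""
    flags = re.DOTALL if dot_matches_newline else 0
    return re.fullmatch(r"a.b", text, flags) is not None
```

Other regular expression tools may treat the newline differently, so the exception is worth checking in whichever scanner generator is used.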