Compiler Design Flashcards
Using the appropriate conditions on the grammar productions, explain when a grammar is context sensitive, when it is context free, and when it is a regular grammar.
• Context sensitive grammars -> every production is of the form a -> b where |a| <= |b|
• Context free grammars -> they are context sensitive grammars, where every production has in the left part one non-terminal A -> BaA
Regular grammars -> they are context-free grammars, where all productions have on of the forms: A -> a or A -> aB
Explain what is a synthesized attribute in a syntax-directed definition (SDD)
A synthesized attribute in an SDD is an attribute whose value at a node N is defined in terms of attributes at the children of N and at N itself.
What is a compiler and a interpreter?
- Compiler -> A computer program that transforms a program written in a programming language (source program/ source language) to an equivalent program in another language (target program/ target language).
Interpreter -> directly execute the operations specified in the source program on inputs supplied by the user
What are the phases of a compiler?
- Lexical Analysis -> scans the source code as a stream of characters and converts it into meaningful lexemes. Lexical analyser represents these lexemes in the form of tokens as:
○- Syntax Analysis -> or parsing, takes the token produces by lexical analysis as input and generates a parse tree (or syntax tree).
○ Token arrangements are checked against the source code grammar, i.e. The parser checks if the expression made by the tokens is syntactically correct. - Semantic Analysis -> Checks whether the parse tree constructed follows the rules of language. E.g. Assignment of values is between compatible data types, and adding string to an integer.
○ Keeps track of identifiers, their types and expressions. The semantic analyser produces an annotated syntax tree as an output. - Intermediate Code Generation -> After semantic analysis the compiler generates an intermediate code of the source code for the target machine. E.g. Object Code
- Code Optimization -> optimization of the intermediate code. Removes unnecessary code lines, and arranges the sequence of statements in order to speed up the program execution without wasting resources.
- Code generation -> Takes the optimized representation of the intermediate code and maps it to the target machine. Sequence of instructions of machine code performs the task as the intermediate code would do.
Symbol Table -> a data-structure maintained throughout all the phases of a compiler. All the identifier’s names along with their types are stored here.
- Syntax Analysis -> or parsing, takes the token produces by lexical analysis as input and generates a parse tree (or syntax tree).
What is the difference by passing an element by value or by reference?
- Pass by value -> do not change the value of the argument itself
Pass by reference -> update the value of the argument itself
Describe the terms left-recursive, nullable, useless.
- Left-recursive -> if starting with N, we can produce a string that starts again with N
○ N -> Nab- Nullable if starting with N, we can produce E (null)
○ N -> aNb | E (null) - Useless if starting with N, we can never produce a string consisting only of terminals.
N -> aN | bN
- Nullable if starting with N, we can produce E (null)
What is a calling graph?
- A directed graph with a node for each procedure
Edge from A to B, whenever A calls B (directly or indirectly)
What is an indirect calling graph?
- X calls Y, Y calls Z, Z calls W
- If A calls B and B calls C, then A calls C (indirectly)
○ This rule is called transitive orientation - For a program with n procedures, recursive transitive closure needs O(n^3) time
However, for relatively “sparse” programs, it is working well in O(n) time
- If A calls B and B calls C, then A calls C (indirectly)
What is a token (in lexical analysis)?
- A notation representing the kind of lexical unit
○ A keyword (for, if, then …)
○ An identifier- Consists of a pair of:
○ A token name - influences parsing decisions
An (optional) attribute value - which specifies the particular lexeme represented by the token. - influences the translation of the token after the parsing (in the semantic analysis)
- Consists of a pair of:
What are the two ways to handle reserved words?
- Install the reserve words initially in the symbol table
a. The corresponding entry in the symbol table stores information that this is a reserved word
b. When we scan a lexeme, a call to installID
i. Checks in the symbol table whether this lexeme is a reserved word
ii. It adds it in the symbol table (as an id) only if it is not already there
Create seperate transition diagrams for each reserved word:
a. We must then check that the scan word ends after reading then
i. Otherwise it might be an identifier called “thenextvalue”
b. For each new lexeme:
i. First run the transition diagrams for reserved words
If none accepts, the token can be recognized as id
What is Lex and describe how it works.
- Lists some patterns (regular expressions) with some order
- Scans the input
○ Until it finds the longest prefix of the input that matches one of the patterns
○ If it matches many patterns, it chooses the first one in the order - Returns the corresponding token to the parser
- Scans the input
Describe the components of a parse tree in Syntax Analysis.
- Each internal node ○ Is marked by a non-terminal ○ Represents the application of a production rule - Each leaf ○ Is marked by a terminal - All the leafs give the input string
What are the two main methods for constructing a parse tree:
- The top-down approach ○ Start from the root (labelled with the start symbol) ○ Continue down to the leafs - The bottom-up approach ○ Start from the leafs Continue up to the root
How can we resolve disambiguity?
• Impose rules defining the relative precedence of operators
○ When we have two different operators (e.g. + and *)
Operator * has a higher precedence than +
Describe the components of an abstract parse tree.
- The parse tree represents the steps of the derivation of a string
- Every internal node:
○ Is marked by a non-terminal
○ Represents the application of a production rule
Removes auxiliary non terminals from the parse tree
- Every internal node: