Parser Flashcards
What does the parser do?
Group tokens into grammatical phrases
Discovers underlying structure of program
finds syntax errors
What is output of parser for legal program?
Abstract syntax tree (+ symbol table)
intermediate code
object code
terminal symbol
Tokens returned by scanner
productions
rules
Production LHS
Single non-terminal
Production RHS
Either epsilon or a sequence of one or more terminals and/or non-terminals
CFG 4-tuple (N, E, P, S)
N - set of nonterminals
E - set of terminals
P - set of productions
S - start non-terminals
Leftmost derivation
The leftmost nonterminal is always chosen to be replaced
Rightmost derivation
The rightmost nonterminal is always chosen to be replaced
How to construct a parse tree
Start with start non-terminal
Repeat: 1) choose a leaf nonterm X 2) choose a production X->alpha 3) The symbols in alpha become the children of X in the tree
The derived string is formed by reading leaf nodes from _______ to _______
The derived string is formed by reading the leaf nodes from left to right
Ambiguous grammer
> 1 leftmost or > 1 right-most derivation or > 1 parse tree
How to write a grammer to express precedence
Use a different nonterminal for each precendence leve
Start by writing a rule for the operator with the lowest precedence
exp->
What causes ambiguity?
Ill defined precedence and/or associativity
For left associativity, use
left recursion
For right associativity, use
right recursion
How to force left associativity?
exp->exp MINUS exp | term
exp->exp MINUS term | term
Syntax directed translation
defined by augmenting the CFS: a translation rule is defined for each production
A translation rule defines the translation of the LHS non-terminal as a function of:
constants
the RHS nonterminal’s translations
the RHS tokens’ values
To translate an input string using syntax directed translation:
- Build the parse tree
2. Use the translation rules to computer the translation of each non-terminal in the tree, working bottom up
Why work bottom up for syntax directed translation?
A non-terminal’s value may depend on the value of the symbols on the RHS so need to work bottom up so those values are available
AST vs. Parse Tree
AST: operators appear at internal nodes instead of leaves, chains of single productions are collapsed, listed a flattened, syntactic details are omitted
Context Free Grammar
A set of recursive rewriting rules to generate patterns of strings
CFG in compiler
Start with a string and end with a parse tree for w if w exits in L(G)
syntax directed translation
translate a sequence of tokens into a sequence of actions
For ASTS, when we execute an SDT rule:
we construct a new node object, which becomes the value of LHS.trans
populate the node’s fields with the translations of the RHS non-terminals
How to know if a string is in the language of the CFG?
Yes iff the string can be derived from the CFG’s start non-terminal
Yes iff we can build a parse tree for the string, with the CFG’s start nonterminal at the root
CYK algorithm used for what?
Used to parse any CFG (to determine for any input whether that input is in the language of a given CFG)
CYK essentially does what?
Builds all of the possible parse trees for all (consecutive) fragments of the input, bottom up.
When does CYK accept input?
Iff it is able to build a complete a complete parse tree with the start non-terminal at the root and the entire input at the leaves.
Chomsky Normal Form
X->singular terminal
X->2 non-terminals
X->epsilon is not allowed unless the nonterminal is the start non-terminal
If S->epsilon and S is the start non-term, then S cannot occur on the RHS of any rule
How to convert from CFG to CNF
- Eliminate epsilon rules
- Eliminate rules with exactly 1 non-terminal on the right (unit rules)
- Fix remaining rules so all rules have single terminal or exactly two non-terminals on the right
how to eliminate epsilon for CNF?
A->epsilon
Take any other rule that has A on the RHS and copy it with nothing
F->(A) F->()
Idea of predictive parser
build parse tree top down, keep track of work to be done, use a parse table to decide how to do the parse
scanned tokens + stack contents correspond to what in predictive parser?
Leaves of the current (incomplete) parse tree
Rows of parse table are indexed by what?
indexed by nonterminals
columns of parse table are indexed by what?
columns are indexed by the tokens
Each element of the parse table is what?
Each element of the table for the row indexed by nonterminal X is either empty or contains the RHS of a grammar rule for X
What are the first two things pushed onto the stack for a predictive parser?
1) EOF terminal
2) start nonterminal
What happens if top-of-stack symbol is a nonterminal X?
predictive parser
Use nonterminal X and the current token t to index into the parse table and choose a production with X on the LHS (RHS is in table[x][t]
pop x from stack and push the chosen production’s RHS
For nonterminal, which direction are chosen RHS pushed onto stack (predictive parser)
Push symbols from R to L
What happens if top-of-stack symbol is terminal (predictive parser)?
Match it with the current token; if it matches, pop it and call the scanner to get the next token
3 ways to terminate predictive parser?
Top of stack is non-terminal, and parse table entry is empty
top-of-stack=terminal but doesn’t match curr token
stack is empty
When is predictive parser input accepted?
Stack is empty?
When is predictive parser input rejected
nonterminal and no parse table entry
terminal and doesn’t match current token
Is it always possible to build a predictive parser given a CFG?
Only possible to build a predictive parser for CFG if CFG is LL(1)
Example of something not in LL(1) because LL(1) only allows 1 look ahead
S->(S)|()
How to know if grammar is LL(1)?
If build the parse table and no element of the table contains more than 1 grammar rule RHS, then LL(1)
2 properties that preclude grammar from being LL(1)
Left recursive grammar
Grammars that are not left factored
A nonterminal X is useless if:
1) you can’t derive a sequence that includes X
2) You can’t derive a string (epsilon or a sequence of terminals) from X
Transform left recursion into non-left recursion
A->Aa|B
A->BA’
A’->aA’|epsilon
Meaning of left factored
A nonterminal has 2 productions whose RHS have a common prefix
Can a LL(1) grammar be left factored?
No
A grammar is not LL(1) if it is:
Left recursive
Not left factored
How to check if grammar is in LL(1)
Build the parse table; if any element in the table contains more than 1 grammar rule RHS, the grammar is not LL(1)
Idea of first sets
for a sequence a, FIRST(a) is the set of terminals that begin the strings derivable from a
Can grammar be LL(1) if a is current token
A->alpha, B->beta and a is in FIRST(alpha) and FIRST(beta)
No
FOLLOW sets are defined for what?
single non-terminals
Define FIRST sets for what?
RHS of each production
define FIRST sets for arbitrary sequences of terminals/non-terms/epsilon
Can epsilon be in a follow set?
No, epsilon can never be a follow set
What does the semantic stack hold?
Nonterminal’s translations
When parse is finished, what does semantic stack hold?
Translation of the root nonterminal (translation of the whole input)
How are values pushed onto the semantic stack?
Add actions to grammar rules
Actions for a grammar rule must:
Pop the translations of all RHS nonterminals
compute and push the translation of the LHS nonterminal
Action numbers
part of RHS of grammar rules
Pushed onto normal stack, when action # is top of stack, popped and action carried out
When is action performed
not performed until all derivations of LHS are carried out
RHS nonterminals popped from semantic stack in which order?
Right to left because predictive parser does a leftmost derivation