CFG
Context free grammar is a formal grammar which is used to generate all possible strings in given formal language
- CFG is defined by four tuples as:
**G= (V, T, P, S) **
where,
G- Grammar
T- finite set of Terminal symbols
V- Finite set of non terminal symbols
P- Set of production rules
S-start symbol
-Is used to derive the string
- we can derivatives string by repeatedly replacing a nonterminal by the right hand side of the production until all nonterminals have been replaced by terminal symbols
Capabilities of CFG
Derivation
Hierarchical representation of terminals and nonterminals
where,
Root of the parser- start symbol.
Leaves of the parse tree - terminals
Interior notes- non terminals
We have two options to decide which non terminal to be replaced with the production rule:
1. Leftmost derivation
2. Rightmost derivation
Leftmost derivation
Input is scanned and replaced with the production rule from left to right so we read the input string from left to right
eg:
production rules:
S = S + S
S = S - S
S = a | b |c
i/p:
a - b + c
The leftmost derivation is:
S = S + S
S = S - S + S
S = a - S + S
S = a - b + S
S = a - b + c
—- In the above example we can observe that Replacement according to the production rule is done from the left hand side
Rightmost derivation
1.** Input is scanned and replaced with the production rule **from right to left
2. So we read the input string from left to right
Example:
S = S + S
S = S - S
S = a | b |c
Input:
a - b + c
The right-most derivation is:
S = S - S
S = S - S + S
S = S - S + c
S = S - b + c
S = a - b + c
Parse tree
Parse tree follows three points:
1. Leafnotes–terminal
2. interior – Non - terminals
3. In-order traversal gives original input string
Ambiguity
eg: S = aSb | SS
S = ∈
If the grammar has ambiguity then it is not good for a compiler construction. No method can automatically detect and remove the ambiguity but you can remove ambiguity by re-writing the whole grammar without ambiguity.
https://www.javatpoint.com/ambiguity
parser
Parsing is a two types:
1. Top-down parsing
2. Bottom-up parsing
Top - down parsing
https://www.javatpoint.com/parser
There Are two types of top down parsing:
1. Backtracking
2. Predictive parsers
A. Recursive decent
B. LL1 parser
Back tracking
Top down parser starts from the the start symbol and match the input string against production rules to replace them if matched
– It means one derivation of production fails the syntax analyzer other production role
eg: Mam’s document
Recursive decent parsing
E’()
{
if (lookahead == ‘+’)
{
match(‘+’);
if (lookahead == ‗i’)
match(‘i’);
E’();
}
}
match(char t)
{
if (lookhead== t)
{
lookahead = getchar();
}
else
printf(“Error”);
}
LL(1)
Diagram in the document
1. Non recursive top down parser
2. In LL(1),
1st L– Scanning of i/p from left to right
2nd L– Parsing technique that is leftmost derivation tree
1– we will read 1 symbol at a time
3. Predictive parser contains :
input
stack table
output
4. Input contains:
string to be passed
followed by $
Bottom of the stack marker
**5. Stack contains: **
Sequence of grammar symbols
Preceded by $
Bottom of the stack marker
— Stack holds leftmost derivation
5. Passing table is a **two dimensional array
M[A,a], **
Where,
A– Non terminal and
a– Terminal or symbol $
The parser is controlled by a programme that behaves as follows:
➢The program determines X, the symbol on top of the stack, and ‗a„the current input symbol.
➢These two symbols determine the action of the parser
There are three possibilities:
1. If X = a = $, the parser halts and announces
successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack
and advances the input pointer to the next
input symbol.
3. If X is a nonterminal, the program consults
entry M[X, a] of the parsing table M.
This entry will be either an X-production of the grammar or an error entry.
➢If M[X, a] = {X → UVW}, the parser replaces X on top of the stack
by WVU (with U on top).
➢If M[X, a] = error, the parser calls an error recovery routine.
First & Follow
(Functions of LL1)
FIRST:
Step for finding FIRST:
1. If X is terminal, then FIRST(X) is {X}
2. If X → ∈is a production , then add ∈to FIRST(X).
3. If X is a non-terminal and X →Y1Y2……….Yk
is a production, then place ‗a‘ in
FIRST(X) if for some i,‘a‘ i.s in FIRST(Yi
) and ∈is in all of FIRST(Y1
)….FIRST(Yi-1
) .
If ∈is in FIRST(Yj
) for all j=1,2,……,k then add ∈to FIRST(X)
** From every production rule Find the first letter and if epsilum comes at the last We need to continue to the next line
eg: https://www.youtube.com/watch?v=UXYqQ_CJsVE**
** FOLLOW:**
Step for finding FOLLOW:
1) FOLLOW(S) = { $ } // where S is the starting Non-Terminal and $ is the input right end
marker.
2) if there is a production A → αBβ ,then everything in FIRST(β) except for ∈is placed in
FOLLOW(B).
3) if there is a production A → αB or a production A → αBβ where FIRST(β ) contains ∈
then everything in FOLLOW(A) is in FOLLOW(B).
eg:
In the document
https://www.youtube.com/watch?v=WUwG50xJ9vc&t=0s
Parsing table construction
Didnt study
in the document
Bottom - up parsing
Shift reduce parsing
(Bottom up parsing)
There are two main categories of shift reduce parsing:
1. Operator-Precedence Parsing
2. LR-Parser
LR parsing
Diagram in https://www.javatpoint.com/lr-parser
1. used to parse the large class of grammars
2. IN LR,
L– Scanning left to right
R– rightmost derivation in reverse
K– number of input symbols of the look ahead used to make number of parsing decision.
There are four types of LR Parsing:
1. LR (0) parsing,
2. SLR(1) parsing,
3. CLR(1) parsing
4. LALR(1) parsing.
LR algorithm:
1. The LR algorithm requires stack, input, output and parsing table.
2. In all type of LR parsing, input, output and stack are same but parsing table is different.
input buffer
stacks
LR parser
LR parsing table - Input buffer is used to indicate end of the input and contains a string to be parsed followed by dollar symbol($string) - Stack is used to contain a sequence of grammar symbols which with dollar at the bottom of the stack - Parsing payable is a two Dimensional array contains two parts: action part go to partLR1 Parsing
Steps involved:
1. Forgiven input string write a cfg
2. check ambiguity of the grammar
3. Add Augment production in given grammar
4. Create canonical collection of LR0 items
5. Draw A data flow diagram dfa
6. Construct a LR 1 passing tree
Augment grammar
Argument grammar G Will be generated if we add one more production in the grammar given G
– It will help the parser to identify when to stop the parsing and announce acceptance of input
Example
Given grammar
S → AA
A → aA | b
The Augment grammar G is represented by
S → S
S → AA
A → aA | b
Canonical Collection of LR(0) items
SLR (1) Parsing
YACC
Basic operational sequence of yacc:
1. gram.y– This file contains desired grammar in YACC format
2. YACC - - it shows the YACC programme
3. y.tab.c—- It is a source programme created by yacc
4. cc or gcc—C compiler
5. a. Out - - - executable file that will parse grammar given in gram.y