Compilation/Parsing/Code Generation Flashcards
Compilation Steps
Lexical Analysis ->
Syntax Analysis ->
Semantic Analysis ->
Code Generation ->
Code Optimisation
At every steps of these, the entire code is analysed with the surrounding context.
Java Interpreter with Compilation Steps
Interpreters use the first three steps to find the next instruction, as they cannot do any optimisation because they do not look ahead at the whole code.
Java Compiler with Compilation Steps
The java compiler uses all five steps to produce optimised bytecode that is interpreted by java virtual machine.
Lexical Analysis / Scanning
The scanner uses whitespace and punctuation to identify sequences of characters.
Often implemented by finite state machines.
Turns code into a series of tokens.
After all of this, the compiler has a list of tokens that make up the program (identifiers, keywords, separators, operators, literals, comments).
Syntax Analysis / Parsing
The list of tokens are checked to see if it forms a valid program.
The parser knows the grammar rules of the language, and checks to see if the tokens match the grammar rules.
This generates an abstract syntax tree via implementing a recursive algorithm with backtracking.
The compiler stops and outputs a message if the grammar is incorrect at some stage.
Semantic Analysis
Identifiers are added to a symbol table and type checking is implemented.
Variable usage and functions calls are matched to their definitions (object binding).
The compiler will stop and output a message if semantic checks fail.
Semantic Analysis Fail Outputs
- type mismatch
- undeclared variables
- reserved keywords used as identifier
- multiple declarations of same identifier
- variable out of scope
Code Generation
Turns abstract tree into actual machine code instructions, adding instructions to reserve memory for each variable in the symbol table. Then walks through each node in the syntax tree and turns it into CPU instructions (the node by node order will generate code correctly).
Code Generation Action Examples
- Specify the correct instructions
- Select appropriate CPU registers to hold variables during calculations
- Create a subroutine for each higher order function in the syntax tree
- Add jump instructions to implement each condition and loop in the syntax tree
- Add call, push and pop instructions to set up subroutines with correct parameters
- Adding debugging information so programmer can step through code
Code Optimisation
The code is analysed to see if we can reduce the amount of code.
Code Optimisation Examples
Remove Redundancy -> store calculate results so they can be used again later.
Remove Code -> unnecessary calculations etc.
Unroll Loops -> generate same instructions in linear matter instead of looping.
Reverse Loops -> counter works in other direction etc.
Increase Locality -> make use of multiple CPU cores with instructions paired next to each other.
Use Parallelism -> make use of multiple CPU cores with instructions rearranged to be carried out concurrently.
Fold Constants -> replace calculations with their results
Loop Invariant Code -> removing loops that will not effect anything.
Strength Reduction -> removing things with loops and replacing them with one line equations.
Linking Phase and Symbol Resolution
The linking phrase is what we described beforehand; the linker performs its actions to link source files with object files into one executable, seeing the symbol table and looking for main.