1.2.2.e Stages of Compilation Flashcards
What does source code need to be?
Close to the English language
Easy to read, modify, and interpret.
What does source code need to be converted into before a processor can run it, and what is this process called?
The source code needs to be converted into machine code by a translator. This is called compilation, and is carried out by a compiler.
What are the 4 stages of compilation?
Stage 1 - Lexical Analysis
Stage 2 - Syntax Analysis
Stage 3 - Code Generation
Stage 4 - Code Optimisation
Summarise the lexical analysis stage of compilation.
Comments and whitespace are removed.
The remaining code is turned into tokens.
Symbol table is created to keep track of variables and subroutines.
Summarise the syntax analysis stage of compilation.
Parse tree is built from tokens.
Errors are generated if any tokens break the syntax rules of the language.
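As a rough illustration of syntax analysis, the sketch below builds a parse tree from a list of tokens and raises an error when the tokens break the grammar. The grammar (operands joined by + or -), the token class names, and the function name parse_expression are all assumptions made for this example, not part of any particular compiler.

```python
def parse_expression(tokens):
    """Build a parse tree for: operand (('+' | '-') operand)*

    Each token is assumed to be a (token_class, lexeme) tuple.
    """
    pos = 0

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("NUMBER", "IDENTIFIER"):
            leaf = tokens[pos]
            pos += 1
            return leaf
        found = tokens[pos][1] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected a number or identifier, found {found!r}")

    tree = operand()
    while pos < len(tokens) and tokens[pos] in (("OPERATOR", "+"), ("OPERATOR", "-")):
        operator = tokens[pos][1]
        pos += 1
        # The tree built so far becomes the left branch of the new node.
        tree = (operator, tree, operand())
    if pos < len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree


tree = parse_expression([("IDENTIFIER", "a"), ("OPERATOR", "+"), ("NUMBER", "1")])
```

Here tree is a nested tuple with the operator at the root and the two operands as its branches. A token list such as a + with nothing after it raises a SyntaxError instead.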
Summarise the code generation stage of compilation.
The parse tree is converted to object code, which is the machine code produced before the linker is run.
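The idea of walking a parse tree to emit instructions can be sketched as follows. Real object code is binary machine code for a specific processor; the instruction names here (LOAD, PUSH, ADD, SUB) belong to a hypothetical stack machine invented for this example, and the tree layout (operator, left, right) is likewise an assumption.

```python
def generate(node, instructions=None):
    """Emit instructions for a hypothetical stack machine from a parse tree."""
    if instructions is None:
        instructions = []
    if isinstance(node, tuple):
        operator, left, right = node
        generate(left, instructions)     # code for the left branch first
        generate(right, instructions)    # then the right branch
        instructions.append({"+": "ADD", "-": "SUB"}[operator])
    elif isinstance(node, int):
        instructions.append(f"PUSH {node}")   # numeric constant
    else:
        instructions.append(f"LOAD {node}")   # variable name
    return instructions


code = generate(("+", "a", ("-", 5, 2)))
# code == ["LOAD a", "PUSH 5", "PUSH 2", "SUB", "ADD"]
```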
Summarise the code optimisation stage of compilation.
The code is adjusted so that it runs faster and uses less memory.
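One common optimisation, used here purely as an example, is constant folding: calculations on constants are done once at compile time rather than every time the program runs. The sketch below assumes a parse tree made of (operator, left, right) tuples with integers and variable names as leaves; fold_constants is a hypothetical name.

```python
def fold_constants(node):
    """Replace operations on two numeric constants with their result."""
    if not isinstance(node, tuple):
        return node                      # a leaf: number or variable name
    operator, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        if operator == "+":
            return left + right
        if operator == "*":
            return left * right
    return (operator, left, right)       # cannot be simplified further


optimised = fold_constants(("+", "x", ("*", 2, 3)))
# optimised == ("+", "x", 6)
```

The multiplication 2 * 3 no longer needs to be carried out at run time, so the generated code is both shorter and quicker.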
Describe the stages of lexical analysis.
1) The lexer scans the source code in order to convert its lexemes into a series of tokens.
2) When the lexer encounters whitespace, an operator symbol, or a special symbol, it decides that the current word (lexeme) is complete.
3) The lexer checks whether each lexeme is valid using its predefined set of rules. This allows every valid lexeme to be identified as a token.
4) Each token is created from a token class (from the predefined rules) and its lexeme. Together the tokens form a token stream, and they are entered into a symbol table.
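The steps above can be sketched as a small lexer. The token classes and their patterns are hypothetical rules chosen for illustration, and # is assumed to start a comment; a real language defines its own rules.

```python
import re

# Hypothetical predefined rules: (token class, pattern). Order matters,
# so keywords are tried before general identifiers.
TOKEN_RULES = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:if|while|print)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"\s+|#[^\n]*"),   # whitespace and comments
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenise(source):
    """Turn source text into a token stream of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No rule accepts this lexeme, so it is not a valid token.
            raise SyntaxError(f"invalid lexeme at position {pos}: {source[pos]!r}")
        if match.lastgroup != "SKIP":    # whitespace and comments are removed
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


stream = tokenise("total = count + 1  # add one")
```

Here stream holds five tokens: the identifiers total and count, the operators = and +, and the number 1. The comment and the spaces do not appear in the token stream.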
What would be considered to be a token class within the predefined set of rules?
Keywords, constants, identifiers, strings, numbers, operators, punctuation, quotes, Booleans, and datatypes are all examples of token classes.
Give an example of what a token would look like. (The format).
[Token class: lexeme/token]
A sequence of these tokens forms a token stream.
What are the side effects, or the consequences, of lexical analysis?
Whitespace and comments are removed from the source code.
What would a symbol table look like?
[Index] [Token] [Token Class] [Datatype]
What is the advantage of multiple symbol tables being created?
It allows different variables to share the same name within a program, because each variable has its own scope.
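A minimal sketch of this idea gives each scope its own symbol table, linked to the table of the enclosing scope. The class name ScopedSymbolTable and its methods are hypothetical, and real compilers store more than just a datatype per entry.

```python
class ScopedSymbolTable:
    """One symbol table per scope, linked to the enclosing scope's table."""

    def __init__(self, parent=None):
        self.symbols = {}        # name -> datatype for this scope only
        self.parent = parent     # enclosing scope, or None for global

    def declare(self, name, datatype):
        self.symbols[name] = datatype

    def lookup(self, name):
        # Search the innermost scope first, then work outwards.
        table = self
        while table is not None:
            if name in table.symbols:
                return table.symbols[name]
            table = table.parent
        raise NameError(f"{name} is not declared")


global_scope = ScopedSymbolTable()
global_scope.declare("count", "integer")

local_scope = ScopedSymbolTable(parent=global_scope)
local_scope.declare("count", "string")   # same name, different scope
```

Looking up count in local_scope finds the string variable, while the same lookup in global_scope still finds the integer one.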
What else could be created in addition to the symbol table, and what is the advantage of this?
A strings table could be created.
It makes later stages of compilation more efficient.
Why is the symbol table usually stored as a hash table?
Using a hash table means that the position of each entry is calculated by hashing its name, rather than by searching the whole table. This allows for efficient lookup during the syntax analysis stage.
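To show how hashing acts as the index, here is a small hand-written hash table holding symbol table entries. The hash function (sum of character codes modulo the table size), the bucket layout, and the class name are assumptions chosen for simplicity; production compilers use stronger hash functions.

```python
class HashSymbolTable:
    """Symbol table stored as a hash table with chaining for collisions."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, name):
        # Deliberately simple hash: sum of character codes, modulo table size.
        return sum(ord(c) for c in name) % len(self.buckets)

    def insert(self, name, token_class, datatype):
        bucket = self.buckets[self._index(name)]
        for i, entry in enumerate(bucket):
            if entry[0] == name:             # update an existing entry
                bucket[i] = (name, token_class, datatype)
                return
        bucket.append((name, token_class, datatype))

    def lookup(self, name):
        # Only one bucket is searched, not the whole table.
        for entry in self.buckets[self._index(name)]:
            if entry[0] == name:
                return entry
        return None


table = HashSymbolTable()
table.insert("total", "identifier", "integer")
table.insert("ab", "identifier", "string")
table.insert("ba", "identifier", "real")     # collides with "ab"
```

The names ab and ba hash to the same bucket with this function, so both entries are chained in that bucket and each can still be found by name.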