Lecture 8 - Compilation and optimization Flashcards
What are compilation units?
When writing a program, the code is often divided into multiple C files, or compilation units.
Each compilation unit is compiled separately.
The output of the compilation is an object file containing relocatable object code.
Relocatable means that the addresses of variables and branch targets are not yet fixed.
What component combines the different object codes from the different compilation units?
A linker combines these into an executable, resolving references between units.
When using shared libraries, what tool is responsible for making these libraries accessible within the program image?
A loader sets up the executable program in memory and initialises data areas, prior to the program being run.
What is the difference between a function declaration and a function definition?
Declaration: Informs compiler of the existence of a var/func
void swap(int *a, int *b);
Definition: Provides the function body and allocates memory for local vars
In a header file, what does the keyword extern do when used to declare a variable:
extern int MAX_SIZE;
Tells the compiler the variable is defined somewhere else. Storage space is not allocated for the variable.
Why is it important to include header files with function declarations in files where they are used, even if they are defined in other files?
Because then the compiler can see the function's signature (parameter types, return type) and report an error if, for example, the function is called with arguments of the wrong type (type checking).
If the header file were not included, the compiler could not perform this check. Older C compilers would simply compile the call anyway, while newer ones typically warn about or reject a call to an undeclared function.
What are some requirements for good compilers?
Produce meaningful errors on incorrect programs
Produce fast and optimized code
What is the compilation flow?
Split code into compilation units (multiple C- and assembly files)
C and assembly files are compiled or assembled into object code files.
The linker resolves the dependencies between the object code files and generates an executable.
What is the detailed compilation flow of a C file?
Preprocessor
Compiler
Assembler
The C file is preprocessed by a preprocessor. The preprocessor inserts a textual copy of each included header file and generates a new C file containing them.
This new C file is then compiled by the compiler, which generates an assembly file (a human-readable representation of machine instructions). This step doesn't always occur; sometimes an object file is created directly.
This assembly file is then assembled by an assembler, which generates the object file containing the binary machine instructions.
What does a pre-processor do?
Takes care of #includes: imports header files. Includes textual copies of header files.
Macro processing: for example text substitution (macros: #define NAME value)
Conditional compilation: lets you build multiple versions of the program (for example with or without debug messages)
What is the difference between:
#include <header.h>
#include "header.h"
The double quotes (“”) tell the preprocessor that the header file is part of the project code and can be found locally within the project.
The angle brackets (<>) tell the preprocessor that the header file is not part of the local project being compiled, but is typically one of the system-provided files.
What does the compiler do in the compile flow?
Consists of a frontend and a backend
Frontend:
- Analyses source code for correctness
- Breaks it into basic elements
- Reports errors
- If there are no errors, generates an intermediate representation (IR) for the backend to use
Backend:
- Optimizes the IR
- Translates the IR to ASM (assembly code for the target machine)
- Optimizes the ASM
What happens during the Lexical analysis?
Source code is split into elements that belong together.
As code is just a stream of characters, the compiler needs to figure out which characters belong together and what they mean.
Lexical analysis returns a list of tokens, each paired with its type.
(“int”, KEYWORD)
(“=”, OPERATOR)
(“y”, IDENTIFIER)
…
What are the frontend stages?
Lexical analysis (scanning)
Syntactic analysis (parsing)
Semantic analysis (mainly type checking)
What happens during the syntactic analysis?
Takes the tokens generated by the lexical analysis and parses them into a syntax tree based on the grammar of the language.
Syntactic analysis checks that the structure of the code, i.e. the sequence of tokens, actually conforms to the grammar of the language.
What happens during semantic analysis?
The code might be syntactically correct but semantically wrong (its meaning is invalid).
Example: int a = “banana”;
This statement is syntactically correct, but a string (an array of chars) cannot be used to initialize an int.
What is the Intermediate representation (IR) of the program?
Internal representation of the program.
Language- and machine code-independent.
On a level that makes it easier to optimize.
The internal language interfacing the frontend and backend of the compiler.
A language that is used to express the syntax and semantics of a program.
Why is the IR necessary?
When compiling source code directly to assembly, the result is more difficult to optimize.
Assembly does not carry enough information to optimize well; for example, it has no type information.
Enables modularity and reuse: there are many frontend languages (C, C++, Java) and many processor architectures (ARM, RISC-V). Without a common IR, each combination would require its own compiler.
With the IR, you only need to write one frontend per language that parses it, and one backend per architecture that generates the correct assembly code.
What does the IR want to represent?
We want to represent how data and control propagate through the program.
What is a Data Flow Graph?
Represents how data flows within a “basic block”.
Does not represent control
Describes minimal ordering requirements on operations
Static single assignment (SSA) is used to ease optimization
A DFG consists of operations (+, -, *) that are used as nodes. Data (input/output variables: a, b, c, …) are drawn as edges between the nodes