Lecture 8 - Compilation and optimization Flashcards

1
Q

What are compilation units?

A

When writing a program, the code is often divided into multiple C files, or compilation units.

Each compilation unit is compiled separately.
The output of the compilation is an object file containing relocatable object code.

Relocatable means that the addresses of variables and branch targets are not yet fixed.

2
Q

What component combines the different object codes from the different compilation units?

A

A linker combines these into an executable, resolving references between units.

3
Q

When using shared libraries, what tool is responsible for making these libraries accessible within the program image?

A

A loader sets up the executable program in memory and initialises data areas, prior to the program being run.

4
Q

What is the difference between a function declaration and a function definition?

A

Declaration: Informs the compiler of the existence of a variable or function, including its name and types.
void swap(int *a, int *b);

Definition: Provides the function body (or, for a variable, allocates its storage).
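
A minimal sketch of the difference, assuming everything lives in one file for brevity. The helper first_after_swap is a hypothetical caller added for illustration; it compiles because the declaration of swap is already visible, even though the definition comes later.

```c
/* Declaration (prototype): name, parameter types and return type only. */
void swap(int *a, int *b);

/* Hypothetical caller: the compiler can type check this call
   using nothing but the declaration above. */
int first_after_swap(int x, int y) {
    swap(&x, &y);
    return x;
}

/* Definition: provides the function body. */
void swap(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

In a multi-file project the declaration would normally sit in a header file and the definition in exactly one C file.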

5
Q

In a header file, what does the keyword extern do when used in declaring a variable:

extern int MAX_SIZE;

A

Tells the compiler the variable is defined somewhere else. Storage space is not allocated for the variable.
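
A small sketch of how this fits together, squeezed into one file. The value 128 and the function half_max are assumptions for illustration; in a real project the extern line would sit in a header and the definition in exactly one C file.

```c
/* Declaration: normally placed in a header. No storage is allocated here. */
extern int MAX_SIZE;

/* Hypothetical user of the variable; compiles with only the declaration visible. */
int half_max(void) {
    return MAX_SIZE / 2;
}

/* Definition: normally placed in exactly one .c file. Storage is allocated here. */
int MAX_SIZE = 128;
```

If the definition were missing, each file would still compile, and the error would only show up at link time as an unresolved symbol.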

6
Q

Why is it important to include header files with function declarations in files where they are used, even if they are defined in other files?

A

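Because then the compiler can read the function's signature (parameter types, return type) and report errors if, for example, the function is called with arguments of the wrong type (type checking).
If the header file is not included, the compiler cannot check the call against the real signature. Older C compilers would then assume an implicit declaration and compile the code anyway; modern compilers typically warn or report an error.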

7
Q

What are some requirements for good compilers?

A

Produce meaningful errors on incorrect programs

Produce fast and optimized code

8
Q

What is the compilation flow?

A

Split code into compilation units (multiple C- and assembly files)

C/Assembly files are assembled/compiled to object code files.

The linker resolves the dependencies between the object code files and generates an executable.

9
Q

What is the detailed compilation flow of a c-file?

A

Preprocessor
Compiler
Assembler

The C file is first processed by the preprocessor. The preprocessor inserts a textual copy of each included header file and produces a new C file containing the result.

This new C file is then compiled by the compiler, which generates an assembly file (a human-readable representation of machine instructions). This step doesn’t always occur. Sometimes an object file is created directly.

This assembly file is then assembled by the assembler, which generates the object file containing the binary machine instructions.

10
Q

What does a pre-processor do?

A

Handles #include directives: inserts textual copies of header files.

Macro processing: for example text substitution (macros: #define NAME value)

Conditional compilation: lets you build multiple versions of the program (for example with/without debug messages)
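
The macro and conditional compilation tasks can be sketched as follows. The names BUFFER_SIZE, SQUARE, LOG and buffer_area are made up for illustration, and DEBUG is assumed to be a symbol you may or may not define when compiling (for example with a -DDEBUG compiler flag).

```c
#include <stdio.h>

#define BUFFER_SIZE 64               /* plain text substitution */
#define SQUARE(x) ((x) * (x))        /* function-like macro; parentheses protect precedence */

/* Conditional compilation: LOG prints only when DEBUG is defined. */
#ifdef DEBUG
#define LOG(msg) fprintf(stderr, "debug: %s\n", msg)
#else
#define LOG(msg) ((void)0)           /* expands to a no-op in release builds */
#endif

int buffer_area(void) {
    LOG("computing area");
    return SQUARE(BUFFER_SIZE);      /* preprocessor rewrites this to ((64) * (64)) */
}
```

All of this is resolved before the compiler proper sees the code, so the compiler never encounters the names BUFFER_SIZE, SQUARE or LOG.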

11
Q

What is the difference between:
#include <header.h>
#include "header.h"

A

The double quotes (“”) tell the preprocessor that the header file is part of the project code and can be found locally within the project.

The angle brackets (<>) tell the preprocessor that the header file is not part of the local project we are trying to compile, but is typically one of the system-provided files.

12
Q

What does the compiler do in the compile flow?

A

Consists of a frontend and a backend

Frontend:
- Analyses source code for correctness
- Breaks it into basic elements
- Reports errors
- If there are no errors, generates an intermediate representation (IR) for the backend to use

Backend:
- Optimizes IR
- Translates IR to ASM (assembly for the target machine)
- Optimizes ASM

13
Q

What happens during the Lexical analysis?

A

Source code is split into elements that belong together.

As code is just a stream of characters, the compiler needs to figure out what characters belong together and what is their meaning.

Lexical analysis returns a list of tokens together with the type of each token.

(“int”, KEYWORD)
(“=”, OPERATOR)
(“y”, IDENTIFIER)
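
A toy scanner along these lines might look like the sketch below. The type names, the 32-character limit, and the choice to treat every other single character (including ;) as an operator are simplifying assumptions, and int is the only keyword it knows.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_OPERATOR, TOK_END } TokenType;

typedef struct {
    TokenType type;
    char text[32];
} Token;

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src) {
    Token tok = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;              /* skip whitespace */

    if (*p == '\0') {
        /* end of input: tok already has type TOK_END */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* letters, digits and underscores belong together as one word */
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
        tok.text[n] = '\0';
        tok.type = strcmp(tok.text, "int") == 0 ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
        tok.text[n] = '\0';
        tok.type = TOK_NUMBER;
    } else {
        tok.text[0] = *p++;                              /* single-character token */
        tok.text[1] = '\0';
        tok.type = TOK_OPERATOR;
    }

    *src = p;
    return tok;
}
```

Calling next_token repeatedly on the string "int y = 42;" yields the keyword int, the identifier y, the operator =, the number 42, the character ; and finally TOK_END.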

13
Q

What are the frontend stages?

A

Lexical analysis (Scanning)
Syntactic analysis (parsing)
Semantic analysis (mainly type checking)

14
Q

What happens during the syntactic analysis?

A

Takes the tokens generated by the lexical analysis and parses them into a syntax tree based on the grammar of the language.

Syntactic analysis checks that the structure of the code (the sequence of tokens) actually conforms to the grammar of the language.

15
Q

What happens during semantic analysis?

A

The code might be syntactically correct, but semantically (in meaning) wrong.

Example: int a = “banana”

This is a syntactically correct statement, but a string is not a valid value for an int.

16
Q

What is the Intermediate representation (IR) of the program?

A

Internal representation of the program.

Language- and machine code-independent.

On a level that makes it easier to optimize.

The internal language interfacing the frontend and backend of the compiler.

A language that is used to express the syntax and semantics of a program.

17
Q

Why is the IR necessary?

A

Compiling source code directly to assembly makes optimization harder.

Assembly does not carry enough information to optimize well. For example, it has no type information.

Enables modularity and reuse: as there are different frontend languages (C, C++, Java) and different processor architectures (ARM, RISC-V), without a common IR each of these combinations would require its own compiler.

With the IR, you only need one frontend per language, which parses it and produces IR, and one backend per architecture, which generates the correct assembly code.

18
Q

What does the IR want to represent?

A

The IR should represent how data and control propagate through the program.

19
Q

What is a Data Flow Graph

A

Represents how data flows within a “basic block”.

Does not represent control

Describes minimal ordering requirements on operations

Static single assignment is used to ease optimization

A DFG consists of operations (+, -, *), which are the nodes. Data (input/output variables: a, b, c, …) are drawn as edges between the nodes

20
Q

What is a basic block?

A

Uninterrupted sequence of machine instructions

One entry, one exit

Cannot contain branches, except as its last instruction

21
Q

What problem does static single assignment solve?

A

If a variable name (for example x) is assigned multiple times, it can be difficult to see which instructions use which value of x, and where each definition of x stops being relevant.

SSA gives every assignment its own name, so each variable is assigned exactly once: the first assignment keeps the name and later ones are renamed (x = 1, x_1 = 2, x_2 = 3). Uses of x are renamed to match the definition they read.
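
The renaming can be illustrated in plain C. This is only a sketch: real compilers do this on the IR, and the function names and arithmetic are made up for the example.

```c
/* Before: x is assigned three times, so "x" means different values at different points. */
int original(int a) {
    int x = a + 1;
    int y = x * 2;
    x = y - 3;
    x = x + y;
    return x;
}

/* SSA-style: every name is assigned exactly once, so each use
   points unambiguously at one definition. */
int ssa_form(int a) {
    const int x0 = a + 1;
    const int y0 = x0 * 2;
    const int x1 = y0 - 3;
    const int x2 = x1 + y0;
    return x2;
}
```

Both functions compute the same result; in the second, it is immediately visible that x0 is no longer needed once y0 has been computed.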

22
Q

What are partial orders used for?

A

Used to determine which operations can be executed in parallel. This is useful for superscalar processors.

Defines which operations depend on each other, and therefore cannot be executed in parallel.

Program instructions:
x=a+b
y=c-d
z=x*y
y1=b+d

Partial order:
1) a+b, c-d
2) x*y, b+d

(b+d only reads inputs, so it could also be scheduled in step 1.)

23
Q

What is a Control-Data Flow Graph (CDFG)

A

Represents control and data flow in a program.

Nodes: Basic blocks
Edges: Branches between basic blocks

24
Q

What are the two types of IR optimization that can be done during compilation?

A

Machine independent optimizations

Machine dependent optimizations

25
Q

What are machine independent optimizations? (3)

A

Independent of target architecture

Dead code elimination: removes variables that are never used and basic blocks that are never reached.
If a variable is assigned but never used again, the assignment does not need to be executed and can be omitted.

Constant propagation: Identify variables that are constant. Where these are used, substitute the variable with its value.
x = 3
y = x+7 -> y=3+7

Constant folding: Computes constant expressions at compile time and substitutes the result. If an instruction always adds 3 and 7, replace it with the constant 10.
y=3+7 -> y=10
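
The three optimizations often work together. The sketch below shows a hand-written before and after; the function names are hypothetical and an actual compiler would perform these steps on the IR, not on the C source.

```c
/* Before optimization. */
int before(void) {
    int x = 3;
    int unused = x * 100;   /* assigned but never needed for the result: dead code */
    int y = x + 7;          /* constant propagation: y = 3 + 7; constant folding: y = 10 */
    (void)unused;           /* only here to silence unused-variable warnings */
    return y * 2;           /* propagation and folding again: 10 * 2 = 20 */
}

/* What remains once all three optimizations have been applied. */
int after(void) {
    return 20;
}
```

Each pass makes the next one possible: once y is known to be 10, x has no remaining uses and its assignment becomes dead as well.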

26
Q

What is machine dependent optimization?

A

Specifically targets one architecture

May not be valid for different architectures

Instruction selection: If you want to multiply, do you use a multiplication instruction or, for example, a sequence of addition instructions?

Register allocation: depends on the registers available in the ISA
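
As an illustration of the instruction selection choice, multiplying by a constant can be expressed with shifts and an addition. This is a sketch in C with made-up function names; whether the shift version is actually cheaper depends on the target CPU, which is exactly why this is a machine dependent decision.

```c
/* Straightforward version: one multiply instruction on most targets. */
unsigned times10_mul(unsigned x) {
    return x * 10u;
}

/* Alternative selection: 10*x = 8*x + 2*x, using two shifts and one add. */
unsigned times10_shift(unsigned x) {
    return (x << 3) + (x << 1);
}
```

Unsigned arithmetic is used so that both versions wrap around identically on overflow.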

27
Q

How does IR optimization work?

A

Goes over the CFG multiple times, performing a different simple task in each pass and converting it into a new CFG.

The semantics stay the same while the CFG gets optimized.

28
Q

What happens during the code generation step of compilation?

A

Need to map the virtual registers that were used in the IR to physical registers.

If there are more variables than registers, map the extra ones to memory and load/store them when needed.

Translate each assignment to an instruction. Some ISAs may need more than one instruction to do this.

ISA and CPU optimization:
- reorder instructions

Add a label to each basic block so it can be reached. Define the order in which the basic blocks are laid out in memory. When a basic block ends, branch to the next one. You don’t need to branch if a basic block always comes directly after another. In this case, just place the second basic block right after the first one, and omit the branch instruction at the end of the first.

Remove superfluous branches (branches that are never taken or are unnecessary), for example branches to basic blocks that always execute right after the current one. As said above, place these blocks sequentially in memory instead.

29
Q

Summarize the compiler flow with the key steps of the frontend and backend stage

A

Frontend:
- Lexical analysis -> tokens
- Syntactic analysis -> syntax tree
- Semantic analysis -> type checked syntax tree
- Generate IR -> IR

Backend:
- Optimize IR -> IR (optimized)
- Generate ASM -> high-quality assembly

30
Q

What parts of programs do compilers often optimize?

A

Programs often spend most of their time in loops, so loops are a main optimization target.

31
Q

Why do compilers optimize loops?

A

Reduce loop overhead

Increase opportunities for other optimizations

Improve pipeline and memory system performance

32
Q

What are some loop optimizations?(5)

A

Loop unrolling
Loop fusion
Loop distribution/fission
Loop interchange
Loop tiling

33
Q

What is loop unrolling?

A

Duplicates the loop body n times and adjusts the loop bounds. This reduces the number of branches, which can be big performance bottlenecks in hardware because a mispredicted branch flushes the pipeline. Enables more optimizations, but increases register pressure.

for (i = 0; i < 4; i++) {
    a[i] = b[i];
}

optimized:

for (i = 0; i < 4; i += 2) {
    a[i] = b[i];
    a[i + 1] = b[i + 1];
}
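
When the trip count is not known to be a multiple of the unroll factor, the unrolled loop needs a clean-up loop for the leftover iterations. A minimal sketch, assuming an unroll factor of 2 and a hypothetical function name:

```c
#include <stddef.h>

/* Copies n elements from b to a, two per iteration. */
void copy_unrolled(int *a, const int *b, size_t n) {
    size_t i = 0;

    /* Main loop: runs while at least two elements remain. */
    for (; i + 1 < n; i += 2) {
        a[i] = b[i];
        a[i + 1] = b[i + 1];
    }

    /* Clean-up loop: handles the last element when n is odd. */
    for (; i < n; i++) {
        a[i] = b[i];
    }
}
```

The main loop executes roughly half as many branches as the original, at the cost of slightly larger code.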

34
Q

What is loop fusion?

A

Combines two (or more) loops into one. Can be done as long as there are no data dependences that the fusion would violate.

for (i = 0; i < N; i++) {
    a[i] = b[i];
}

for (i = 0; i < N; i++) {
    c[i] = d[i];
}

optimized:

for (i = 0; i < N; i++) {
    a[i] = b[i];
    c[i] = d[i];
}

35
Q

What are some pros and cons of loop fusion?

A

Pros:
- May improve data locality
- reduces loop overhead
- May enable better instruction scheduling

Cons:
- May hurt data locality
- May hurt I-cache hit rate

36
Q

What is loop distribution/fission?

A

Divides a loop into two (or more) loops. Essentially the opposite of loop fusion.
This is an advantage if, for example, one part of the loop body has data dependences between iterations. The instructions in that part cannot execute in parallel. If we extract the part of the loop that has no such dependences into its own loop, that loop can be executed in parallel, or it can receive other optimizations (vector instructions, …).

Reduces register pressure,
increases loop overhead.
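
A sketch of the idea, with made-up function and array names and assuming the three arrays do not overlap. The running sum depends on the previous iteration, while the scaling does not, so splitting them leaves one loop that is free of loop-carried dependences.

```c
#include <stddef.h>

/* Before: dependent and independent work share one loop.
   prefix[0] is assumed to be initialised by the caller. */
void combined(int *prefix, int *scaled, const int *in, size_t n) {
    for (size_t i = 1; i < n; i++) {
        prefix[i] = prefix[i - 1] + in[i];   /* depends on the previous iteration */
        scaled[i] = in[i] * 2;               /* independent of other iterations */
    }
}

/* After fission: the second loop has no loop-carried dependence
   and is a candidate for vectorization or parallel execution. */
void distributed(int *prefix, int *scaled, const int *in, size_t n) {
    for (size_t i = 1; i < n; i++) {
        prefix[i] = prefix[i - 1] + in[i];
    }
    for (size_t i = 1; i < n; i++) {
        scaled[i] = in[i] * 2;
    }
}
```

The price is that the loop counter and branch are now executed twice per element instead of once.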

37
Q

What is loop interchange?

A

Switches the order of loops in a loop nest

If you for example have a 2D loop:

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        c[j][i] = a[j][i] * 5;
    }
}

optimized:

for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        c[j][i] = a[j][i] * 5;
    }
}

C stores 2D arrays row by row, so making the last index (i) the inner loop walks through consecutive memory addresses. This can improve data locality and make it so cache lines are used more efficiently.

38
Q

What is loop tiling?

A

Breaks a loop nest into a set of nested loops over tiles (blocks). The inner loops operate on one subset of the data at a time. Can be done as long as there are no data dependences that the new iteration order would violate.

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        f(i, j);
    }
}

optimized (T is the tile size):

for (i = 0; i < N; i += T)
    for (j = 0; j < M; j += T)
        for (ii = i; ii < i + T && ii < N; ii++)
            for (jj = j; jj < j + T && jj < M; jj++)
                f(ii, jj);

This is also a method for optimizing memory use. Can improve data locality.
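
A concrete case where tiling tends to help is a matrix transpose, where one of the two arrays is otherwise always accessed with a large stride. The sketch below assumes row-major storage in flat arrays, a hypothetical tile size of 4, and a made-up function name; a good tile size in practice depends on the cache.

```c
#include <stddef.h>

enum { TILE = 4 };   /* assumed tile size, chosen only for illustration */

/* Transposes the n-by-m matrix src (row-major) into the m-by-n matrix dst. */
void transpose_tiled(size_t n, size_t m, const int *src, int *dst) {
    for (size_t i = 0; i < n; i += TILE) {
        for (size_t j = 0; j < m; j += TILE) {
            /* Work on one TILE x TILE block; the bounds checks
               handle blocks that stick out past the matrix edge. */
            for (size_t ii = i; ii < i + TILE && ii < n; ii++) {
                for (size_t jj = j; jj < j + TILE && jj < m; jj++) {
                    dst[jj * n + ii] = src[ii * m + jj];
                }
            }
        }
    }
}
```

Within one block both src and dst touch only a few rows, so the data brought into the cache is reused before it is evicted.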

39
Q

What does procedure/function inlining resolve?

A

Calling very short procedures is relatively expensive, as we need to set up a whole context for the function: function call/return, stack frame, argument/result passing.

Indirect costs: a call boundary turns intra-procedural analysis into harder inter-procedural analysis

int foo(int a, int b, int c) {
    return a + b + c;
}

w = foo(x, y, z);

optimized:

w = x + y + z;

Pros: inlining removes these costs
Cons: can increase code size, can reduce the I-cache hit rate

40
Q

Why is good register allocation important?

A

Minimizes memory accesses.

41
Q

What is register lifetime analysis?

A

Determines how long a variable's value is actually needed.

42
Q

What is graph coloring?

A

A way of doing register allocation.

Build an interference graph with one node per variable.

Edges between nodes indicate that the variables are live at the same time.

Assign each node a color so that no two neighbours have the same color.

Registers are represented by the colors. Want to use as few colors as possible.
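
A simple greedy coloring can be sketched as below. This is one possible heuristic, not necessarily the algorithm used in the lecture or in any particular compiler, and it does not guarantee the minimum number of colors. NUM_VARS and color_graph are hypothetical names, and the interference graph is assumed to be given as a symmetric adjacency matrix.

```c
#include <stdbool.h>

enum { NUM_VARS = 5 };   /* assumed number of variables in the example */

/* adj[u][v] is true when variables u and v are live at the same time.
   Writes a color (register number) for each variable into color[]
   and returns the number of colors used. */
int color_graph(bool adj[NUM_VARS][NUM_VARS], int color[NUM_VARS]) {
    int used = 0;
    for (int v = 0; v < NUM_VARS; v++) {
        bool taken[NUM_VARS] = { false };

        /* Mark the colors of already-colored neighbours as unavailable. */
        for (int u = 0; u < v; u++) {
            if (adj[v][u]) {
                taken[color[u]] = true;
            }
        }

        /* Pick the lowest color no neighbour is using. */
        int c = 0;
        while (taken[c]) {
            c++;
        }
        color[v] = c;
        if (c + 1 > used) {
            used = c + 1;
        }
    }
    return used;
}
```

If the number of colors needed exceeds the number of physical registers, some variables have to be kept in memory instead.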

43
Q

How does instruction scheduling affect register allocation?

A

When instructions are reordered, the variable lifetimes change, and therefore the number of registers needed might change.

44
Q

What happens during the instruction selection phase in the backend?

A

The IR code (represented as a CDFG) needs to be translated into machine code. There are often multiple ways to translate an IR instruction into machine instructions.

Need to find the instruction template for each expression that minimizes the chosen cost metric

45
Q

What does the assembler do in the compilation flow?

A

Takes the generated assembly code and generates object code.

Much simpler than the compiler.

Generates a binary representation of each assembly instruction, often using a one-to-one translation.

Translate labels into addresses.

Handle pseudo-ops

Two-pass approach:
- First: generate symbol table
- Second: Resolve labels (substitute labels with addresses) and generate machine instructions
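
The two passes can be sketched for a toy assembly format. Everything here is an assumption made for illustration: a line ending in : is a label, every other line is one instruction, every instruction is 4 bytes, and the type and function names are made up.

```c
#include <string.h>

enum { MAX_SYMBOLS = 16, INSTR_SIZE = 4 };   /* assumed limits and instruction size */

typedef struct {
    char name[16];
    int address;
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Pass 1: walk the file and record the address of every "name:" line. */
void build_symbol_table(const char *const *lines, int n, SymbolTable *table) {
    int address = 0;
    table->count = 0;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(lines[i]);
        if (len > 0 && lines[i][len - 1] == ':') {
            if (table->count < MAX_SYMBOLS && len - 1 < sizeof table->entries[0].name) {
                Symbol *s = &table->entries[table->count++];
                memcpy(s->name, lines[i], len - 1);   /* copy the name without ':' */
                s->name[len - 1] = '\0';
                s->address = address;                 /* relative to the first instruction */
            }
        } else {
            address += INSTR_SIZE;                    /* only instructions take up space */
        }
    }
}

/* Pass 2 helper: returns the address of a label, or -1 when the label
   is not defined in this file. */
int resolve_label(const SymbolTable *table, const char *name) {
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].name, name) == 0) {
            return table->entries[i].address;
        }
    }
    return -1;
}
```

The first pass is needed because a branch may refer to a label that appears later in the file. A result of -1 corresponds to a label that is not in the symbol table, which a real assembler would record as an external reference.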

46
Q

What happens if one file branches to a label defined in another file?

A

The assembler won’t find this label in the symbol table.

The label is marked as an external reference. This external reference is left for the linker to resolve.

46
Q

How does the assembler generate the symbol table?

A

Scan the file and collect labels and their addresses

(addresses are generally relative to the first instruction in the file)

47
Q

What is the object file?

A

Output from assembler

Several standards for these

Includes:
- Symbol table
- Program code (.text segment)
- Data (.data segment)
- Information about relocatable parts
- Debug data (references to source files)

48
Q

What tasks does the linker do?

A

Resolve all external references.

Generate one executable from all the object files.
- All object file segments (text, data) are combined
- Determine start address for all modules
- Combine all symbol tables
- Resolve all symbols:
  - Transforms relative addresses to absolute addresses
  - Produces an error if a label/symbol cannot be found in the merged symbol table