Lecture 8 - Compilation and optimization Flashcards

Question

What are the two types of IR optimization that can be done during compilation?

Answer 1

- Machine-independent optimizations
- Machine-dependent optimizations

Answer 2

Independent of the target architecture.

- Dead code elimination: removes variables that are never used and basic blocks that are never reached. If a variable is assigned but never read again, the assignment does not need to be executed and can be omitted.
- Constant propagation: identifies variables that hold a constant value and substitutes that value wherever the variable is used.
    x = 3
    y = x + 7   ->   y = 3 + 7
- Constant folding: computes constant expressions at compile time and substitutes the result. If an instruction always adds 3 and 7, it is replaced with the constant 10.
    y = 3 + 7   ->   y = 10
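The three optimizations above can be traced on one small function. This is a hand-written sketch of what a compiler might do step by step; the function names are made up for illustration.

```c
/* As written by the programmer. */
int before(void) {
    int x = 3;
    int y = x + 7;
    int dead = y * 2;   /* assigned but never used again */
    (void)dead;         /* only silences the unused-variable warning */
    return y;
}

/* After dead code elimination and constant propagation:
   'dead' is gone and x has been replaced by its value. */
int after_propagation(void) {
    int y = 3 + 7;
    return y;
}

/* After constant folding: 3 + 7 is computed at compile time. */
int after_folding(void) {
    return 10;
}
```

All three versions return the same value, which is the requirement for any optimization: the semantics must not change.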

Answer 3

Specifically aimed at a target architecture, so they may not be valid for other architectures.

- Instruction selection: for example, whether a multiplication should use a multiply instruction or a sequence of addition instructions.
- Register allocation: depends on the registers available in the ISA.
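A minimal sketch of the instruction selection choice, written in C rather than assembly. The function names are hypothetical, and the shift variant is a common related rewrite that the notes do not mention; which form is cheapest depends on the target ISA.

```c
/* Multiplication using the multiply operator (a multiply instruction). */
unsigned times3_mul(unsigned x) {
    return x * 3u;
}

/* The same result using only additions. */
unsigned times3_add(unsigned x) {
    return x + x + x;
}

/* Assumption, not from the notes: multiplying by a power of two
   can also be selected as a left shift. */
unsigned times8_shift(unsigned x) {
    return x << 3;
}
```

Unsigned arithmetic wraps around in C, so the variants agree for every input, including values where the product overflows.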

Answer 4

Goes over the CFG multiple times, performing a different simple task on each pass and converting it into a new CFG. The semantics stay the same, and the CFG gets optimized.

Answer 5

- Map the virtual registers used in the IR to physical registers. If there are more variables than registers, map the extra variables to memory and load/store them when needed.
- Translate each assignment to an instruction. Some ISAs may need more than one instruction for this.
- ISA and CPU optimization: reorder instructions.
- Add a label to each basic block so it can be reached.
- Define the order in which the basic blocks are laid out in memory. When a basic block ends, branch to the next one.
- Remove superfluous branches (branches that are unused or unnecessary). If a basic block always executes directly after another, store it right after the first one in memory and omit the branch instruction between them.

Answer 6

Frontend:
- Lexical analysis -> tokens
- Syntactic analysis -> syntax tree
- Semantic analysis -> type-checked syntax tree
- Generate IR -> IR

Backend:
- Optimize IR -> optimized IR
- Generate ASM -> high-quality assembly

Answer 7

Programs often spend much of their execution time in loops, so loops are a prime target for optimization.

Answer 8

- Reduce loop overhead
- Increase opportunities for other optimizations
- Improve pipeline and memory system performance

Answer 9

- Loop unrolling
- Loop fusion
- Loop distribution/fission
- Loop interchange
- Loop tiling

Answer 10

Duplicates the loop body n times and adjusts the loop bounds. This reduces the number of branches, which are a major performance bottleneck in hardware because they can cause pipeline flushes. It enables more optimizations, but increases register pressure.

    for (i = 0; i < 4; i++) {
        a[i] = b[i];
    }

optimized:

    for (i = 0; i < 4; i += 2) {
        a[i] = b[i];
        a[i + 1] = b[i + 1];
    }
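The example above works because the trip count (4) is a multiple of the unroll factor (2). A sketch of how the general case might be handled, with a cleanup loop for leftover iterations; the function names are hypothetical.

```c
#include <stddef.h>

/* Original loop: one branch per element. */
void copy_rolled(int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* Unrolled by a factor of 2: one branch per two elements. */
void copy_unrolled2(int *a, const int *b, size_t n) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a[i] = b[i];
        a[i + 1] = b[i + 1];
    }
    /* Cleanup loop: runs at most once, when n is odd. */
    for (; i < n; i++)
        a[i] = b[i];
}
```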

Answer 11

Combines two (or more) loops into one. Can be done as long as there are no data dependences that the fusion would violate.

    for (i = 0; i < N; i++) {
        a[i] = b[i];
    }
    for (i = 0; i < N; i++) {
        c[i] = d[i];
    }

optimized:

    for (i = 0; i < N; i++) {
        a[i] = b[i];
        c[i] = d[i];
    }

Answer 12

Pros:
- May improve data locality
- Reduces loop overhead
- May enable better instruction scheduling

Cons:
- May hurt data locality
- May hurt I-cache hit rate

Answer 13

Divides a loop into two (or more) loops. Essentially the opposite of loop fusion. This is an advantage if, for example, one part of the loop has data dependences. The instructions in that part cannot execute in parallel, but if the part without dependences is extracted into its own loop, that loop can be executed in parallel or receive other optimizations (vector instructions, ...). Reduces register pressure, but increases loop overhead.
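A sketch of loop fission on a made-up example: a running sum, which depends on the previous iteration, and an independent scaling statement. The function and array names are hypothetical.

```c
#include <stddef.h>

/* Original loop: the running sum carries a dependence from one
   iteration to the next, while the scaling statement is independent. */
void combined(int *sum, int *scaled, const int *in, size_t n) {
    for (size_t i = 0; i < n; i++) {
        sum[i] = (i == 0 ? 0 : sum[i - 1]) + in[i];
        scaled[i] = in[i] * 5;
    }
}

/* After fission: the dependent part stays sequential, and the second
   loop has no cross-iteration dependences left. */
void split(int *sum, int *scaled, const int *in, size_t n) {
    for (size_t i = 0; i < n; i++)
        sum[i] = (i == 0 ? 0 : sum[i - 1]) + in[i];
    for (size_t i = 0; i < n; i++)
        scaled[i] = in[i] * 5;
}
```

The second loop in `split` is the kind of loop a compiler could vectorize or run in parallel, at the cost of iterating over `in` twice.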

Answer 14

Switches the order of the loops in a loop nest. For example, with a 2D loop:

    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            c[j][i] = a[j][i] * 5;
        }
    }

optimized:

    for (j = 0; j < M; j++) {
        for (i = 0; i < N; i++) {
            c[j][i] = a[j][i] * 5;
        }
    }

This can improve data locality, depending on how the data is laid out in memory: when the inner loop walks over consecutive elements, cache lines are used more efficiently.
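A compilable sketch of the interchange, assuming C's row-major array layout. `ROWS`, `COLS` and the function names are hypothetical.

```c
#define ROWS 3
#define COLS 4

/* Inner loop varies the first index: consecutive accesses are
   COLS elements apart in memory. */
void scale_strided(int c[ROWS][COLS], int a[ROWS][COLS]) {
    for (int i = 0; i < COLS; i++)
        for (int j = 0; j < ROWS; j++)
            c[j][i] = a[j][i] * 5;
}

/* After interchange: inner loop varies the last index, so accesses
   are to adjacent elements. */
void scale_contiguous(int c[ROWS][COLS], int a[ROWS][COLS]) {
    for (int j = 0; j < ROWS; j++)
        for (int i = 0; i < COLS; i++)
            c[j][i] = a[j][i] * 5;
}
```

Both functions compute the same result; only the order of the memory accesses differs.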

Answer 15

Breaks a loop into a set of nested loops, where each inner loop operates on a subset (tile) of the data. Can be done as long as there are no data dependences.

    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            f(i, j);
        }
    }

optimized, with TILE as the tile size:

    for (i = 0; i < N; i += TILE)
        for (j = 0; j < M; j += TILE)
            for (ii = i; ii < min(i + TILE, N); ii++)
                for (jj = j; jj < min(j + TILE, M); jj++)
                    f(ii, jj);

This is also a method for optimizing memory use, and it can improve data locality.
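A sketch that checks the key property of a tiled loop nest: every (i, j) pair is still visited exactly once, only in a different order. The tile size `TILE`, the helper `min_int`, and the visit counter standing in for `f` are all assumptions made for illustration.

```c
#define N 5
#define M 7
#define TILE 2   /* hypothetical tile size; need not divide N or M */

static int min_int(int a, int b) {
    return a < b ? a : b;
}

/* The outer two loops step from tile to tile; the inner two loops walk
   the elements of one tile. min_int clips tiles at the array edges. */
void visit_tiled(int visits[N][M]) {
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < M; j += TILE)
            for (int ii = i; ii < min_int(i + TILE, N); ii++)
                for (int jj = j; jj < min_int(j + TILE, M); jj++)
                    visits[ii][jj] += 1;   /* stands in for f(ii, jj) */
}
```

In practice the tile size would be chosen so that the data touched by one tile fits in the cache.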

Answer 16

Calling very short procedures is expensive, because a whole context must be set up for the function: call/return, stack frame, and argument/result passing. Indirect cost: analysis that could be intra-procedural has to become inter-procedural.

    int foo(int a, int b, int c) {
        return a + b + c;
    }
    w = foo(x, y, z);

optimized:

    w = x + y + z;

Pros: inlining removes these costs.
Cons: can increase code size and reduce the I-cache hit rate.
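A compilable version of the example, as a sketch. `static inline` is only a hint to the compiler, which decides for itself whether to inline; `inlined_by_hand` shows what the call site looks like after the transformation. The wrapper function names are hypothetical.

```c
/* Short function that is a typical inlining candidate. */
static inline int foo(int a, int b, int c) {
    return a + b + c;
}

/* Call site before inlining. */
int with_call(int x, int y, int z) {
    return foo(x, y, z);
}

/* Call site after inlining: the body of foo replaces the call. */
int inlined_by_hand(int x, int y, int z) {
    return x + y + z;
}
```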

Answer 17

Minimizes memory accesses.

Answer 18

Determines how long the value of a variable is actually needed.

Answer 19

A way of doing register allocation. Build a graph with one node per variable, where an edge between two nodes indicates that the variables are live at the same time. Assign each node a color so that no two neighbours have the same color. The colors represent registers, so the goal is to use as few colors as possible.
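A small sketch of the idea, using made-up live ranges for four variables. It uses a simple greedy coloring, which does not guarantee the minimum number of colors in general; production compilers use more elaborate heuristics. All names and numbers are hypothetical.

```c
#include <stdbool.h>

#define NUM_VARS 4

/* Hypothetical live ranges: variable v is live from live_start[v]
   to live_end[v], inclusive. */
static const int live_start[NUM_VARS] = {0, 1, 2, 4};
static const int live_end[NUM_VARS]   = {2, 3, 5, 6};

/* Two variables interfere (share an edge) when their live ranges overlap. */
bool interferes(int u, int v) {
    return u != v
        && live_start[u] <= live_end[v]
        && live_start[v] <= live_end[u];
}

/* Greedy coloring: give each variable the lowest-numbered register that
   no already-colored neighbour uses. Returns the number of registers used. */
int color_registers(int reg[NUM_VARS]) {
    int used = 0;
    for (int v = 0; v < NUM_VARS; v++) {
        bool taken[NUM_VARS] = {false};
        for (int u = 0; u < v; u++)
            if (interferes(u, v))
                taken[reg[u]] = true;
        int r = 0;
        while (taken[r])
            r++;
        reg[v] = r;
        if (r + 1 > used)
            used = r + 1;
    }
    return used;
}
```

With these ranges the first three variables are all live at the same time and need three registers, while the fourth can reuse the register of the first because their live ranges do not overlap.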

Answer 20

When instructions are reordered, the variable lifetimes change, and therefore the number of registers needed might change.

Answer 21

IR code (represented as a CDFG) needs to be translated into machine code. There are sometimes multiple ways to translate IR instructions into machine instructions, so the compiler needs to find the template for each expression that minimizes the chosen cost metric.

Answer 22

Takes the generated assembly code and generates object code. Much simpler than the compiler.

- Generates a binary representation of each assembly instruction, often using a one-to-one translation.
- Translates labels into addresses.
- Handles pseudo-ops.

Two-pass approach:
- First pass: generate the symbol table.
- Second pass: resolve labels (substitute labels with addresses) and generate machine instructions.
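The two-pass approach can be sketched on a toy program in which every instruction occupies one address and may carry a label and a branch target. The `struct line` format, the function names, and the use of -1 for "no target or unknown label" are all assumptions made for illustration, not a real assembler's data structures.

```c
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical toy instruction: an optional label, and an optional
   branch target label (NULL when the instruction does not branch). */
struct line {
    const char *label;
    const char *target;
};

struct symbol {
    const char *name;
    int address;
};

/* Returns the address of a label, or -1 when it is not in the table. */
static int lookup(const struct symbol *tab, int count, const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].address;
    return -1;
}

/* Pass 1 collects labels into the symbol table. Pass 2 writes the
   resolved branch target of each line into resolved[]. */
void assemble(const struct line *prog, int n, int *resolved) {
    struct symbol tab[MAX_SYMS];
    int count = 0;

    for (int addr = 0; addr < n; addr++) {        /* pass 1 */
        if (prog[addr].label != NULL && count < MAX_SYMS) {
            tab[count].name = prog[addr].label;
            tab[count].address = addr;
            count++;
        }
    }
    for (int addr = 0; addr < n; addr++) {        /* pass 2 */
        resolved[addr] = prog[addr].target != NULL
            ? lookup(tab, count, prog[addr].target)
            : -1;
    }
}
```

Two passes are needed because a branch may refer to a label that is defined later in the file; after the first pass the address of every local label is known.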

Answer 23

The assembler won't find this label in the symbol table. The label is marked as an external reference, which is left for the linker to resolve.

Answer 24

Scan the file and collect labels and their addresses (addresses are generally relative to the first instruction in the file).

Answer 25

Output from the assembler. Several standards exist for the format. Includes:
- Symbol table
- Program code (.text segment)
- Data (.data segment)
- Information about relocatable parts
- Debug data (references to source files)

Answer 26

Resolves all external references and generates one executable from all the object files.
- All object file segments (text, data) are combined
- Determine the start address for all modules
- Combine all symbol tables
- Resolve all symbols:
  - Transforms relative addresses into absolute addresses
  - Produces an error if a label/symbol cannot be found in the merged symbol table
