final Flashcards
Fixed-overhead encoding
[mask][instruction list]
Mask is 8 bits, where 1 represents active instruction and 0 represents a nop
[10100111][A][B][C][D][E] => [A][nop][B][nop][nop][C][D][E]
+simple, but (-) has higher overhead
Templated-based encoding
○ Have templates to represent common patterns
○ (+) Relatively low overhead
○ (+) no penalty for short instruction
○ (-) limited set of templates, so still need some nops
○ Example:
§ [temp0][A][B][C]
§ Temp0 = (0->0). (1->2). (2->5). Chain.
§ Expands to:
§ [A][nop][B][nop][nop][C]
§ “Map instruction 0 to location 0”
How does VLIW handle branching?
- Two-Step Branching -
○ Separate comparison from branch instruction (comparison results are stored in a branch-condition register)
○ Between compare and branch, can fill with other instructions
○ Branch instruction requires 2 delays slots
how does VLIW handle predication?
- Partial predication:
○ Execute all individual operations on both sides of the branch
○ Select the final based on a condition
○ Slct R3 = B1, R4, R3
If B1, then R3 = R4, otherwise R3 = R3
How does VLIW handle speculation?
○ Control speculation: move instruction out of branch condition
Data speculation: check if LD/ST are to the same address after loading and storing
Compiler front-end steps
- Scanner - reads tokens
- Parser - gets tokens from scanner and checks syntax (tokens are in right place)
Semantic Analysis - does the sentence make sense? (“aaa” + 10)
register allocation steps
- Web: Determine live ranges for each value
- Interference: Determine overlapping ranges => graph
- Spill cost (how many definitions/uses exist in that range): Compute the benefit of keeping each web in a register (vs memory)
- Allocation: Decide which webs get registers
- Spilling and splitting: split webs if needed
- Assignment: assign actual registers to webs
Code generation
webs
○ All reachable uses and definitions are placed in the same web (i.e. if a def/use case for that variable is not reachable by the web, it is placed in a different web)
interference graph
○ Each node in the graph represents a variable/web
○ Each edge represents two variables that are live at the same time (i.e. A-B means that A and B are live at the same time)
graph coloring heuristic
- Remove nodes that have degree < N and push onto a stack
- For remaining nodes (degree >= N): repeatedly remove and push onto stack the node with the least spill cost
○ ==> result is a stack with the nodes ordered from highest to lowest spill cost to nodes with deg < N
○ This enables us to prioritize coloring nodes with the highest spill code first - To color:
- Pop a node off the top of the stack and color it differently than its neighbors that are already colored
- If no color available, node is assigned to memory
choosing which web to split
○ Has interference degree >= N
○ Has minimal spill cost
register coalescing
if r0 and r1 do not interfere, combine their webs and assign them one register
(-) might create a web that is not colorable
register targeting
○ Some variables need to be in specific registers at a certain time (args to function, return value, etc)
○ Pre-color those webs to eliminate copy instructions
Presplitting of webs
○ Some variables have large “dead” regions (i.e. regions between def and use where the variable is not used)
○ ==> Can presplit the web into two live ranges
Interprocedural register allocation
○ Store registers across procedural boundaries is expensive
○ Can customize procedural call by cloning the function to save loading and storing a register
§ Each clone function is only available for that call site
what is differential register allocation?
Instead of using instruction bits to id a register, use less bits to id the difference between the last and next regsister (i.e. r1, r2 => 1, r2, r5 => 3)