final Flashcards

1
Q

Fixed-overhead encoding

A

[mask][instruction list]
Mask is 8 bits, where 1 represents active instruction and 0 represents a nop
[10100111][A][B][C][D][E] => [A][nop][B][nop][nop][C][D][E]
+simple, but (-) has higher overhead

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Templated-based encoding

A

○ Have templates to represent common patterns
○ (+) Relatively low overhead
○ (+) no penalty for short instruction
○ (-) limited set of templates, so still need some nops
○ Example:
§ [temp0][A][B][C]
§ Temp0 = (0->0). (1->2). (2->5). Chain.
§ Expands to:
§ [A][nop][B][nop][nop][C]
§ “Map instruction 0 to location 0”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How does VLIW handle branching?

A
  • Two-Step Branching -
    ○ Separate comparison from branch instruction (comparison results are stored in a branch-condition register)
    ○ Between compare and branch, can fill with other instructions
    ○ Branch instruction requires 2 delays slots
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

how does VLIW handle predication?

A
  • Partial predication:
    ○ Execute all individual operations on both sides of the branch
    ○ Select the final based on a condition
    ○ Slct R3 = B1, R4, R3
    If B1, then R3 = R4, otherwise R3 = R3
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How does VLIW handle speculation?

A

○ Control speculation: move instruction out of branch condition
Data speculation: check if LD/ST are to the same address after loading and storing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Compiler front-end steps

A
  • Scanner - reads tokens
    - Parser - gets tokens from scanner and checks syntax (tokens are in right place)
    Semantic Analysis - does the sentence make sense? (“aaa” + 10)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

register allocation steps

A
  • Web: Determine live ranges for each value
    • Interference: Determine overlapping ranges => graph
    • Spill cost (how many definitions/uses exist in that range): Compute the benefit of keeping each web in a register (vs memory)
    • Allocation: Decide which webs get registers
    • Spilling and splitting: split webs if needed
    • Assignment: assign actual registers to webs
      Code generation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

webs

A

○ All reachable uses and definitions are placed in the same web (i.e. if a def/use case for that variable is not reachable by the web, it is placed in a different web)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

interference graph

A

○ Each node in the graph represents a variable/web
○ Each edge represents two variables that are live at the same time (i.e. A-B means that A and B are live at the same time)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

graph coloring heuristic

A
  1. Remove nodes that have degree < N and push onto a stack
  2. For remaining nodes (degree >= N): repeatedly remove and push onto stack the node with the least spill cost
    ○ ==> result is a stack with the nodes ordered from highest to lowest spill cost to nodes with deg < N
    ○ This enables us to prioritize coloring nodes with the highest spill code first
  3. To color:
    • Pop a node off the top of the stack and color it differently than its neighbors that are already colored
    • If no color available, node is assigned to memory
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

choosing which web to split

A

○ Has interference degree >= N

○ Has minimal spill cost

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

register coalescing

A

if r0 and r1 do not interfere, combine their webs and assign them one register
(-) might create a web that is not colorable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

register targeting

A

○ Some variables need to be in specific registers at a certain time (args to function, return value, etc)
○ Pre-color those webs to eliminate copy instructions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Presplitting of webs

A

○ Some variables have large “dead” regions (i.e. regions between def and use where the variable is not used)
○ ==> Can presplit the web into two live ranges

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Interprocedural register allocation

A

○ Store registers across procedural boundaries is expensive
○ Can customize procedural call by cloning the function to save loading and storing a register
§ Each clone function is only available for that call site

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what is differential register allocation?

A

Instead of using instruction bits to id a register, use less bits to id the difference between the last and next regsister (i.e. r1, r2 => 1, r2, r5 => 3)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

adjacency graph

A

used in differential register allocation. captures the order in which registers are accessed, An edge from r1 -> r2 with weight 2, means that r2 is accessed after r1 2 times

18
Q

adjacency graph: Edge coverage

A

if an edge in the adjacency graph can be encoded with the given number of bits

19
Q

adjacency graph: cost

A

the sums of the weights of the uncovered edges

20
Q

differential remapping algorithm

A

○ Start with initial register allocation and compute cost
○ Iteratively swap a pair of registers and recompute the cost
§ Make sure to do all possible swappings
§ Pick the new remapping with the lowest cost
Repeat step 2 until there is no more cost reduction

21
Q

what is the Simple Offset Problem?

A
  • How to most efficiently assign variables on the stack using auto-increment/auto-decrement modes
    • Assumes one address register
22
Q

access graph

A

used for simple offset problem. Undirected, weighted graph. Each node is a variable. Each edge represents an access a–(3)–b (a and b are accessed after one another 3 times, direction a/b b/a doesnt matter)

23
Q

access graph: weight

A

sum of all the edges

24
Q

access graph: cover

A

all nodes are connected, but no cycles, each node can have at most 2 neighbors

25
Q

access graph: cover cost

A

weight - cost of cover

26
Q

General Offset Assignment Problem:

A
  • Generalization of the simple offset problem
    ○ Have multiple address registers
    Given N address registers, split the access graph into N covers
27
Q

Relax Fixed Evaluation order

A

SOA and GOA assume a fixed-evaluation order => can we transform the access sequence to improve our assignments by swapping commutative and associative ops?

28
Q

Least Cost Access Sequence (LCAS):

A

Must keep applying commutative and associative transforms on the access sequence until you get the least cost access sequence

29
Q

pruning

A

used for finding the least cost access sequence: trying all possible combos is exhausting:
○ Start with first statement, generate all possible swaps
○ Prune the ones with max cost
○ Add second statement to subset
○ Prune the ones with max cost
Repeat

30
Q

graph model for LD/ST parallelization

A
  • Each load/store is a node
    • Edge between nodes means they can be merged into a parallel LD/ST instruction
      ○ They can be merged if their operands are in different memories
    • Pick maximum number of edges that are disjoint (i.e. not sharing a node)
31
Q

Moveable Boundary Problem

A

How can we rearrange the program order to get more parallelized LD/STs?

32
Q

Pseudo-fixed boundary

A

○ Move stores as early as possible (assume other instructions are fixed): STORES CAN ONLY BE MOVED UP
Move loads as late as possible (assume other instructions are fixed): LOADS CAN ONLY BE MOVED DOWN

33
Q

what is variable duplication?

A

copy a variable from one bank to another to improve parallelization

34
Q

what is rematerialization?

A

recompute the value that you want duplicated and then store it into the other memory bank (instead of ld/st-ing)

35
Q

What is the Dual-Bank Register Assignment Problem?

A

both operands must come from different banks, but this can lead to problems with other instructions with the same constraints

36
Q

Dual-Bank Register Assignment is for what kind of instruction

A

ALU

37
Q

Register Conflict Graph

A

For bank conflicts
- Conflict edge -
○ Edge between two registers indicating that they must be in different registers (i.e. both registers appear as source registers in an ALU instruction)
- Edge must exist in interference graph

38
Q

Register conflict subgraph

A

Subgraph of the interference graph that only shows the conflict edges

39
Q

what is a Bipartite Graph?

A
  • Graph that can be divided into two sets such that:

○ There is no edge between nodes in a single set

40
Q

what is the No conflict law?

A

A RCG is conflictless (and thus can be allocated) if it is bipartite and A graph is bipartite if there are no odd cycles in RCG

41
Q

how to detect odd cycles

A
  • Breadth-First Hierarchy:
    ○ Node level: the shortest path length from a node to the root
    ○ Parallel edge - if two nodes have the same node level and there is an edge between them, then this indicates an odd cycle
    ○ Can detect cycles of length 2k+1, where k is the node level
42
Q

Live Range Splitting

A

method to split odd cycles: Split a live range into multiple variables to remove an edge from the graph