VLIW Theory Flashcards

1
Q

What are the behaviors that the compiler cannot predict?

A

The cache behavior and the branch behavior

2
Q

Can a bundle contain more than one branch instruction?

A

No, we can’t

3
Q

What are the requirements of the fetching stage on a multiple-issue processor?

A

Higher bandwidth from the instruction cache

4
Q

How is the decision of which instructions to issue taken in a multiple-issue processor?

A

Either by dynamic scheduling or static scheduling

5
Q

What are the bundles in a VLIW architecture?

A

Each bundle is a single issue packet that represents one wide instruction containing multiple independent operations

6
Q

What is the impact of the fact that in a VLIW architecture there is a shared multi-ported register file?

A

This implies the need for enough ports to read and write, in parallel, the operands of the multiple operations issued by each very long instruction

Therefore, in the case of a 4-slot bundle we need eight read ports to read eight source registers and four write ports to write four destination registers per cycle
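The port arithmetic above can be sketched with a minimal Python helper; it assumes each operation reads two source registers and writes one destination, as in the 4-slot example:

```python
def rf_ports(slots, srcs_per_op=2, dsts_per_op=1):
    """Read/write ports the shared register file needs so that every
    operation of an n-slot bundle can access it in the same cycle.
    Assumes 2 source and 1 destination register per operation."""
    return slots * srcs_per_op, slots * dsts_per_op

reads, writes = rf_ports(4)   # 4-slot bundle -> 8 read, 4 write ports
```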

7
Q

What happens if there is not enough parallelism in the source code to fill all the available operation slots?

A

NOPs are inserted

8
Q

What is the impact of the number of functional units relative to the number of slots in the bundle on the architecture?

A

If each slot is assigned to a functional unit, the decode unit is a simple decoder. Otherwise, if there are more parallel functional units than issue slots, the architecture must have a dispatch network to route each operation and its source operands to the target FU

9
Q

How does the VLIW processor solve RAW hazards?

A

During the scheduling phase, the compiler statically reorders instructions (ones not involved in the dependency); when no such instructions are available, it inserts NOPs
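As a toy model of the NOP fallback, assuming a statically scheduled pipeline with full forwarding where a result becomes usable a fixed number of cycles after its producer issues:

```python
def nops_needed(producer_latency, distance):
    """NOPs the compiler must insert between a producer and its consumer
    when it cannot find independent instructions to fill the gap.
    'distance' is how many cycles after the producer the consumer is
    currently scheduled (toy model; full forwarding assumed)."""
    return max(0, producer_latency - distance)

# A load with latency 2 issued immediately before its consumer
# (distance 1) needs one NOP between them; a latency-1 op needs none.
```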

10
Q

What information must be exposed to the compiler in order to obtain correct instruction execution?

A

Operation latencies and data dependencies must be exposed to the compiler

11
Q

How are WAR and WAW hazards solved in VLIW processors?

A

They are solved statically by the compiler by correctly selecting temporary slots for the operations or by using register renaming

12
Q

How are mispredicted branches handled in the VLIW architecture?

A

A mispredicted branch must be resolved dynamically by the hardware by flushing the speculative instructions from the pipeline

13
Q

What must be done to avoid structural hazards on RF access and WAR/WAW hazards, and what is the effect?

A

It is necessary to keep in-order execution: the write-back phase of the parallel operations in a bundle must occur in the same clock cycle

This implies that each bundle is constrained to the latency of the longest-latency operation in it

14
Q

What is the implication of doing out of order execution?

A

This way, we need to check in the write-back phase for conflicting RF write accesses (to avoid structural hazards) and for WAR/WAW hazards

15
Q

What is register pressure and what causes it?

A

Register pressure refers to the occupancy of the registers

It is caused by the multicycle latency of operations combined with the multiple issue of instructions, which increases the number of registers occupied over time

16
Q

What are the events that the compiler does not know about at static (compile) time?

A

Data cache misses and branch miss predictions

17
Q

What are basic blocks?

A

A straight-line code sequence with no branches in or out except at the entry/exit points
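A minimal sketch of how a compiler finds basic-block boundaries; the three-tuple toy IR here is an illustrative assumption, not MIPS:

```python
def basic_block_leaders(instrs):
    """Return the indices of basic-block leaders: the first instruction,
    every branch target, and every instruction following a branch.
    'instrs' is a list of (label, opcode, branch_target) tuples."""
    labels = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}
    leaders = {0}
    for i, (_, op, target) in enumerate(instrs):
        if op.startswith('b'):                 # a branch ends the block
            if target in labels:
                leaders.add(labels[target])    # the target starts one
            if i + 1 < len(instrs):
                leaders.add(i + 1)             # the fall-through starts one
    return sorted(leaders)

code = [('L1', 'lw',   None),
        (None, 'addi', None),
        (None, 'beq',  'L2'),
        (None, 'sub',  None),
        ('L2', 'sw',   None),
        (None, 'bne',  'L1')]
# Three basic blocks, starting at indices 0, 3, and 4.
```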

18
Q

What is the relationship between basic blocks, data dependencies and the exploitable ILP in a program?

A

Beyond the fact that the amount of parallelism available within a basic block is small, data dependencies can further limit the amount of ILP we can exploit within a basic block.

Therefore, to obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks

19
Q

What are the main advantages of the VLIW architecture?

A

Good performance through extensive compiler optimizations (the compiler can analyze the code over a wider instruction window and has more time to analyze dependencies and extract parallelism)

Reduced hardware complexity

20
Q

What are the challenges of the VLIW architectures?

A

Need for strong compiler technology

Code size increase

Huge number of registers needed for register renaming

21
Q

lw $2, BASEA($4)
addi $2,$2,INC1
lw $3, BASEB($4)
addi $3,$3,INC2
addi $4, $4, 4
bne $4,$7, L1

MIPS 5-stage pipelined scalar scheduling with forwarding and early branch resolution in ID stage

A

See picture 4S in the ACA album

22
Q

lw $2, BASEA($4)
addi $2,$2,INC1
lw $3, BASEB($4)
addi $3,$3,INC2
addi $4, $4, 4
bne $4,$7, L1

4-issue VLIW with 4 fully pipelined FUs with:
- LD/ST latency 2
- INT latency 1
- INT to BR latency 2

A

See picture 5S in the ACA album

23
Q

lw $2, BASEA($4)
addi $2,$2,INC1
lw $3, BASEB($4)
addi $3,$3,INC2
addi $4, $4, 4
bne $4,$7, L1

2-issue VLIW with 2 fully pipelined FUs with:
- LD/ST latency 2
- INT latency 1
- INT to BR latency 2

A

See picture 6S in the ACA album

24
Q

What is a dependence graph in the context of VLIW scheduling?

A

They are graphs that capture true, anti, and output dependencies between instructions

Each node represents an operation in the basic block

The edges between nodes represent dependencies

Each node is annotated with the longest path to reach it and with the node latency

25
Q

What does the critical (longest) path in a dependence graph define?

A

It defines the minimum execution time of a basic block
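A minimal sketch of that computation; the node names and latencies below are hypothetical, not taken from the ACA album pictures:

```python
def critical_path(latency, succs):
    """Length of the longest latency-weighted path through a dependence
    DAG: the minimum execution time of the basic block."""
    memo = {}
    def longest_from(n):              # latency of n plus its longest suffix
        if n not in memo:
            memo[n] = latency[n] + max(
                (longest_from(s) for s in succs.get(n, [])), default=0)
        return memo[n]
    return max(longest_from(n) for n in latency)

# Diamond a -> {b, c} -> d with MUL latency 3 and ALU latency 1:
lat  = {'a': 3, 'b': 1, 'c': 1, 'd': 3}
succ = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
# critical path a -> b -> d (or a -> c -> d) = 3 + 1 + 3 = 7 cycles
```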

26
Q

What is the relationship between scheduling and the dependence graph?

A

The scheduling problem consists of assigning an execution cycle to each operation in the graph

27
Q

What is the ready state?

A

An operation is in the ready state if all of its predecessors have been scheduled and its operands are ready

28
Q

How does the list-based scheduling algorithm work?

A

Before scheduling begins, the operations at the top of the graph are inserted into the ready state

Starting from the first cycle, for each cycle, try to schedule operations from the ready state that can fill the available resource slots

When multiple nodes are in the ready state, select the operation with the highest priority (longest path to the end of the graph)
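The steps above can be sketched as follows; resource classes are collapsed into a single slot count for brevity, and the operation names, latencies, and priorities are made up:

```python
def list_schedule(lat, preds, prio, slots_per_cycle):
    """Cycle-by-cycle list scheduling: each cycle, issue up to
    'slots_per_cycle' ready operations, highest priority first
    (priority = longest path to the end of the graph)."""
    finish = {}                        # op -> cycle its result is ready
    pending = set(lat)
    cycle, order = 0, []
    while pending:
        # an op is ready once every predecessor has finished
        ready = [op for op in sorted(pending)
                 if all(finish.get(p, cycle + 1) <= cycle
                        for p in preds.get(op, []))]
        ready.sort(key=lambda op: -prio[op])
        for op in ready[:slots_per_cycle]:
            finish[op] = cycle + lat[op]
            pending.discard(op)
            order.append((cycle, op))
        cycle += 1
    return order

lat   = {'a': 1, 'b': 1, 'c': 1}
preds = {'c': ['a']}                   # c depends on a
prio  = {'a': 3, 'b': 1, 'c': 2}
# With 1 slot/cycle: a issues first (highest priority), then c beats b.
```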

29
Q

In order to expand parallelism in the static scheduling, what can we do?

A

The compiler must try to expand basic blocks and schedule instructions across basic blocks

30
Q

What is the difference between local and Global scheduling?

A

Local scheduling techniques operate within a single basic block, while global scheduling techniques operate across basic blocks

31
Q

What is loop unrolling, and into which class of scheduling techniques does it fit?

A

It is a local scheduling technique in which the compiler must test whether loop iterations are independent of each other

The loop body is replicated multiple times depending on the unrolling factor, adjusting the loop termination code
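A source-level illustration of an unrolling factor of 4, in Python for readability (a VLIW compiler would do this on the intermediate code, not the source):

```python
def sum_unrolled4(a):
    """Loop body replicated 4x; the cleanup loop below is the adjusted
    termination code handling the len(a) % 4 leftover iterations."""
    s, i = 0, 0
    n4 = len(a) - len(a) % 4
    while i < n4:                  # unrolled body: four copies per trip
        s += a[i]
        s += a[i + 1]
        s += a[i + 2]
        s += a[i + 3]
        i += 4
    for j in range(n4, len(a)):    # cleanup for leftover iterations
        s += a[j]
    return s
```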

32
Q

What are the advantages and disadvantages of loop unrolling?

A

Loop overhead is minimized, and the length of basic blocks is extended, exposing more instructions that can be effectively scheduled to minimize NOP insertion

Disadvantages: it increases register pressure, code size, and cache misses

33
Q

What are loop-carried dependencies and loop-level analysis?

A

Loop-level analysis involves determining which data dependences exist among the operands across the iterations of a loop

Loop-carried dependence analysis, in turn, determines whether data in later iterations depend on data values produced in earlier iterations
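Two toy loops make the distinction concrete (illustrative Python, not from the source):

```python
# No loop-carried dependence: iteration i touches only its own data,
# so the iterations could execute in parallel.
def scale(a, k):
    return [k * x for x in a]

# Loop-carried dependence: iteration i reads the accumulator produced
# by iteration i - 1, so the iterations are serialized.
def prefix_sum(a):
    out, acc = [], 0
    for x in a:
        acc += x            # 'acc' carries a value across iterations
        out.append(acc)
    return out
```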

34
Q

What is loop peeling

A

Loop peeling is a technique used in optimizing compilers and code transformations to improve the performance or simplify the logic of a loop. This technique involves extracting one or more iterations from the beginning or end of a loop and handling them separately from the main loop body.
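A small hypothetical example in Python: peeling the first iteration removes a per-iteration special case from the main loop body.

```python
def running_max(a):
    """Original: the first-iteration special case is tested every trip."""
    best = None
    for i in range(len(a)):
        if i == 0:
            best = a[i]
        elif a[i] > best:
            best = a[i]
    return best

def running_max_peeled(a):
    """First iteration peeled out of the loop; the remaining body
    no longer needs the i == 0 test."""
    best = a[0]                     # peeled first iteration
    for i in range(1, len(a)):      # main loop, branch-free on i
        if a[i] > best:
            best = a[i]
    return best
```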

35
Q

What is software pipelining?

A

It is a technique for reorganizing loops such that each iteration of the software-pipelined code is made from instructions chosen from different iterations of the original loop

It interleaves instructions from different iterations without unrolling the loop
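For a toy three-stage loop body (load, add, store) the interleaving can be visualized as follows; the stage names and the single-cycle-per-stage model are illustrative assumptions:

```python
def swp_schedule(n, stages=('LD', 'ADD', 'ST')):
    """Software-pipelined schedule for n iterations of a 3-stage body.
    Each steady-state cycle mixes stages of different iterations
    (ST of iteration i, ADD of i+1, LD of i+2), with a prologue that
    fills the pipeline and an epilogue that drains it."""
    cycles = []
    for c in range(n + len(stages) - 1):
        row = [f'{stage}[{c - s}]'           # stage s of iteration c - s
               for s, stage in enumerate(stages) if 0 <= c - s < n]
        cycles.append(row)
    return cycles

# swp_schedule(4)[2] is a steady-state cycle: ['LD[2]', 'ADD[1]', 'ST[0]']
```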

36
Q

What is the global scheduling technique?

A

Global scheduling techniques aim at compacting a code fragment with internal control structure into the shortest possible sequence, preserving data and control dependencies

37
Q

How does the global scheduling technique work?

A

It tries to find parallelism across conditional branches in two steps

First, trace selection, and then trace compaction

38
Q

What are super blocks?

A

A super block is a group of basic blocks with a single entry point and multiple control exits

39
Q

See picture 16 in the ACA album and draw its dependence graph

A

See picture 16S in ACA

40
Q

Look at the dependence graph on picture 17 on the ACA album

Consider that the latencies for ALU, LS, and MUL operations are 1, 2, and 3 respectively, and:

b, c, e, g => ALU ops
f => LS ops
a, d => MUL ops

2 ALUs, 1 LS and 1 MUL

What is the result using the ASAP scheduling algorithm?

A

See picture 17S

41
Q

Look at the dependence graph on picture 17 on the ACA album

Consider that the latencies for ALU, LS, and MUL operations are 1, 2, and 3 respectively, and:

b, c, e, g => ALU ops
f => LS ops
a, d => MUL ops

1 ALU, 1 MUL and 1 LS

What is the result using the list-based algorithm?

A

See picture 18S

42
Q

What are the loop-carried dependences in picture 19 in the ACA album?

A

See picture 19S

43
Q

What are the loop-carried dependences in picture 20 in the ACA album?

A

See picture 20S

44
Q

What are the loop-carried dependences in picture 21 in the ACA album?

A

See picture 21S
