VLIW Theory Flashcards
What are the behaviors that the compiler cannot predict?
The cache and the branch behaviors
Can a bundle contain more than one branch instruction?
No, we can’t
What are the requirements of the fetch stage in a multiple-issue processor?
Higher bandwidth from the instruction cache
How is the decision of which instructions to issue taken in a multiple-issue processor?
Either by dynamic scheduling (in hardware) or by static scheduling (by the compiler)
What are the bundles in a VLIW architecture?
A bundle is a single issue packet representing one wide instruction with multiple independent operations
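As a rough illustration (a hypothetical C sketch, not a real instruction encoding), a 4-slot bundle can be pictured as one wide packet carrying four independent operations:

/* Hypothetical sketch of a 4-slot VLIW bundle: one wide instruction
   carrying four independent operations, one per functional-unit slot. */
typedef struct {
    unsigned opcode;      /* operation for this slot (NOP if unused) */
    unsigned dest;        /* destination register                    */
    unsigned src1, src2;  /* source registers                        */
} Operation;

typedef struct {
    Operation slot[4];    /* e.g. INT, INT, MEM, BRANCH slots        */
} Bundle;                 /* fetched and issued as a single packet   */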
What is the impact of having a shared multi-ported register file in a VLIW architecture?
It implies the need for enough ports to read and write, in parallel, the operands of the multiple operations issued by each very long instruction
Therefore, for a 4-slot bundle we need eight read ports to read eight source registers and four write ports to write four destination registers per cycle
What happens when there is not enough parallelism in the source code to fill the available operation slots?
NOPs are inserted
What is the impact of the number of functional units, compared to the number of slots in the bundle, on the architecture?
If each slot is assigned to a functional unit, the decode unit is a simple decoder; otherwise, if there are more parallel functional units than issue slots, the architecture must have a dispatch network to redirect each operation and the related source operands to the target FU
How does the VLIW processor solve RAW hazards?
During the scheduling phase, the compiler statically reorders instructions (not involved in the dependency) to fill the latency gap; otherwise the compiler introduces NOPs
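A minimal C sketch (hypothetical code, assuming a 2-cycle load latency) of how the compiler hides a RAW dependency by moving an independent operation into the gap instead of a NOP:

/* The load into x has a 2-cycle latency; the RAW consumer (x + 4)
   must not issue in the next cycle. The compiler fills the gap with
   the independent increment of count rather than inserting a NOP. */
int raw_example(const int *a, int count) {
    int x = a[0];          /* load: result ready after 2 cycles   */
    count = count + 1;     /* independent op scheduled in the gap */
    int sum = x + 4;       /* RAW consumer of x, now safely late  */
    return sum + count;
}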
What information must be exposed to the compiler in order to obtain correct instruction execution?
Operation latencies and data dependencies must be exposed to the compiler
How are WAR and WAW hazards solved in VLIW processors?
They are solved statically by the compiler, by correctly selecting the slots for the operations or by using register renaming
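A small C sketch (hypothetical example) of how renaming removes WAW/WAR hazards: the reused temporary t is split into t1 and t2, making the two computations independent and freely reorderable:

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4;

    /* Before renaming (written with a single temporary):
       t = a + b; x = t;   t = c + d; y = t;
       The second write to t is a WAW with the first write, and a
       WAR with the read that feeds x. */

    /* After renaming: no false dependencies remain. */
    int t1 = a + b;
    int x  = t1;
    int t2 = c + d;
    int y  = t2;

    printf("x=%d y=%d\n", x, y);
    return 0;
}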
How are mispredicted branches solved in the VLIW architecture?
A mispredicted branch must be solved dynamically by the hardware, by flushing the speculative instructions from the pipeline
What must be done to avoid structural hazards on RF access and WAR/WAW hazards, and what is the effect?
It is necessary to keep in-order execution: the write-back phase of the parallel operations in a bundle must occur in the same clock cycle
This implies that each bundle is constrained to the latency of the longest-latency operation in it
What is the implication of doing out of order execution?
This way, we need to check, in the write-back phase, for RF write accesses (to avoid structural hazards) and for WAR/WAW hazards
What is register pressure and what causes it?
Register pressure is related to the occupation of the registers
It is caused by the multicycle latency of the operations together with the multiple issue of instructions, which increases the number of registers occupied over time
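A hedged C illustration (hypothetical loop) of where the pressure comes from: overlapping several multicycle loads forces several temporaries to be live at once, each needing its own register:

/* After 4x unrolling, four loads are in flight at the same time, so
   t0..t3 are all live simultaneously instead of a single temporary;
   that growth in concurrently live values is register pressure. */
long sum4(const int *a, int n) {   /* assumes n is a multiple of 4 */
    long s = 0;
    for (int i = 0; i < n; i += 4) {
        int t0 = a[i];
        int t1 = a[i + 1];
        int t2 = a[i + 2];
        int t3 = a[i + 3];
        s += t0 + t1 + t2 + t3;
    }
    return s;
}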
What events does the compiler not know about at static (compile) time?
Data cache misses and branch mispredictions
What are basic blocks?
A straight-line code sequence with no branches in or out except at the entry/exit point
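A small C example (hypothetical) with its basic blocks marked; each block is entered only at its top and left only at its bottom:

int max(int a, int b) {
    int m;
    if (a > b)        /* BB1: test, ends in a conditional branch */
        m = a;        /* BB2: taken path                         */
    else
        m = b;        /* BB3: fall-through path                  */
    return m;         /* BB4: join point, single entry and exit  */
}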
What is the relationship between basic blocks, data dependencies and the exploitable ILP in a program?
Beyond the fact that the amount of parallelism available within a basic block is small, data dependencies can further limit the amount of ILP we can exploit within a basic block.
Therefore, to obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks
What are the main advantages of the VLIW architecture?
Good performance through extensive compiler optimizations (the compiler can analyze the code over a wider instruction window and has more time to analyze dependencies and extract parallelism)
Reduced hardware complexity
What are the challenges of VLIW architectures?
Need for strong compiler technology
Code size increase
Huge number of registers needed for register renaming
L1: lw $2, BASEA($4)
addi $2,$2,INC1
lw $3, BASEB($4)
addi $3,$3,INC2
addi $4, $4, 4
bne $4,$7, L1
Schedule the code above on a MIPS 5-stage pipelined scalar processor with forwarding and early branch resolution in the ID stage
See picture 4S in the ACA album
L1: lw $2, BASEA($4)
addi $2,$2,INC1
lw $3, BASEB($4)
addi $3,$3,INC2
addi $4, $4, 4
bne $4,$7, L1
Schedule the code above on a 4-issue VLIW with 4 fully pipelined FUs with:
- LD/ST latency 2
- INT latency 1
- INT to BR latency 2
See picture 5S in the ACA album
L1: lw $2, BASEA($4)
addi $2,$2,INC1
lw $3, BASEB($4)
addi $3,$3,INC2
addi $4, $4, 4
bne $4,$7, L1
Schedule the code above on a 2-issue VLIW with 2 fully pipelined FUs with:
- LD/ST latency 2
- INT latency 1
- INT to BR latency 2
See picture 6S in the ACA album
What is a dependence graph in the context of VLIW scheduling?
It is a graph that captures true, anti, and output dependencies between instructions
Each node represents an operation in the basic block
The edges between nodes represent the dependencies
Each node is annotated with the longest path to reach it and the node latency
What is the longest critical path in a dependence graph?
It defines the minimum execution time of the basic block
What is the relationship between scheduling and the dependence graph?
The scheduling problem consists of assigning an execution cycle to each operation in the graph
What is the ready state?
An operation is in the ready state if all of its predecessors have been scheduled and its operands are ready
How does the list-based scheduling algorithm work?
Before scheduling begins, the operations at the top of the graph (with no predecessors) are inserted in the ready set
Starting from the first cycle, for each cycle, try to schedule operations from the ready set that can fill the available resource slots
When several nodes are in the ready set, select the operation with the highest priority (longest path to the end of the graph)
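A minimal list-scheduling sketch in C (the dependence graph, latencies, and 2-slot machine below are hypothetical, not from the course material); the priority of a node is its longest latency path to the end of the graph:

#include <stdio.h>

#define N_OPS 5
#define SLOTS_PER_CYCLE 2           /* assumed machine: 2 generic FUs */

typedef struct {
    const char *name;
    int latency;
    int n_preds;
    int preds[N_OPS];               /* indices of predecessor ops */
} Op;

/* Example DAG: a and b feed c; c feeds d; b feeds e. */
static Op ops[N_OPS] = {
    { "a", 2, 0, {0} },
    { "b", 1, 0, {0} },
    { "c", 1, 2, {0, 1} },
    { "d", 1, 1, {2} },
    { "e", 1, 1, {1} },
};

static int start[N_OPS];            /* scheduled cycle, -1 if none */

/* Longest latency path from op i to the end of the graph. */
static int priority(int i) {
    int best = ops[i].latency;
    for (int j = 0; j < N_OPS; j++)
        for (int p = 0; p < ops[j].n_preds; p++)
            if (ops[j].preds[p] == i) {
                int v = ops[i].latency + priority(j);
                if (v > best) best = v;
            }
    return best;
}

/* Ready: unscheduled, and every predecessor's result is available. */
static int ready(int i, int cycle) {
    if (start[i] >= 0) return 0;
    for (int p = 0; p < ops[i].n_preds; p++) {
        int q = ops[i].preds[p];
        if (start[q] < 0 || start[q] + ops[q].latency > cycle)
            return 0;
    }
    return 1;
}

int main(void) {
    int scheduled = 0;
    for (int i = 0; i < N_OPS; i++) start[i] = -1;

    for (int cycle = 0; scheduled < N_OPS; cycle++) {
        for (int used = 0; used < SLOTS_PER_CYCLE; used++) {
            int best = -1;          /* highest-priority ready op */
            for (int i = 0; i < N_OPS; i++)
                if (ready(i, cycle) &&
                    (best < 0 || priority(i) > priority(best)))
                    best = i;
            if (best < 0) break;    /* nothing ready: slot gets a NOP */
            start[best] = cycle;
            scheduled++;
            printf("cycle %d: %s\n", cycle, ops[best].name);
        }
    }
    return 0;
}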
What can we do to expose more parallelism in static scheduling?
The compiler must try to enlarge basic blocks and schedule instructions across basic blocks
What is the difference between local and global scheduling?
Local scheduling techniques operate within a single basic block, while global scheduling techniques operate across basic blocks
What is loop unrolling, and to which class of techniques does it belong?
It is a local scheduling technique, where the compiler must test whether loop iterations are independent of each other
The loop body is replicated multiple times, depending on the unrolling factor, adjusting the loop termination code
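A hedged C sketch of 4x unrolling (hypothetical loop; assumes the trip count n is a multiple of 4, so no fix-up code is shown):

/* Original loop: one compare-and-branch per element. */
void scale(float *a, int n, float k) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] * k;
}

/* Unrolled by 4: one compare-and-branch per 4 elements, and four
   independent multiplies the scheduler can pack into bundles. */
void scale_unrolled(float *a, int n, float k) {
    for (int i = 0; i < n; i += 4) {   /* adjusted termination code */
        a[i]     = a[i]     * k;
        a[i + 1] = a[i + 1] * k;
        a[i + 2] = a[i + 2] * k;
        a[i + 3] = a[i + 3] * k;
    }
}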
What are the advantages and disadvantages of loop unrolling?
Advantages: loop overhead is minimized, and the length of basic blocks is extended, exposing more instructions that can be effectively scheduled to minimize NOP insertion
Disadvantages: it increases register pressure, code size, and cache misses
What are loop-carried dependences and loop-level analysis?
Loop-level analysis involves determining what data dependences exist among the operands across the iterations of a loop
Loop-carried dependence analysis, instead, determines whether data used in later iterations depend on data values produced in earlier iterations.
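Two small C loops (hypothetical) contrasting the cases:

/* Loop-carried dependence: iteration i reads a[i-1], written by the
   previous iteration, so iterations cannot run in parallel. */
void prefix_sum(int *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}

/* No loop-carried dependence: iterations are fully independent. */
void add_k(int *a, int n, int k) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] + k;
}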
What is loop peeling?
Loop peeling is a technique used by optimizing compilers and code transformations to improve the performance or simplify the logic of a loop. It involves extracting one or more iterations from the beginning or end of a loop and handling them separately from the main loop body.
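A hedged C sketch (hypothetical loop) of peeling the first iteration so the special case disappears from the loop body:

/* Original: every iteration tests a condition that is true only once. */
void init(int *a, int n, int x) {
    for (int i = 0; i < n; i++) {
        if (i == 0) a[i] = x;            /* special case       */
        else        a[i] = a[i - 1] + 1;
    }
}

/* After peeling the first iteration, the test disappears and the
   remaining loop has a single uniform body. */
void init_peeled(int *a, int n, int x) {
    if (n <= 0) return;
    a[0] = x;                            /* peeled iteration   */
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + 1;
}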
What is software pipelining?
It is a technique for reorganizing loops, such that each iteration of the software-pipelined code is made from instructions chosen from different iterations of the original loop
It interleaves instructions from different iterations without unrolling the loop
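A schematic C sketch (compilers actually do this on machine code; the loop is hypothetical): the steady-state body mixes the load stage of iteration i+1 with the compute/store stage of iteration i:

void scale_swp(const int *a, int *b, int n, int k) {
    if (n <= 0) return;
    int x = a[0];                 /* prologue: start iteration 0    */
    for (int i = 0; i < n - 1; i++) {
        int next = a[i + 1];      /* stage 1 of iteration i + 1     */
        b[i] = x * k;             /* stage 2 of iteration i         */
        x = next;                 /* pass value to the next round   */
    }
    b[n - 1] = x * k;             /* epilogue: finish the last one  */
}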
What is the global scheduling technique?
Global scheduling techniques aim at compacting a code fragment with internal control structure into the shortest possible sequence, preserving data and control dependencies
How do global scheduling techniques work?
They try to find parallelism across conditional branches through two steps:
first trace selection, and then trace compaction
What are super blocks?
A superblock is a group of basic blocks with a single entry point and multiple control exits
See picture 16 in the ACA album and draw its dependence graph
See picture 16S in the ACA album
Look at the dependence graph in picture 17 in the ACA album
Consider that the latencies for ALU, LS, and MUL operations are 1, 2, and 3 respectively, and:
b, c, e, g => ALU ops
f => LS ops
a, d => MUL ops
2 ALUs, 1 LS and 1 MUL
What is the result of using the ASAP scheduling algorithm?
See picture 17S
Look at the dependence graph in picture 17 in the ACA album
Consider that the latencies for ALU, LS, and MUL operations are 1, 2, and 3 respectively, and:
b, c, e, g => ALU ops
f => LS ops
a, d => MUL ops
1 ALU, 1 MUL and 1 LS
What is the result of using the list-based scheduling algorithm?
See picture 18S
What are the loop-carried dependences in picture 19 in the ACA album?
See picture 19S
What are the loop-carried dependences in picture 20 in the ACA album?
See picture 20S
What are the loop-carried dependences in picture 21 in the ACA album?
See picture 21S