Questions Flashcards

1
Q

Explain the different instruction set architectures (stack, accumulator, register-memory, and register-register).

A

By internal storage:

  • accumulator architecture - one operand is implicitly the accumulator
  • stack architecture - operands are implicitly on top of the stack
  • general-purpose register architecture - only explicit operands (registers or memory locations)

By how memory and registers are accessed:

  • register-memory architecture - any instruction can access memory
  • register-register (load-store) architecture - only load and store instructions access memory
  • memory-memory architecture - all operands are in memory; not used anymore

A related type is the extended accumulator or special-purpose register computer, which has more registers than just the accumulator.

For example, C = A + B is push A; push B; add; pop C on a stack machine, but load R1,A; load R2,B; add R3,R1,R2; store R3,C on a load-store machine.

Today general-purpose register architectures dominate: registers are faster than memory and easier for compilers to use effectively.

2
Q

Give examples of addressing modes. Which are used in a typical RISC?

A

The actual memory address computed by an instruction is called the effective address.

Modes:

  • register - value is in a register
  • immediate - for constants, the operand is the actual value
  • displacement - memory[constant + regValue]
  • register indirect - memory[regValue]
  • indexed - memory[reg1Value + reg2Value]
  • direct/absolute - memory[constant]
  • memory indirect - memory[memory[regValue]]
  • autoincrement/autodecrement - memory[regValue], then regValue = regValue + d (or regValue - d)
  • scaled - memory[constant + reg1Value + reg2Value*d]

regValue - value in a register
reg1Value, reg2Value - values in two different registers
d - step/scale factor, typically the size of the operand

A typical RISC (e.g., RISC-V) supports only immediate and displacement addressing, both with 12-bit fields:
  • register indirect is obtained by placing 0 in the 12-bit displacement
  • absolute addressing is obtained by using x0 (hardwired to zero) as the base register

Memory is byte addressable, the architecture is load-store, and memory accesses may be aligned or unaligned - unaligned accesses may run extremely slowly. (A C sketch of the effective-address calculations follows.)
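
A minimal C sketch of the effective-address calculations above; the register file regs[] and byte-addressable memory mem[] are hypothetical stand-ins, not part of any real ISA:

  #include <stdint.h>
  #include <string.h>

  uint64_t regs[32];      /* toy register file           */
  uint8_t  mem[1 << 16];  /* toy byte-addressable memory */

  /* displacement: memory[constant + regValue] */
  uint64_t ea_displacement(uint64_t constant, int reg) {
      return constant + regs[reg];
  }

  /* register indirect: memory[regValue] - displacement of 0 */
  uint64_t ea_register_indirect(int reg) {
      return regs[reg];
  }

  /* indexed: memory[reg1Value + reg2Value] */
  uint64_t ea_indexed(int reg1, int reg2) {
      return regs[reg1] + regs[reg2];
  }

  /* scaled: memory[constant + reg1Value + reg2Value*d] */
  uint64_t ea_scaled(uint64_t constant, int reg1, int reg2, int d) {
      return constant + regs[reg1] + regs[reg2] * (uint64_t)d;
  }

  /* memory indirect: memory[memory[regValue]] - one extra memory
     read is needed just to obtain the effective address */
  uint64_t ea_memory_indirect(int reg) {
      uint64_t ptr;
      memcpy(&ptr, &mem[regs[reg]], sizeof ptr);  /* first access */
      return ptr;
  }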

3
Q

Explain the simple 5-stage MIPS/RISC-V pipeline.

A

Consists of 5 stages - IF, ID, EX, MEM and WB.
In the first stage (IF) the instruction is fetched from the instruction memory and the program counter is incremented by 4, ready to fetch the next instruction.
The second stage (ID) decodes the instruction and reads the register file to obtain the source operands; the immediate field is sign-extended here as well.
The third stage (EX) is execute: the ALU either computes the effective address for a memory access or performs the arithmetic/logic operation. Branches and jumps are a special case - their condition is evaluated and their target address is calculated in the ALU.
In the memory stage (MEM), loads read and stores write the data memory.
The last stage (WB) writes the result back to the register file.
For example, lw x5, 8(x6) is fetched in IF, x6 is read and the immediate 8 sign-extended in ID, x6 + 8 is computed in EX, the data memory is read in MEM, and the loaded value is written into x5 in WB.

4
Q

What is a structural hazard? Could there be one in the 5-stage MIPS/RISC-V pipeline? How is it resolved?

A

structural hazard - the hardware cannot support the particular combination of instructions that want to execute in the same cycle (two overlapping instructions need the same resource at once)

 - in modern systems it occurs mainly in slow special-purpose units (e.g., a non-pipelined divider)
 - it can often be worked around by the compiler scheduling instructions to avoid the conflicting resource

If the same memory were used for both instructions and data, an instruction in its fetch (IF) stage and an earlier load/store in its memory (MEM) stage would need to access memory in the same cycle, which a single-ported memory cannot support. In the 5-stage pipeline this is resolved by using separate instruction and data memories (or caches).

5
Q

What is a control hazard? Solutions to avoid/reduce the penalty?

A

Control hazards result from branches. Unconditional branches are easier to resolve, since we can simply fetch from the target address. Conditional branches, however, may or may not be taken, so the pipeline can fetch incorrect instructions that must subsequently be cleared before execution restarts from the correct target.
The penalty can be as large as the number of pipeline stages it takes to resolve the branch.
The simplest solutions are freeze and flush: freeze stalls the pipeline until the branch is resolved; flush discards the wrongly fetched instructions.
Another solution is to statically predict every branch as always taken or always untaken.
The delayed-branch strategy executes a useful instruction (the sequential successor) in the delay slot after the branch, and fetches the branch target afterwards if the branch is taken.

More advanced solutions are branch predictors. They can be either static or dynamic.

6
Q

Is there a dependency between pipelining and ISA?

A

Yes, instruction sets are influenced by pipelining (and vice versa). For example, how the ISA defines exceptions and branches greatly influences how hard the pipeline is to build.
Especially problematic are instructions that change processor state in the middle of execution, such as autoincrement addressing modes, because they make precise exceptions difficult to implement.

7
Q

What is a direct mapped, set associative, and fully associative cache?

A

Direct mapped cache - each block can be stored in exactly one place, determined by its address - usually (block address) mod (number of blocks in the cache). Simple and fast, but leads to conflict misses.
Set associative cache - the cache is divided into sets, each containing n blocks (n-way). The set is selected as in a direct mapped cache - (block address) mod (number of sets) - and the block may be placed anywhere within that set. Fewer conflict misses than direct mapped, but they still occur.
Fully associative cache - a block can be placed anywhere in the cache, so there are no conflict misses, but all blocks must be searched on every access, which is expensive in hardware. (A C sketch of the direct-mapped address split follows.)
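
A minimal sketch of how a direct-mapped cache splits an address into tag, index, and block offset; the parameters (64-byte blocks, 256 sets) are illustrative assumptions:

  #include <stdint.h>
  #include <stdio.h>

  #define BLOCK_SIZE 64   /* bytes per block -> 6 offset bits */
  #define NUM_SETS   256  /* blocks in cache -> 8 index bits  */

  /* Split a byte address into the fields a direct-mapped cache uses. */
  void split_address(uint64_t addr) {
      uint64_t offset = addr % BLOCK_SIZE;               /* byte within block */
      uint64_t index  = (addr / BLOCK_SIZE) % NUM_SETS;  /* which cache set   */
      uint64_t tag    = addr / BLOCK_SIZE / NUM_SETS;    /* identifies block  */
      printf("addr=%#llx tag=%#llx index=%llu offset=%llu\n",
             (unsigned long long)addr, (unsigned long long)tag,
             (unsigned long long)index, (unsigned long long)offset);
  }

  int main(void) {
      /* Two addresses that map to the same set cause a conflict miss
         in a direct-mapped cache, even if the rest of the cache is empty. */
      split_address(0x12340);
      split_address(0x12340 + BLOCK_SIZE * NUM_SETS);  /* same index, new tag */
      return 0;
  }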

8
Q

What is the difference between write-through and write-back cache?

A

write through - information is written both to the cache and to the lower-level memory
write back - information is written only to the cache, and the block is written to main memory only when it is replaced - requires a dirty bit

Write-back writes happen at the speed of the cache and use less memory bandwidth and less energy - attractive for embedded systems, servers, and multiprocessor systems.
Write through is easier to implement: the lower level always holds a clean, up-to-date copy, so blocks never need to be written back on replacement, which also simplifies data coherency. Multilevel caches make write through more viable, since the write traffic only has to reach the next cache level. (A C sketch of both policies follows.)
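
A minimal C sketch of the two write policies using a hypothetical one-block toy cache (all names and sizes are illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  /* Toy one-block cache line illustrating the two policies. */
  struct line {
      uint64_t tag;
      uint8_t  data[64];
      bool     valid;
      bool     dirty;   /* only needed for write back */
  };

  uint8_t main_memory[1 << 20];

  /* Write through: update the cache AND the lower level on every write. */
  void write_through(struct line *l, uint64_t addr, uint8_t value) {
      l->data[addr % 64] = value;
      main_memory[addr]  = value;   /* memory is always clean */
  }

  /* Write back: update only the cache and mark the line dirty;
     memory is updated later, when the line is evicted.        */
  void write_back(struct line *l, uint64_t addr, uint8_t value) {
      l->data[addr % 64] = value;
      l->dirty = true;
  }

  void evict(struct line *l, uint64_t block_addr) {
      if (l->dirty) {               /* write the whole block down */
          for (int i = 0; i < 64; i++)
              main_memory[block_addr + i] = l->data[i];
          l->dirty = false;
      }
      l->valid = false;
  }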

9
Q

What is loop unrolling? Who performs it?

A
  • Loop unrolling
    ○ Replicates the loop body multiple times and adjusts the loop termination code
    ○ Usually uses more resources (e.g. more registers)
    ○ Can greatly increase the performance of the code, but can lead to a much larger code size
    ○ Scheduling the unrolled loop increases the performance even more
    ○ Process of unrolling:
    1. Determine that unrolling is actually useful - the iterations must be independent
    2. Use different registers to avoid unnecessary constraints
    3. Eliminate the extra tests and branches and adjust the loop termination and iteration code
    4. Determine whether the loads and stores from different iterations are independent and can be interchanged
    1) Requires analyzing whether the memory accesses refer to different locations
    5. Schedule the code, preserving any dependences needed to yield the same result as the original code
    ○ Limits on the improvement gained:
    1. Less loop overhead is amortized with each additional unroll
    2. Code size limitations
    1) Can lead to more instruction cache misses
    2) Shortage of registers - register pressure
    3. Compiler limitations

Performed by the compiler (a before/after sketch in C follows).
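
A minimal before/after example in C; the loop and names are illustrative, and a real compiler would also handle trip counts that are not a multiple of 4:

  /* Original loop: one increment, one test, one branch per element. */
  void scale(double *x, double s, int n) {
      for (int i = 0; i < n; i++)
          x[i] = x[i] * s;
  }

  /* Unrolled by 4: the loop overhead (increment, compare, branch) is
     paid once per 4 elements, and the 4 independent multiplies can be
     scheduled to hide latency. Assumes n is a multiple of 4.        */
  void scale_unrolled(double *x, double s, int n) {
      for (int i = 0; i < n; i += 4) {
          x[i]     = x[i]     * s;
          x[i + 1] = x[i + 1] * s;
          x[i + 2] = x[i + 2] * s;
          x[i + 3] = x[i + 3] * s;
      }
  }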

10
Q

How is the performance equation extended to include the cache? Are the parameters independent?

A

CPU time is extended with memory stall cycles:

CPU time = (CPU clock cycles + memory stall cycles) × clock cycle time
memory stall cycles = IC × memory accesses per instruction × miss rate × miss penalty

Example: with a base CPI of 1.0, 1.5 memory accesses per instruction, a 2% miss rate, and a 100-cycle miss penalty, stalls add 1.5 × 0.02 × 100 = 3 cycles per instruction, quadrupling the CPU time.

The parameters are not independent: a larger cache lowers the miss rate but can lengthen the hit time or the clock cycle, and out-of-order processors can overlap part of the miss penalty, so changing one parameter usually affects the others.

11
Q

Explain branch prediction and the branch history table.

A

The goal of branch prediction is to reduce the branch penalty: the predictor tries to resolve whether a branch will be taken or not and thereby reduce processor stalls.
simplest version - branch-prediction buffer, or branch history table (BHT)
- a small buffer indexed by the lower bits of the branch instruction's address, holding a bit that says whether the branch was recently taken
- the entry may also have been modified by a different branch that shares the same lower address bits (aliasing)
- on a misprediction, the bit is inverted
- an improvement is the 2-bit prediction scheme - the prediction must miss twice before it changes (11 and 10 predict taken, 01 and 00 predict untaken)
- more bits can be used, but practice shows that 2 bits are enough (a C sketch of a 2-bit BHT follows)
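
A minimal C sketch of a branch history table with 2-bit saturating counters; the table size and indexing are illustrative assumptions:

  #include <stdbool.h>
  #include <stdint.h>

  #define BHT_ENTRIES 1024  /* indexed by lower bits of the branch PC */

  /* 2-bit saturating counters: 0,1 predict not taken; 2,3 predict taken. */
  static uint8_t bht[BHT_ENTRIES];

  static unsigned bht_index(uint64_t pc) {
      return (pc >> 2) % BHT_ENTRIES;  /* drop byte offset, keep low bits */
  }

  bool predict(uint64_t pc) {
      return bht[bht_index(pc)] >= 2;
  }

  /* Train the counter with the actual outcome; the prediction must be
     wrong twice in a row before it flips.                            */
  void update(uint64_t pc, bool taken) {
      uint8_t *c = &bht[bht_index(pc)];
      if (taken  && *c < 3) (*c)++;
      if (!taken && *c > 0) (*c)--;
  }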

12
Q

Explain branch prediction and why one wants to use a 2-bit predictor.

A

The goal of branch prediction is to reduce the branch penalty: the predictor tries to resolve whether a branch will be taken or not and thereby reduce processor stalls.
The main strengths of the 2-bit predictor are its accuracy and simplicity. A simple 1-bit predictor records only the last outcome, so a loop branch mispredicts twice per loop execution: once on exit and once again on re-entry. A 2-bit predictor requires two consecutive incorrect predictions before it changes its prediction, avoiding this. The more general scheme is an n-bit (saturating counter) predictor, but in practice 2 bits are enough.

13
Q

What is a branch target buffer?

A
  • Branch-target buffer (BTB)
    ○ Stores the predicted target addresses of branches
    ○ Is accessed during instruction fetch, before the instruction is even decoded, so its prediction must be correct
    ○ If the PC of the fetched instruction matches an entry in the buffer, the stored predicted PC is used as the next PC
    ○ When a matching entry is found, fetching from the target begins immediately
    ○ The match must be correct - otherwise we fetch from a wrong PC and performance gets worse
    ○ Stores only predicted-taken branches (fall-through needs no stored target; see the C sketch below)
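
A minimal C sketch of a direct-mapped BTB lookup during instruction fetch; the table size and field names are illustrative assumptions:

  #include <stdbool.h>
  #include <stdint.h>

  #define BTB_ENTRIES 512

  struct btb_entry {
      uint64_t branch_pc;  /* full PC of the branch (acts as tag) */
      uint64_t target_pc;  /* predicted target address            */
      bool     valid;
  };

  static struct btb_entry btb[BTB_ENTRIES];

  /* During IF: if the current PC hits in the BTB, fetch from the
     stored target next cycle; otherwise fall through to PC + 4. */
  uint64_t next_fetch_pc(uint64_t pc) {
      struct btb_entry *e = &btb[(pc >> 2) % BTB_ENTRIES];
      if (e->valid && e->branch_pc == pc)
          return e->target_pc;
      return pc + 4;
  }

  /* When a taken branch resolves, install or update its entry. */
  void btb_update(uint64_t pc, uint64_t target) {
      struct btb_entry *e = &btb[(pc >> 2) % BTB_ENTRIES];
      e->branch_pc = pc;
      e->target_pc = target;
      e->valid     = true;
  }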
14
Q

Sketch the hardware for and the data flow of the Tomasulo Algorithm.

A

See the figure on page 198 of the textbook.

In outline: instructions are fetched into an instruction queue and issued in order to reservation stations sitting in front of the functional units (e.g., FP adders and FP multipliers) and to load/store buffers. Operands that are ready are copied into the reservation station at issue; otherwise the station records which station will produce them. When a functional unit finishes, it broadcasts its result on the common data bus (CDB), from which waiting reservation stations, store buffers, and the register file all pick it up.

15
Q

What is the function of the reservation station in an out-of-order processor?

A
  • Register renaming
    ○ Resolves WAR and WAW hazards
    ○ Renames destination registers, including those with a pending read or write for an earlier instruction
    ○ An out-of-order write therefore does not affect any instruction
    • Renaming is done at the reservation stations
      ○ They buffer the operands of instructions waiting to execute
      ○ A station fetches and buffers an operand as soon as it is available
      § Eliminates the need to read it from a register later
      ○ When successive writes to a register overlap, only the last one actually updates the register
      ○ When an instruction is issued, its register specifiers are renamed to the names of reservation stations
      ○ There can be more reservation stations than actual registers
      § Eliminates name dependences that a compiler could not eliminate
      ○ Reservation stations have two implications
      § Hazard detection and execution control are distributed
      § Results are passed directly from the reservation stations (buffering)
      □ Via the CDB - common data bus
      ○ Reservation stations hold issued instructions that are waiting to execute on the functional units
    • Reservation station structure (see the C sketch below):
      ○ Op - operation to perform on source operands S1 and S2
      ○ Qj, Qk - the reservation stations that will produce the corresponding source operand; a zero value means the operand is already ready in Vj or Vk, or is unnecessary
      ○ Vj, Vk - values of the source operands. Only one of the fields V or Q is valid for each operand. For loads, the Vk field holds the offset
      ○ A - holds information for the memory address calculation; after the calculation, the effective address is stored here
      ○ Busy - indicates that this reservation station and its accompanying functional unit are occupied
    • Register file field Qi - the number of the reservation station that will produce the value to be written into this register; zero/blank means no currently active instruction will write it
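
A minimal C rendering of the reservation-station fields listed above - a sketch of the bookkeeping, not of real hardware:

  #include <stdbool.h>
  #include <stdint.h>

  /* One reservation station entry (fields as in the list above). */
  struct reservation_station {
      int      op;      /* operation to perform on S1 and S2          */
      int      qj, qk;  /* stations producing the source operands;    */
                        /* 0 = operand ready in vj/vk, or unneeded    */
      uint64_t vj, vk;  /* operand values (vk holds offset for loads) */
      uint64_t a;       /* effective-address info, then the address   */
      bool     busy;    /* station and functional unit are occupied   */
  };

  /* Register result status: which station will write each register. */
  struct register_status {
      int qi;           /* 0 = no pending write; register value valid */
  };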
16
Q

What additional stage from out-of-order execution is needed to enable speculation?

A
An additional commit stage is needed, supported by a reorder buffer (ROB) that holds results until instructions commit in program order.
  1. Commit
    i. Normal commit - the instruction reaches the head of the ROB and its result is present in the buffer - the result is written to the register
    ii. Store commit - similar, but the result is written to memory instead of a register
    iii. Branch with incorrect prediction - the ROB is flushed and execution is restarted at the correct successor
    If the branch was correctly predicted, the branch is simply finished (a C sketch of the commit cases follows)
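
A minimal C sketch of the three commit cases, assuming a hypothetical ROB entry layout (all names are illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  enum kind { KIND_ALU, KIND_STORE, KIND_BRANCH };

  struct rob_entry {
      enum kind kind;
      bool      ready;        /* result has arrived in the buffer */
      int       dest_reg;     /* destination register (ALU ops)   */
      uint64_t  value;        /* result value or store data       */
      uint64_t  addr;         /* store address or correct target  */
      bool      mispredicted; /* branch prediction was wrong      */
  };

  uint64_t regs[32];
  uint8_t  mem[1 << 20];

  static void flush_rob_and_restart(uint64_t correct_pc) {
      (void)correct_pc;  /* hypothetical: clear ROB, redirect fetch */
  }

  /* Commit the entry at the head of the ROB, in program order. */
  void commit(struct rob_entry *head) {
      if (!head->ready)
          return;                          /* head not finished: wait */
      switch (head->kind) {
      case KIND_ALU:                       /* normal commit           */
          regs[head->dest_reg] = head->value;
          break;
      case KIND_STORE:                     /* store commit            */
          mem[head->addr] = (uint8_t)head->value;
          break;
      case KIND_BRANCH:
          if (head->mispredicted)          /* flush and restart       */
              flush_rob_and_restart(head->addr);
          break;                           /* correct: just finished  */
      }
  }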
17
Q

What is a thread (for computer architects)?

A

A lightweight version of a process.
It has its own PC and its own processor (register) state, but shares its address space with the other threads of the same process.
Switching between threads is quick - much quicker than a process switch.

18
Q

Explain the difference between coarse-grain, fine-grain, and simultaneous TLP (Thread Level Parallelism).

A
  1. Fine-grained multithreading - switches between threads on each clock cycle
    i. Usually in round-robin fashion, skipping threads that are stalled
    ii. Can hide stalls: during a stall, instructions from other threads execute
    iii. Slows down the execution of each individual thread
    iv. Trades single-thread latency for overall throughput
  2. Coarse-grained multithreading - switches only on costly stalls (e.g., last-level cache misses)
    i. Does not require thread switching to be as fast
    ii. Less likely to slow down the execution of a single thread
    iii. Produces pipeline bubbles on each thread switch (the pipeline must drain and refill)
    iv. Only reduces the cost of high-cost stalls
    v. Not used very often
  3. Simultaneous multithreading (SMT) - a variation of fine-grained multithreading that arises naturally on a dynamically scheduled, multiple-issue processor
    i. Register renaming and dynamic scheduling allow multiple instructions from independent threads to be processed simultaneously; dependences are resolved by the dynamic scheduler
19
Q

What are the two communication models for multiprocessors?

A

Symmetric shared-memory multiprocessors (SMP)
Distributed shared-memory multiprocessors (DSM)

Symmetric shared-memory
- small to moderate number of cores sharing a single centralized memory with uniform access time (UMA)

Distributed shared-memory
- memory is physically distributed among the processors, so access time depends on where the data resides (NUMA); scales to larger numbers of cores

20
Q

What is cache coherency and consistency?

A

Cache coherence defines what values can be returned by reads and writes to a single memory location - all processors must see writes to that location in a consistent order. Consistency determines when a written value will be seen by other processors relative to reads and writes to other memory locations - it constrains the ordering of accesses across different locations.