Closed Exercises Flashcards
Let’s consider a single-issue processor that can manage up to 4 simultaneous threads.
What are the values of the ideal CPI and the ideal per-thread CPI? (SINGLE ANSWER)
1 point
• Answer 1: Ideal CPI = 1 & Ideal per-thread CPI = 0.25
• Answer 2: Ideal CPI = 0.25 & Ideal per-thread CPI = 0.25
• Answer 3: Ideal CPI = 0.5 & Ideal per-thread CPI = 2
• Answer 4: Ideal CPI = 0.5 & Ideal per-thread CPI = 1
• Answer 5: Ideal CPI = 1 & Ideal per-thread CPI = 4
Answer 5: Ideal CPI = 1 & Ideal per-thread CPI = 4
Reasoning: the ideal per-thread CPI is the number of threads divided by the issue width, and the ideal CPI is 1 divided by the issue width.
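A worked check of that rule for this card (issue width 1, up to 4 threads):

```latex
\[
\text{Ideal CPI} = \frac{1}{\text{issue width}} = \frac{1}{1} = 1,
\qquad
\text{Ideal per-thread CPI} = \frac{\text{number of threads}}{\text{issue width}} = \frac{4}{1} = 4
\]
```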
Let’s consider a dual-issue SMT processor that can manage up to 4 simultaneous threads.
What are the values of the ideal CPI and the ideal per-thread CPI? (SINGLE ANSWER)
1 point
• Answer 1: Ideal CPI = 1 & Ideal per-thread CPI = 0.25
• Answer 2: Ideal CPI = 0.25 & Ideal per-thread CPI = 0.25
• Answer 3: Ideal CPI = 0.5 & Ideal per-thread CPI = 2
• Answer 4: Ideal CPI = 0.5 & Ideal per-thread CPI = 1
• Answer 5: Ideal CPI = 0.25 & Ideal per-thread CPI = 4
Answer 3: Ideal CPI = 0.5 & Ideal per-thread CPI = 2
Reasoning: per-thread CPI = number of threads divided by the issue width.
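The same formulas for the dual-issue case (issue width 2, 4 threads):

```latex
\[
\text{Ideal CPI} = \frac{1}{2} = 0.5,
\qquad
\text{Ideal per-thread CPI} = \frac{4}{2} = 2
\]
```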
Let’s consider a 4-issue SMT processor that can manage up to 4 simultaneous threads.
What are the values of the ideal CPI and the ideal per-thread CPI? (SINGLE ANSWER)
1 point
• Answer 1: Ideal CPI = 1 & Ideal per-thread CPI = 0.25
• Answer 2: Ideal CPI = 0.25 & Ideal per-thread CPI = 0.25
• Answer 3: Ideal CPI = 0.25 & Ideal per-thread CPI = 1
• Answer 4: Ideal CPI = 0.25 & Ideal per-thread CPI = 4
• Answer 5: Ideal CPI = 4 & Ideal per-thread CPI = 0.25
Answer 3: Ideal CPI = 0.25 & Ideal per-thread CPI = 1
Reasoning: per-thread CPI = number of threads divided by the issue width.
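And for the 4-issue case (issue width 4, 4 threads):

```latex
\[
\text{Ideal CPI} = \frac{1}{4} = 0.25,
\qquad
\text{Ideal per-thread CPI} = \frac{4}{4} = 1
\]
```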
What is Flynn’s taxonomy?
• SISD: Single instruction stream, single data stream
- Uniprocessors (including scalar processors like MIPS, but also ILP processors such as superscalars)
• SIMD: Single instruction stream, multiple data streams
- Vector architectures
- Multimedia extensions
- Graphics processing units
• MISD: Multiple instruction streams, single data stream
- No practical usage => no commercial implementation
• MIMD: Multiple instruction streams, multiple data streams
- Tightly-coupled MIMD (with thread-level parallelism)
- Loosely-coupled MIMD (with request-level parallelism)
What is the idea of SIMD and what are its advantages?
A central controller sends the same instruction to multiple processing elements (PEs), each operating on its own data.
• Synchronized PEs with single Program Counter
• Each Processing Element (PE) has its own set of data
– Use different sets of register addresses
• Motivations for SIMD:
– Cost of control unit shared by all execution units
– Only one copy of the code in execution is necessary
What are the types of SIMD machines?
Vector architectures
SIMD extensions
Graphics Processor Units (GPUs)
How does a vector architecture work?
Basic idea:
– Load sets of data elements into vector registers
– Operate on vector registers
– Write the results back into memory
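A minimal sketch of the classic DAXPY loop (Y = a*X + Y) in plain scalar C; the comments indicate how a vector machine could map it onto the load / operate / store pattern above. The VMIPS-style mnemonics in the comments are illustrative, not a definitive instruction listing.

```cuda
// DAXPY: y = a*x + y, the standard vector-architecture example.
// A vector machine expresses the whole loop with a handful of vector
// instructions (VMIPS-style mnemonics, shown here for illustration):
//   LV    V1, Rx        // load vector x into a vector register
//   MULVS V2, V1, F0    // multiply every element by the scalar a
//   LV    V3, Ry        // load vector y
//   ADDVV V4, V2, V3    // element-wise add
//   SV    V4, Ry        // write the result vector back to memory
void daxpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; ++i)     // one element per scalar iteration;
        y[i] = a * x[i] + y[i];     // a vector unit covers many elements per instruction
}
```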
What is the difference between scalar and vector registers?
A scalar register holds a single data element, while a vector register holds an entire vector of elements (e.g., 64 elements per vector register in VMIPS), so a single vector instruction operates on all of them (see picture 23).
What are GPUs specialized in?
GPUs are specialized for highly parallel, compute-intensive workloads.
How does a GPU parallelize its computations?
All operations are performed in parallel by the GPU, using a huge number of threads that each process their portion of the data independently.
How does the interaction between CPUs and GPUs occur?
The GPU (device) serves as a coprocessor for the CPU (host)
CPU and GPU are separate devices, with separate memory address spaces
The GPU has its own high-bandwidth memory
Serial parts of a program run on the CPU (host)
Computation-intensive and data-parallel parts are offloaded to the GPU (device)
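A minimal CUDA sketch of this host/device split (kernel and variable names are my own, illustrative choices): the host allocates device memory, copies the inputs across PCI Express, launches a data-parallel kernel in which each GPU thread handles one element, and copies the result back.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Device code: one thread per element, all executing the same instruction stream.
__global__ void scale_add(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                        // guard: the grid may be larger than n
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) memory.
    float *h_x = (float *)malloc(bytes);
    float *h_y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    // Device (GPU) memory: a separate address space backed by high-bandwidth DRAM.
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    // Offload: copy inputs over PCI Express (this transfer is the typical
    // bottleneck, so it pays to keep data resident on the device when possible).
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    // Run the computation-intensive, data-parallel part on the device.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_add<<<blocks, threads>>>(n, 3.0f, d_x, d_y);

    // Copy the result back; this synchronous copy also waits for the kernel to finish.
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", h_y[0]);    // expect 3*1 + 2 = 5

    cudaFree(d_x); cudaFree(d_y);
    free(h_x); free(h_y);
    return 0;
}
```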
What is the main bottleneck in the CPU and GPU interaction?
Data movement between CPU and GPU is the main bottleneck
- Low bandwidth compared with the CPU's and GPU's internal memory bandwidth, since transfers go over PCI Express (12-14 GB/s)
- Relatively high latency
- Data transfer can take longer than the actual computation
What is operation chaining?
Concept of forwarding extended to vector registers:
– A vector operation can start as soon as each element of its vector source operand becomes available
– Even though a pair of operations depend on one another, chaining allows the operations to proceed in parallel on separate elements of the vector
– In this way, there is no longer any need to wait for the last element of a load before starting the next dependent instruction
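A worked timing comparison for a dependent pair (a vector multiply followed by a vector add that consumes its result) on 64-element vectors. The start-up latencies used here, 7 cycles for the multiply unit and 6 for the adder, are illustrative assumptions; only the shape of the calculation matters:

```latex
\[
\text{Without chaining: } (7 + 64) + (6 + 64) = 141 \text{ cycles}
\qquad
\text{With chaining: } 7 + 6 + 64 = 77 \text{ cycles}
\]
```

With chaining, the add starts as soon as the first multiply result is forwarded, so the two 64-element passes overlap almost completely.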
What does the execution time of vector architectures depend on?
• Execution time depends on three factors:
– Length of operand vectors (number of elements)
– Structural hazards (how many vector functional units)
– Data dependencies (need to introduce operation chaining)
• VMIPS functional units consume one element per clock cycle
– So, the execution time of one vector instruction is approximately given by the vector length
What are convoys?
• Simplification: introduce the notion of a convoy
– A set of vector instructions that could potentially execute together, partially overlapped (no structural hazards)
• Sequences with read-after-write dependency hazards can be in the same convoy via chaining
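A worked example of the convoy-based time estimate (each convoy takes roughly one pass over the vector, sometimes called a chime), using the textbook DAXPY sequence on 64-element vectors and assuming chaining and a single load/store unit; the VMIPS-style mnemonics are illustrative:

```latex
\[
\begin{aligned}
\text{Convoy 1: } & \text{LV V1,Rx} \ \text{and} \ \text{MULVS V2,V1,F0} \quad \text{(chained)}\\
\text{Convoy 2: } & \text{LV V3,Ry} \ \text{and} \ \text{ADDVV V4,V2,V3} \quad \text{(chained)}\\
\text{Convoy 3: } & \text{SV V4,Ry}\\
\text{Execution time} & \approx m \times n = 3 \text{ convoys} \times 64 \text{ elements} = 192 \text{ clock cycles}
\end{aligned}
\]
```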