Lecture 2: The world of parallelism Flashcards
What is the trend in GPU and CPU architecture?
The number of cores is increasing
What is Flynn’s Taxonomy?
Classification system for computer architectures based on the number of instruction streams and data streams they can process simultaneously.
What are the categories of Flynn’s Taxonomy?
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data (Chip MPs)
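As a small illustration (a sketch, not taken from the lecture), the taxonomy reduces to two questions: is there one instruction stream or several, and is there one data stream or several? The hypothetical helper `flynn_category` below encodes exactly that. The stream counts in the examples are illustrative assumptions, not measurements of real hardware.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Classify a machine by its number of instruction and data streams (Flynn's taxonomy)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" = single stream, "M" = multiple streams
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

# Illustrative inputs only (hypothetical stream counts)
print(flynn_category(1, 1))  # traditional single processor
print(flynn_category(1, 8))  # one vector instruction applied to 8 data lanes
print(flynn_category(4, 4))  # 4-core chip multiprocessor
```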
Define the four categories of Flynn’s Taxonomy
SISD: One instruction executed at a time, processing one data element at a time, e.g. traditional single processors
SIMD: One instruction executed at a time, operating on multiple independent streams of data, e.g. vector processors (1970s), vector units (MMX, SSE, AVX), GPUs
MIMD: Multiple sequences of instructions executed independently, each one operating on a different stream of data, e.g. chip multiprocessors
SPMD (a special case of MIMD): Multiple instruction streams, all running the same code, on multiple independent streams of data, e.g. data-parallel machines built from independent processors
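The last definition (many instruction streams, all running the same code, each on its own data) is the pattern usually called SPMD, Single Program, Multiple Data. Below is a minimal Python sketch, assuming a hypothetical `worker` function and a cyclic split of the data: every thread runs the same function and differs only in its `rank`. CPython threads do not execute Python bytecode in parallel because of the GIL, so this shows the program structure, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(rank: int, num_workers: int, data: list[int]) -> int:
    """The single program every worker runs; only `rank` (and so its slice of data) differs."""
    chunk = data[rank::num_workers]  # hypothetical cyclic partitioning: every num_workers-th element
    return sum(x * x for x in chunk)

def spmd_sum_of_squares(data: list[int], num_workers: int = 4) -> int:
    """Launch the same worker code on num_workers threads and combine the partial results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda r: worker(r, num_workers, data), range(num_workers)))
    return sum(partials)

print(spmd_sum_of_squares(list(range(10))))
```

Splitting by `rank` is the essence of the style: there is no per-processor program, just one program that decides what to work on from its own identity.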
Describe interconnections and communication in parallelism
- Between cores
- Between cores and memory
How these connections are arranged affects the types of computation the system handles well
Interconnects in Chip MPs: advantages and disadvantages?
Advantages: Faster and lower cost than traditional interconnects
Disadvantages: Limited silicon area and power available for the network
What is a grid?
- Direct link to neighbours
- Private on-chip memory
- Staged communication with non-neighbours
- NxN grid worst case: 2*(N-1) steps
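Staged communication on a grid can be sketched with dimension-order (XY) routing, one common routing scheme; the lecture does not say which scheme is used, so treat it as an assumption. A message moves one neighbour per step, first along its row and then along its column, so travelling between opposite corners of an NxN grid takes 2*(N-1) hops. The helper name `xy_route` is hypothetical.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a grid: move along the row first, then along the column."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    while c != dc:              # stage 1: hop horizontally, one neighbour at a time
        c += 1 if dc > c else -1
        path.append((r, c))
    while r != dr:              # stage 2: hop vertically, one neighbour at a time
        r += 1 if dr > r else -1
        path.append((r, c))
    return path

n = 4
path = xy_route((0, 0), (n - 1, n - 1))  # opposite corners: the worst case
print(path)
print(len(path) - 1, "hops; 2*(N-1) =", 2 * (n - 1))
```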
What is a torus?
- Direct link to neighbours
- Private on-chip memory
- More symmetrical, more paths, shorter paths
- NxN torus worst case: N steps
- More wires, complex routing
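The torus's shorter worst case comes from the wraparound links: along each dimension a message can go whichever way round the ring is shorter, so it needs at most floor(N/2) hops per dimension. The sketch below (hypothetical helper names) compares worst-case hop counts for the grid and the torus by brute force over all pairs of cores. For even N the torus worst case is exactly N; for odd N it is N-1, so N is an upper bound.

```python
def torus_hops(a, b, n):
    """Minimal hops between cores a and b on an n x n torus."""
    total = 0
    for x, y in zip(a, b):
        d = abs(x - y)
        total += min(d, n - d)  # the wraparound link lets traffic go the other way round
    return total

def grid_hops(a, b):
    """Minimal hops on a grid without wraparound (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def worst_case(n, dist):
    """Largest distance between any two cores of an n x n network."""
    cores = [(r, c) for r in range(n) for c in range(n)]
    return max(dist(a, b) for a in cores for b in cores)

for n in (4, 5, 8):
    grid = worst_case(n, grid_hops)
    torus = worst_case(n, lambda a, b: torus_hops(a, b, n))
    print(f"N={n}: grid {grid} hops, torus {torus} hops")
```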
What is the smallest kind of on-chip interconnect?
A grid
In which interconnect is every core connected to four neighbours?
Torus
Which interconnects are suitable for smaller and which for larger systems?
Grid -> Smaller
Torus -> Larger
Bus -> Smaller
What's a key property of a torus?
They can be generalized further to multiple dimensions:
- 2D torus: a folded 2D grid → 4 neighbours
- 3D torus: a folded 3D grid → 6 neighbours
- 4D torus: a folded 4D grid → 8 neighbours
CMPs rarely go above 2D
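A sketch of the generalization: in a k-dimensional torus, each core has one link in each direction along every dimension, giving 2k neighbours (4, 6 and 8 for 2D, 3D and 4D). The code below counts neighbours by brute force, using modular arithmetic for the folded wraparound links. It assumes at least 3 cores per dimension, since with only 2 the +1 and -1 neighbours are the same core. The helper name `torus_neighbours` is hypothetical.

```python
from itertools import product

def torus_neighbours(coord, n):
    """All direct neighbours of a core in a k-dimensional n x ... x n torus."""
    result = set()
    for axis in range(len(coord)):
        for step in (-1, 1):
            nb = list(coord)
            nb[axis] = (nb[axis] + step) % n  # modulo models the folded wraparound link
            result.add(tuple(nb))
    return result

n = 4
for k in (2, 3, 4):
    # set of distinct neighbour counts over every core in the torus
    degrees = {len(torus_neighbours(c, n)) for c in product(range(n), repeat=k)}
    print(f"{k}D torus: {degrees} neighbours per core")
```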
Which interconnect is relied on by many multiprocessors?
A bus, partially or fully
What is a bus?
- All cores to all cores
- Simple to build
- Constant latency
- Memory can be organized in any way: private to each core or shared between cores
- Disadvantage: time-shared bus → complexity, lower bandwidth (a fraction of that of a grid)
- Disadvantage: very long wires (to connect all the cores) → area, routing, power, slow
For a large number of cores, what would be the main bottleneck?
The bus
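A toy model of why the bus becomes the bottleneck, under simplifying assumptions not stated in the lecture: a time-shared bus carries one transfer per step, while a grid's separate point-to-point links can each carry one transfer per step simultaneously. In the sketch every core sends one message to its east neighbour; the bus needs one step per message, so its time grows with the core count, while the grid finishes in a single step. It ignores arbitration overhead, clock rates and multi-hop routing, and the helper names are hypothetical.

```python
def bus_steps(messages):
    """Time-shared bus: only one transfer can use the shared wires per step."""
    return len(messages)

def link_steps(messages):
    """Point-to-point links: transfers on different links proceed in parallel.
    Assumes every message crosses exactly one link and each directed link carries one message per step."""
    load = {}
    for src, dst in messages:
        load[(src, dst)] = load.get((src, dst), 0) + 1
    return max(load.values(), default=0)

for n in (2, 4, 8, 16):
    cores = [(r, c) for r in range(n) for c in range(n)]
    # every core that has an east neighbour sends it one message
    msgs = [((r, c), (r, c + 1)) for r, c in cores if c + 1 < n]
    print(f"{n * n:4d} cores: bus {bus_steps(msgs):4d} steps, grid {link_steps(msgs)} step")
```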