Lecture 2: The world of parallelism Flashcards
What is the trend in GPU and CPU design?
The number of cores is increasing
What is Flynn’s Taxonomy?
Classification system for computer architectures based on the number of instruction and data streams they can process simultaneously.
What are the categories of Flynn’s Taxonomy?
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data (Chip MPs)
Define the four categories of Flynn’s Taxonomy
SISD: One instruction executed at a time, processing one data element at a time, e.g. traditional single processors
SIMD: One instruction executed at a time, operating on multiple independent streams of data, e.g. vector processors (1970s), vector units (MMX, SSE, AVX), GPUs
MISD: Multiple instruction streams operating on a single data stream; rarely used in practice
MIMD: Multiple sequences of instructions executed independently, each one operating on a different stream of data, e.g. Chip Multiprocessors
SPMD (variant): Multiple instruction streams but with the same code on multiple independent streams of data, e.g. Data Parallel machines built from independent processors
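To make the SISD/SIMD contrast concrete, here is a minimal C sketch (assuming an x86 CPU with AVX and a compiler flag such as -mavx; the function names and array size are illustrative): the scalar loop issues one add per data element, while the SIMD loop applies one instruction to eight floats at a time.

```c
#include <immintrin.h>  /* AVX intrinsics */

#define N 1024  /* illustrative size, multiple of 8 */

/* SISD style: one instruction, one data element per iteration */
void add_scalar(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* SIMD style: one vector add operates on 8 floats at once */
void add_avx(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        _mm256_storeu_ps(&c[i], _mm256_add_ps(va, vb));
    }
}
```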
Describe interconnections and communication in parallelism
- Between cores
- Between cores and memory
The way cores and memory are connected affects the types of computation that can be performed efficiently
Interconnections in Chip MPs: advantages and disadvantages
Advantages: Faster and lower cost than traditional (off-chip) interconnects
Disadvantages: Limited silicon and power for network
What is a grid?
- Direct link to neighbours
- Private on-chip memory
- Staged communication with non-neighbours
- NxN grid worst case: 2*(N-1) steps
What is a torus?
- Direct link to neighbours
- Private on-chip memory
- More symmetrical, more paths, shorter paths
- NxN torus worst case: N steps (see the hop-count sketch below)
- More wires, complex routing
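A small C sketch of the hop-count arithmetic behind both worst-case figures (the coordinates and function names are illustrative): on an NxN grid a message travels the Manhattan distance, which peaks at 2*(N-1) between opposite corners; on an NxN torus each dimension can wrap around, so the per-dimension distance is at most N/2 and the worst case is about N.

```c
#include <stdlib.h>

/* Hops between cores (x1,y1) and (x2,y2) on an NxN grid:
 * Manhattan distance; worst case 2*(N-1) (opposite corners). */
int grid_hops(int x1, int y1, int x2, int y2) {
    return abs(x1 - x2) + abs(y1 - y2);
}

/* Same on an NxN torus: each dimension may wrap around,
 * so per-dimension distance is at most N/2; worst case ~N. */
int torus_hops(int x1, int y1, int x2, int y2, int n) {
    int dx = abs(x1 - x2), dy = abs(y1 - y2);
    if (dx > n - dx) dx = n - dx;   /* take the wrap-around path if shorter */
    if (dy > n - dy) dy = n - dy;
    return dx + dy;
}
```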
What is the smallest kind of on-chip interconnect?
A grid
In which interconnect does every core connect to four neighbours?
Torus
Which interconnects are suitable for smaller and which for larger systems?
Grid -> Smaller
Torus -> Larger
Bus -> Smaller
What's a key property of a torus?
They can be generalized further to multiple dimensions:
- 2D torus → 2D grid folded, 4 neighbours
- 3D torus → 3D grid folded, 6 neighbours
- 4D torus → 4D grid folded, 8 neighbours
CMPs rarely go above 2D
Which interconnect is relied on by many multiprocessors?
A bus, partially or fully
What is a bus?
- All cores to all cores
- Simple to build
- Constant latency
- Memory can be organized in any way: private to each core or shared between cores
Disadvantages:
- Time-shared bus → complexity, lower bandwidth (a fraction of a grid's)
- Very long wires (to connect all the cores) → area, routing, power, slow
For a large number of cores what would be the main bottleneck?
The bus
Which topologies are more suitable for larger systems (scalable)?
1. Trees
2. Hierarchical (Crossbars, Hypercubes, Rings, MIN, etc.)
Important for high core counts
What type of switching is used in scalable topologies? Describe it.
Packet switching: dividing data into small packets for efficient transmission across a network. Packets can follow different paths to the destination and may arrive out of order.
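A minimal, self-contained C sketch of the idea (the packet struct, its 64-byte payload and the helper names are assumptions for illustration, not a real network API): the sender splits a message into numbered packets, and the receiver places each packet by its sequence number, so arrival order does not matter.

```c
#include <string.h>

#define PACKET_PAYLOAD 64  /* illustrative packet size */

struct packet {
    int seq;                       /* position in the original message */
    int len;                       /* valid bytes in payload */
    char payload[PACKET_PAYLOAD];
};

/* Split a message into packets; returns the number of packets. */
int packetize(const char *msg, int msg_len, struct packet *out) {
    int n = 0;
    for (int off = 0; off < msg_len; off += PACKET_PAYLOAD, n++) {
        out[n].seq = n;
        out[n].len = (msg_len - off < PACKET_PAYLOAD) ? msg_len - off : PACKET_PAYLOAD;
        memcpy(out[n].payload, msg + off, out[n].len);
    }
    return n;
}

/* Reassemble: sequence numbers place each packet, so packets may arrive out of order. */
void reassemble(const struct packet *pkts, int count, char *msg_out) {
    for (int i = 0; i < count; i++)
        memcpy(msg_out + pkts[i].seq * PACKET_PAYLOAD, pkts[i].payload, pkts[i].len);
}
```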
What is a reason for discontinuity in core count growth?
Interconnection. Larger counts require more complex network topologies.
What is shared memory? Describe its hardware and software view.
Accessible from every part of the computation.
Hardware: Memory connected to all cores
Software: Global. Accessible from all threads (Reads/Writes)
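A minimal sketch of the data-sharing (shared memory) view, assuming POSIX threads; the array size, thread count and names are illustrative. The array is a global, so every thread reads and writes it directly, and the main thread sees all the writes after joining.

```c
#include <pthread.h>
#include <stdio.h>

#define N 8
#define NTHREADS 4

int data[N];   /* globally accessible: every thread can read/write it */

void *worker(void *arg) {
    int id = (int)(long)arg;
    int chunk = N / NTHREADS;
    /* each thread fills its own slice of the shared (global) array */
    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        data[i] = i * i;
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < N; i++)
        printf("%d ", data[i]);   /* main thread sees the threads' writes */
    printf("\n");
    return 0;
}
```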
What is distributed memory? Describe its hardware and software view.
Accessible from only one part of the computation.
Hardware: Memory connected to only one core
Software: Local. Accessible only by the owning thread (Message passing)
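A minimal sketch of the message-passing (distributed memory) view, assuming MPI is available (run with e.g. mpirun -np 2): each rank owns its own memory, and the only way data moves between ranks is an explicit send/receive pair.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* lives only in rank 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 cannot read rank 0's memory; it must receive a message */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```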
What is the software view referred to as?
Programming model
What are the types of programming model?
- Serial → globally accessible memory (SW restrictions may apply)
- Data Sharing → globally accessible memory (SW equivalent of Shared Memory)
- Message Passing → thread-owned (distributed) memory (SW equivalent of Distributed Memory)
True or False
Programming model = Memory organisation
False
Match the following memory organisations with a programming model
- Shared memory
- Distributed memory
A. Message passing
B. Data sharing
- Shared memory → B (Data sharing)
- Distributed memory → A (Message passing)
How efficient is simulating data sharing on distributed memory?
Slow
How efficient is message passing on shared memory?
- Fast but slower than Data Sharing
- Extra traffic might impact bandwidth
Which memory organisation is better from the HW perspective?
Distributed Memory.
1. Easier implementation
2. Higher Bandwidth
3. Scales better (e.g. supercomputers!)
Which memory organisation is better from the SW perspective?
Shared Memory (Data Sharing).
1. Easier programming
2. Works with irregular communication
What is one of the central conflicts of contemporary architecture?
Hardware is complex: either software is exposed to the complexity or hardware hides it, and both choices carry costs.
What are the software issues of complex hardware?
SW is exposed to it (distributed memory):
→Higher SW cost
→Complicated code
→Wasted energy & cycles
What are the hardware issues of complex hardware?
HW hides it (shared memory):
→Higher HW cost
→Complicated design
→Wasted energy & cycles
What type of memory is used in Chip MPs?
Shared
Where is distributed memory used?
Supercomputers
What is the NxN worst case for torus and grid?
Torus -> N
Grid -> 2*(N-1)