L1 & L2 Flashcards
What is the result of adding more transistors to a multiprocessor?
More performance
What are the basic building blocks for integrated circuits?
Transistors
Why is there an increase in number of cores per chip?
- Transistors keep shrinking, so more fit on a chip; since clock frequency can no longer be raised, the extra transistors go into additional cores
What are two effects of having smaller transistors?
- Smaller transistor -> faster switching and less power consumption
True or False
Single processor speed is expected to keep increasing in the forthcoming years
False
Clock speed for single cores has physically reached its limit.
The solution is more cores.
What are the implications of multi-core processors?
Hardware issues:
Memory organization
Connection of processors
Processor type
Software issues:
How to program to allow for parallelism
What is Moore’s law?
Transistor count doubles every 1.5 years.
(Caused by transistor size reduction)
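A minimal sketch of the doubling rule (the starting count and year span are made-up illustration values, not from the card):

```python
# Moore's law: transistor count doubles roughly every 1.5 years.
def transistor_count(start_count, years, doubling_period=1.5):
    """Projected transistor count after `years` of doubling."""
    return start_count * 2 ** (years / doubling_period)

# Illustration: 3 years = 2 doubling periods, so the count quadruples.
print(transistor_count(1000, 3))  # 4000.0
```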
Why do smaller transistors mean faster cores?
They have a shorter switching delay and can be clocked at a higher frequency.
But a limit has been reached where cooling becomes a problem.
What is Dennard scaling? What led to its decline?
Dennard scaling: as transistors shrink, power density stays constant, so more can be packed onto the same chip without increasing power consumption
Broke down in the mid-2000s mostly due to high current leakage
Explain the difference between Instruction Level Parallelism, Process Level Parallelism and Thread Level Parallelism, outlining an issue introduced by each.
Instruction-level parallelism:
Compiler/hardware automatically parallelises a sequential stream of instructions
Issue: Limits parallelism due to dependencies between instructions
Process-level parallelism:
Process-level parallelism consists in running different applications on different cores; it requires little effort from the programmer.
Issue: increased overhead
Thread-level parallelism:
The programmer divides the program into sequences of instructions in parallel
Issue: data sharing between threads introduces the need for synchronisation
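A minimal sketch of the synchronisation issue: two threads increment a shared counter, and a lock serialises the read-modify-write so updates are not lost (counts and names are illustrative):

```python
# Thread-level parallelism: two threads update shared data.
# The lock is the synchronisation the card refers to; without it,
# the counter += 1 read-modify-write can race between threads.
import threading

counter = 0
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:  # only one thread at a time in the critical section
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000
```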
How is the total execution time calculation done for TLP in:
a. single core
b. multi-core
a. sum of each thread’s execution time
(t1 + t2 + .. + tn)
b. sum of each thread’s execution time / number of threads
( (t1 + t2 + .. + tn) / # of parallel threads)
(assumes one core per thread and threads of equal length)
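A quick worked example of the two formulas (the per-thread times are made-up values):

```python
# Hypothetical execution times of four parallel threads (in seconds).
thread_times = [4.0, 3.0, 5.0, 4.0]

# Single core: threads run one after another, so times add up.
single_core = sum(thread_times)

# Multi-core (ideal, one core per thread): total divided by thread count.
multi_core = sum(thread_times) / len(thread_times)

print(single_core)  # 16.0
print(multi_core)   # 4.0
```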
What can be used for array computations that perform the same computation on its elements?
Data parallelism
What are some examples of data parallelism?
General:
- Matrix multiplication
- fourier transform
Graphics:
- Anti-aliasing
- Texture mapping
- Illumination and shading
Differential equations:
- Weather forecasting
- Engineering simulation
- Financial modelling
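A minimal data-parallelism sketch: the same operation is applied to every element of an array, with a thread pool mapping the work across the elements (squaring is an illustrative stand-in for the heavier computations listed above; in CPython the GIL limits actual speedup for pure-Python work, so real code would use process pools or vectorised libraries):

```python
# Data parallelism: identical computation on each array element,
# with elements processed independently by a pool of workers.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(square, data))  # one task per element
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49]
```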
What program structure increases the complexity of parallelism?
Program structure with large amounts of multiple-write data sharing
Compare and contrast MPs and Chip MPs in terms of topology, connection, memory, and application.
Topology:
MPs: Discrete elements
Chip MPs: same chip
Connection:
MPs: High-bandwidth network
Chip MPs: On-chip network
Memory:
Both utilize shared memory
MPs may use private memory
Application:
MPs: specific applications (supercomputers, web)
Chip MPs: general purpose
What are the 4 types of architectures in Flynn’s taxonomy?
o SISD: one instruction executed at a time processing one data element at a time
o SIMD: one instruction executed at a time operating on multiple independent streams of data
o MIMD: multiple sequences of instructions executed independently each one operating on a different stream of data
o SPMD: multiple instruction streams but with the same code on multiple independent streams of data
What is an application of MIMD?
Chip multiprocessors
What is an application of SIMD?
Vector processors, vector units, GPUs
What is the worst case complexity for communication in a NxN grid?
2*(N-1) steps
What is the worst case complexity for communication in a NxN torus?
N steps
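A sketch of where the two figures come from, counting hops between cores (assumption: one hop per link, routing along rows then columns):

```python
def grid_distance(a, b):
    """Hops between cores a=(row, col) and b in an N x N grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_distance(a, b, n):
    """Same, but each dimension can wrap around, shortening paths."""
    dr = min(abs(a[0] - b[0]), n - abs(a[0] - b[0]))
    dc = min(abs(a[1] - b[1]), n - abs(a[1] - b[1]))
    return dr + dc

n = 8
# Grid worst case: opposite corners, 2*(N-1) hops.
corner_to_corner = grid_distance((0, 0), (n - 1, n - 1))
print(corner_to_corner)  # 14, i.e. 2*(8-1)

# Torus worst case: brute-force over all destinations from (0,0) -> N hops.
worst_torus = max(torus_distance((0, 0), (r, c), n)
                  for r in range(n) for c in range(n))
print(worst_torus)  # 8, i.e. N
```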
True or False.
In a grid interconnection, all cores have 4 neighbours.
False.
Edge cores of the grid have only 2 or 3 neighbours. In a torus, all cores have 4.
How many neighbours does a 3D torus core have?
3 dimensions × 2 directions, so 6 neighbours
Explain the 3 options for interconnection.
Grid:
Direct link to neighbours
Private on-chip memory
Staged communication with non-neighbours
Torus:
Direct link to neighbours
Private on-chip memory
More symmetrical, more paths, and shorter paths
Like folding the grid, so all cores have 4 neighbours
Bus:
All cores connected to one shared bus
Simple to build
Constant latency
Explain the difference between shared and distributed memory from a hardware and software view.
Shared: accessible from every part of the computation
Hardware: Memory connected to all cores
Software: Global; accessible from all threads; reads/writes
Distributed: accessible from only part of the computation
Hardware: Memory connected to only one core
Software: Local; accessible only by the owning thread; message passing
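A sketch of the two software views using threads (a shared global variable versus an explicit message queue; in a real distributed-memory system the queue would be a network message between separate address spaces):

```python
import queue
import threading

# Shared-memory view: a global both threads can read/write directly.
shared_total = 0
lock = threading.Lock()

# Distributed-memory view: the owner's data is private; another thread
# can only obtain it by receiving a message (a copy), never directly.
mailbox = queue.Queue()

def owner():
    local_value = 42         # local: accessible only by the owning thread
    mailbox.put(local_value)  # message passing: send a copy

def consumer():
    global shared_total
    received = mailbox.get()  # message passing: receive
    with lock:
        shared_total += received  # shared memory: direct write

t1 = threading.Thread(target=owner)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(shared_total)  # 42
```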