2 - Hardware & Architectures Flashcards
What is optimised for low-latency computations?
CPU
What is optimised for data-parallel and high-throughput computations?
GPU
Which has the larger cache? CPU or GPU?
CPU
What has more transistors dedicated to computation? CPU or GPU?
GPU
What is better for real-time applications? CPU or GPU?
CPU
Discuss the CPU
- Optimised for low-latency computations
- Large caches
- Fewer ALUs
- Good for real-time applications
Discuss the GPU
- Optimised for data-parallel and high throughput computations
- Smaller caches
- More transistors dedicated to computation
- Good if enough work to hide latency
What are 4 pieces of parallel hardware?
- Pipelines, vector instructions in CPU
- Multi-core CPUs and multi-processors
- Dedicated parallel processing units
- Distributed systems
Name 4 applications of parallel computing
Autonomous Cars, Augmented Reality, Video Games and Weather Forecasting
Why is parallel programming less intuitive than serial programming?
Non-trivial communication and synchronisation
Why does adding processors not always improve performance of serial programs?
The serial programs are unaware of the existence of multiple processors
What was single-processor performance driven by?
The increasing density of transistors
What are parallel programs?
Programs that exploit the power of multiple processors
What is a shared memory parallel system?
Where processors operate independently but share the same memory
What is a distributed memory system?
Where the processors operate independently but have their own local memory
What is pipelining?
Functional units are arranged in stages, so that different instructions can occupy different stages at the same time
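As a rough illustration of why arranging functional units in stages helps, the sketch below counts clock cycles for an idealised pipeline. It assumes every stage takes exactly one cycle and that there are no stalls or hazards; the function names are hypothetical and chosen only for this example.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Idealised pipeline (assumption: one cycle per stage, no stalls).

    The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)
```

Under these assumptions, 100 instructions on a 5-stage design take 500 cycles without pipelining and 104 cycles with it.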
What do we need to think about when writing parallel programs?
Communication among cores, load balancing and synchronisation of cores
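Load balancing can be sketched in its simplest (static) form as follows: split the work into contiguous blocks whose sizes differ by at most one item. This is only one possible strategy, and `split_evenly` is a hypothetical helper, not part of any library.

```python
from typing import List


def split_evenly(n_items: int, n_cores: int) -> List[range]:
    """Static load balancing: one contiguous block of indices per core.

    Assumes every item costs about the same; block sizes differ by at most one.
    """
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    base, extra = divmod(n_items, n_cores)
    blocks = []
    start = 0
    for core in range(n_cores):
        # The first `extra` cores take one additional item each.
        size = base + (1 if core < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks
```

For example, 10 items over 4 cores gives blocks of sizes 3, 3, 2 and 2. If items have very uneven costs, a dynamic scheme (cores pulling work from a shared queue) may balance better.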
Why can’t we automatically translate serial programs into parallel programs?
Automatic translation cannot deal with the fundamental shift in algorithmic structure required for effective parallelisation
What is a parallel pattern?
Valuable algorithmic structure commonly seen in efficient parallel programs
What is serialisation?
Act of putting some set of operations into a specific order
What is locality?
Memory accesses close together in time and space are cheaper than ones that are far apart
What is the difference between implicit and explicit parallelism?
Implicit parallelism is exploited by the hardware or compiler without the programmer's control, whereas explicit parallelism is accessible to, and expressed by, the programmer
Name a type of implicit parallelism
Pipelines and vector instructions in CPU
Name 3 types of explicit parallelism
Multi-core CPUs and multi-processors, dedicated parallel processing units, distributed systems
What is Flynn’s Taxonomy?
Classifies multi-processor computer architectures according to two independent dimensions, instruction streams and data streams
What are the two possible states of dimensions within Flynn’s Taxonomy?
Single and multiple
What are the four possible classifications within Flynn’s Taxonomy?
SISD, SIMD, MISD, MIMD
What is SISD?
Single instruction, single data
What is SIMD?
Single instruction, multiple data
What is MISD?
Multiple instruction, single data
What is MIMD?
Multiple instruction, multiple data
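The two dimensions and their two states combine mechanically into the four class names. A minimal sketch of that mapping, using a hypothetical helper `flynn_class`:

```python
def flynn_class(instruction_streams: str, data_streams: str) -> str:
    """Map 'single'/'multiple' instruction and data streams to a Flynn class name."""
    codes = {"single": "S", "multiple": "M"}
    try:
        # Name pattern: <S|M>I<S|M>D, e.g. single instruction + multiple data -> SIMD.
        return f"{codes[instruction_streams]}I{codes[data_streams]}D"
    except KeyError as exc:
        raise ValueError("each dimension must be 'single' or 'multiple'") from exc
```

For example, `flynn_class("single", "multiple")` returns `"SIMD"`.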
Describe SISD
A serial computer: only one instruction stream is acted on by the CPU during any one clock cycle, and only one data stream is used as input during any one clock cycle
What is SISD good for?
Traditional single-processor/single-core CPUs and real-time applications
Describe SIMD
Parallel: all processing units execute the same instruction at any given clock cycle, and each processing unit can operate on a different data element
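The contrast between SISD and SIMD can be sketched in plain Python by counting "steps". This is a conceptual model only: the code itself runs serially, and the `lanes` parameter (the assumed number of processing units) and both function names are hypothetical.

```python
from typing import Callable, List, Tuple


def sisd_apply(op: Callable[[int], int], data: List[int]) -> Tuple[List[int], int]:
    """One instruction acts on one data element per step."""
    results = [op(x) for x in data]
    return results, len(data)


def simd_apply(op: Callable[[int], int], data: List[int], lanes: int = 4) -> Tuple[List[int], int]:
    """The same instruction acts on up to `lanes` data elements per step."""
    results: List[int] = []
    steps = 0
    for start in range(0, len(data), lanes):
        chunk = data[start:start + lanes]
        # Conceptually, every lane applies `op` to its own element at the same time.
        results.extend(op(x) for x in chunk)
        steps += 1
    return results, steps
```

Doubling ten elements takes 10 steps in the SISD model and 3 steps in the SIMD model with 4 lanes (two full groups and one partial group); the results are identical.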
What is SIMD good for?
Modern CPUs and GPUs; best for problems with a high degree of regularity, e.g. image processing
Describe MIMD
Most common type of parallel computer, every processor may be executing a different instruction stream, every processor may be working with a different data stream
What is MIMD good for?
Multi-core CPUs, computing clusters and grids
What is worth noting about MIMD?
Many MIMD architectures also include SIMD execution sub-components
Describe MISD
A single data stream is fed into multiple processing units, and each processing unit operates on the data independently via separate instruction streams
What is MISD good for?
Cryptography
Which of the classifications in Flynn’s Taxonomy is rare?
MISD
What are the two memory architectures for SIMD and MIMD?
Shared and distributed
What is a shared memory architecture?
Processors operate independently but share the same memory - global address space, changes in memory by one processor are visible to all other processors
What is distributed memory?
Processors operate independently but have their own local memory, memory addresses in one processor do not map to another processor - no global address space
What are the pros of the shared memory architecture?
Global address space is easy to use and program, and data sharing between tasks is fast due to the proximity of memory to CPUs
What are the cons for the shared memory architecture?
Adding more CPUs can increase traffic on shared memory-CPU path and the programmer is responsible for synchronisation to ensure correct access to global memory
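The synchronisation responsibility mentioned above can be sketched with Python threads, which all see the same memory. Threads stand in for processors here; the point is the shared address space and the lock, not speed-up. `parallel_count` is a hypothetical name.

```python
import threading


def parallel_count(n_threads: int = 4, increments: int = 10_000) -> int:
    """Several threads increment one shared counter, guarded by a lock."""
    counter = {"value": 0}   # shared memory: visible to every thread
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(increments):
            # The read-modify-write must be synchronised by the programmer;
            # without the lock, concurrent updates could be lost.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With the lock in place, the result is always `n_threads * increments`. Every increment also has to pass through the same lock, which loosely mirrors the contention on a shared memory path as more workers are added.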
What are the pros to the distributed memory architectures?
Each processor can rapidly access its own memory without interference, and memory is scalable (total memory grows as the number of processors increases)
What are the cons to the distributed memory architecture?
The programmer is responsible for data communication between processors, and memory access times are non-uniform
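The explicit communication that distributed memory requires can be sketched with a small single-process simulation. Each hypothetical `Node` keeps private local data and can only exchange values through messages; a real system would send these over a network (for example with a message-passing library such as MPI), which this sketch does not attempt.

```python
from queue import Queue
from typing import List


class Node:
    """A simulated processor with private local memory and an inbox (hypothetical)."""

    def __init__(self, rank: int, local_data: List[int]) -> None:
        self.rank = rank
        self.local_data = list(local_data)  # private: no other node reads this directly
        self.inbox: Queue = Queue()

    def send(self, other: "Node", message: int) -> None:
        other.inbox.put(message)  # explicit communication written by the programmer

    def receive(self) -> int:
        return self.inbox.get()


def distributed_sum(chunks: List[List[int]]) -> int:
    """Sum data spread across nodes by sending partial sums to node 0."""
    if not chunks:
        return 0
    nodes = [Node(rank, chunk) for rank, chunk in enumerate(chunks)]
    root = nodes[0]
    for node in nodes[1:]:
        # Each node reduces its own local memory, then sends only the result.
        node.send(root, sum(node.local_data))
    total = sum(root.local_data)
    for _ in nodes[1:]:
        total += root.receive()
    return total
```

For example, `distributed_sum([[1, 2], [3, 4], [5]])` returns `15`. There is no global address space in this model: the root only ever sees the partial sums it is sent.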
What is the GPU?
Designed for manipulating computer graphics and image processing, highly parallel, more efficient than CPUs when processing large blocks of visual data in parallel
What are the different realisations of the GPU?
Dedicated expansion video card, integrated into the CPU, embedded on motherboard
What is GPGPU?
General-purpose computing on GPU. Use of the GPU for applications other than graphics, typically with large datasets and complex computations
When might you use GPGPU?
Physics simulation, AI, weather forecasting
What is latency?
Time to solution. A latency-oriented design minimises time at the expense of power
What is throughput?
Quantity of tasks processed per unit of time. A throughput-oriented design minimises energy per operation
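The trade-off between latency and throughput can be made concrete with a little arithmetic. The figures below are invented purely for illustration: a latency-oriented processor that finishes a task in 1 ms but runs only 4 at once, and a throughput-oriented processor that needs 10 ms per task but runs 1000 at once. `total_time` is a hypothetical helper.

```python
import math


def total_time(n_tasks: int, latency_per_task: int, parallel_width: int) -> int:
    """Time to finish n_tasks when `parallel_width` tasks run at once.

    Simplifying assumption: tasks are independent and run in equal-sized batches,
    each batch taking `latency_per_task` time units.
    """
    batches = math.ceil(n_tasks / parallel_width)
    return batches * latency_per_task


# Hypothetical figures (milliseconds), not measurements of real hardware.
LATENCY_PROC = {"latency_per_task": 1, "parallel_width": 4}
THROUGHPUT_PROC = {"latency_per_task": 10, "parallel_width": 1000}
```

With these assumed figures, a single task takes 1 ms on the latency processor and 10 ms on the throughput processor, but 10,000 tasks take 2500 ms and 100 ms respectively. The throughput processor only wins when there is enough work to hide its longer per-task latency.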
What is heterogeneous computing?
Combining both a latency processor (CPU) with a throughput processor (GPU)