2 - Hardware & Architectures Flashcards

1
Q

What is optimised for low-latency computations?

A

CPU

2
Q

What is optimised for data-parallel and high-throughput computations?

A

GPU

3
Q

Which has the larger cache? CPU or GPU?

A

CPU

4
Q

What has more transistors dedicated to computation? CPU or GPU?

A

GPU

5
Q

What is better for real-time applications? CPU or GPU?

A

CPU

6
Q

Discuss the CPU

A
  • Optimised for low-latency computations
  • Large caches
  • Fewer ALUs
  • Good for real-time applications
7
Q

Discuss the GPU

A
  • Optimised for data-parallel and high throughput computations
  • Smaller caches
  • More transistors dedicated to computation
  • Good if enough work to hide latency
8
Q

What are 4 pieces of parallel hardware?

A
  • Pipelines, vector instructions in CPU
  • Multi-core CPUs and multi-processors
  • Dedicated parallel processing units
  • Distributed systems
9
Q

Name 4 applications of parallel computing

A

Autonomous Cars, Augmented Reality, Video Games and Weather Forecasting

10
Q

Why is parallel programming less intuitive than serial programming?

A

Non-trivial communication and synchronisation

11
Q

Why does adding processors not always improve performance of serial programs?

A

The serial programs are unaware of the existence of multiple processors

12
Q

What was single processing power performance driven by?

A

The increasing density of transistors

13
Q

What are parallel programs?

A

Programs that exploit the power of multiple processors

14
Q

What is a shared memory parallel system?

A

Where processors operate independently but share the same memory

15
Q

What is a distributed memory system?

A

Where the processors operate independently but have their own local memory

16
Q

What is pipelining?

A

Where functional units are arranged in stages

17
Q

What do we need to think about when writing parallel programs?

A

Communication among cores, load balancing and synchronisation of cores
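
As a hedged sketch of two of these concerns in C (assuming OpenMP, which the cards do not prescribe; the loop body is an arbitrary stand-in for uneven work): `schedule(dynamic)` handles the load balancing by giving iterations to whichever core is free, and the `reduction` clause handles the synchronisation of the per-thread partial sums.

```c
/* Sketch of load balancing and synchronisation (assumes OpenMP; compile with -fopenmp). */
#include <stdio.h>

int main(void) {
    long total = 0;

    /* Iterations get more expensive as i grows, so a static split would be unbalanced;
     * schedule(dynamic) hands iterations to whichever core is free, and
     * reduction(+:total) synchronises the per-thread partial sums at the end. */
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < 64; i++) {
        long work = 0;
        for (long j = 0; j < (long)i * 100000; j++)
            work++;
        total += work;
    }

    printf("total = %ld\n", total);
    return 0;
}
```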

18
Q

Why can’t we translate serial programs into parallel programs?

A

Translations cannot deal with the fundamental shift in algorithmic structure required for effective parallelisation

19
Q

What is a parallel pattern?

A

Valuable algorithmic structure commonly seen in efficient parallel programs

20
Q

What is serialisation?

A

Act of putting some set of operations into a specific order

21
Q

What is locality?

A

Memory accesses that are close together in time and space are cheaper than those that are far apart
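
A small C sketch of this (the matrix size is an arbitrary assumption): both loops below do the same additions, but the first traverses memory in the order it is laid out and typically runs far faster than the second, which jumps a whole row's worth of bytes between accesses.

```c
/* Locality sketch: row-major vs column-major traversal of the same matrix. */
#include <stdio.h>

#define N 2000
static double m[N][N];           /* C stores this row by row in memory */

int main(void) {
    double sum = 0.0;

    /* Good spatial locality: consecutive accesses are adjacent in memory. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];

    /* Poor spatial locality: consecutive accesses are N*sizeof(double) bytes apart. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];

    printf("sum = %f\n", sum);
    return 0;
}
```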

22
Q

What is the difference between implicit and explicit parallelism?

A

The programmer has no direct control over implicit parallelism (it is exploited automatically by the hardware or compiler), but does have direct control over explicit parallelism

23
Q

Name a type of implicit parallelism

A

Pipelines and vector instructions in CPU

24
Q

Name 3 types of explicit parallelism

A

Multi-core CPUs and multi-processors, dedicated parallel processing units, distributed systems

25
Q

What is Flynn’s Taxonomy?

A

Classifies multi-processor computer architectures according to two independent dimensions: instruction streams and data streams

26
Q

What are the two possible states of each dimension within Flynn’s Taxonomy?

A

Single and multiple

27
Q

What are the four possible classifications within Flynn’s Taxonomy?

A

SISD, SIMD, MISD, MIMD

28
Q

What is SISD?

A

Single instruction, single data

29
Q

What is SIMD?

A

Single instruction, multiple data

30
Q

What is MISD?

A

Multiple instruction, single data

31
Q

What is MIMD?

A

Multiple instruction, multiple data

32
Q

Describe SISD

A

A serial computer: only one instruction stream is acted on by the CPU during any one clock cycle, and only one data stream is used as input during any one clock cycle

33
Q

What is SISD good for?

A

Traditional single-processor/single-core CPUs, and real-time applications

34
Q

Describe SIMD

A

Parallel: all processing units execute the same instruction at any given clock cycle, but each processing unit can operate on a different data element

35
Q

What is SIMD good for?

A

Modern CPUs and GPUs; best for problems with a high degree of regularity, e.g. image processing
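
A hedged CPU-side sketch in C (assuming an OpenMP-4-capable compiler; `#pragma omp simd` is just one of several ways to request vectorisation): the compiler is asked to emit vector instructions so that a single instruction updates several array elements per clock cycle, image-processing style.

```c
/* SIMD sketch: one instruction applied to multiple data elements
 * (assumes OpenMP 4+ simd support; compile with e.g. gcc -fopenmp -O2). */
#include <stdio.h>

#define N 1024

int main(void) {
    float in[N], out[N];

    for (int i = 0; i < N; i++)
        in[i] = (float)i;

    /* Each vector instruction brightens several "pixels" at once. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        out[i] = in[i] * 1.5f + 10.0f;

    printf("out[0] = %f, out[%d] = %f\n", out[0], N - 1, out[N - 1]);
    return 0;
}
```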

36
Q

Describe MIMD

A

The most common type of parallel computer: every processor may be executing a different instruction stream, and every processor may be working with a different data stream
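
A hedged illustration in C using POSIX threads (one possible MIMD realisation; the two worker functions are made-up examples): each thread executes a different instruction stream on a different data stream.

```c
/* MIMD sketch: different instruction streams on different data streams
 * (assumes POSIX threads; compile with -pthread). */
#include <stdio.h>
#include <pthread.h>

static void *sum_squares(void *arg) {       /* instruction stream A */
    int n = *(int *)arg, s = 0;
    for (int i = 1; i <= n; i++) s += i * i;
    printf("sum of squares up to %d = %d\n", n, s);
    return NULL;
}

static void *count_vowels(void *arg) {      /* instruction stream B */
    const char *text = arg;
    int count = 0;
    for (; *text; text++)
        if (*text == 'a' || *text == 'e' || *text == 'i' || *text == 'o' || *text == 'u')
            count++;
    printf("vowels = %d\n", count);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int n = 10;
    char text[] = "multiple instruction, multiple data";

    pthread_create(&t1, NULL, sum_squares, &n);     /* different code... */
    pthread_create(&t2, NULL, count_vowels, text);  /* ...on different data */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```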

37
Q

What is MIMD good for?

A

Multi-core CPUs, computing clusters and grids

38
Q

What is worth noting about MIMD?

A

Many MIMD architectures also include SIMD execution sub-components

39
Q

Describe MISD

A

Each processing unit operates on the data independently via separate instruction streams; a single data stream is fed into multiple processing units

40
Q

What is MISD good for?

A

Cryptography

41
Q

Which of Flynn’s classifications is rarely seen in practice?

A

MISD

42
Q

What are the two memory architectures for SIMD and MIMD?

A

Shared and distributed

43
Q

What is a shared memory architecture?

A

Processors operate independently but share the same memory (a global address space); changes in memory made by one processor are visible to all other processors
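
A minimal shared-memory sketch in C (assuming OpenMP; the cards do not name a programming model): the array lives in one global address space, and every thread reads and writes it directly.

```c
/* Shared-memory sketch (assumes OpenMP; compile with e.g. gcc -fopenmp). */
#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[1000];

    /* The array lives in one global address space; every thread sees it. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        a[i] = 2.0 * i;          /* each thread writes its own slice */
    }

    printf("a[999] = %f (threads available: %d)\n", a[999], omp_get_max_threads());
    return 0;
}
```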

44
Q

What is distributed memory?

A

Processors operate independently but have their own local memory; memory addresses in one processor do not map to another processor, so there is no global address space
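
A minimal distributed-memory sketch in C (assuming MPI, the usual choice, though the cards do not prescribe it): each process has its own private copy of `x` in its own local memory, so the value must be sent explicitly rather than read from a shared address space.

```c
/* Distributed-memory sketch (assumes MPI; compile with mpicc, run with mpirun -np 2). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, x = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        x = 42;                                   /* exists only in rank 0's local memory */
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received x = %d\n", x);    /* no shared address space: data must be communicated */
    }

    MPI_Finalize();
    return 0;
}
```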

45
Q

What are the pros of the shared memory architecture?

A

The global address space is easy to use and program, and data sharing between tasks is fast due to the proximity of memory to the CPUs

46
Q

What are the cons for the shared memory architecture?

A

Adding more CPUs can increase traffic on the shared memory-CPU path, and the programmer is responsible for synchronisation to ensure correct access to global memory
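
A hedged illustration of that synchronisation burden (assuming OpenMP): without the `atomic` directive, the increment of the shared counter below would be a data race and the final value would be wrong.

```c
/* Synchronisation sketch: correct access to shared memory (assumes OpenMP; compile with -fopenmp). */
#include <stdio.h>

int main(void) {
    long hits = 0;               /* shared by all threads */

    #pragma omp parallel for
    for (long i = 0; i < 1000000; i++) {
        #pragma omp atomic       /* without this, the increment is a data race */
        hits++;
    }

    printf("hits = %ld (expected 1000000)\n", hits);
    return 0;
}
```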

47
Q

What are the pros to the distributed memory architectures?

A

Each processor can rapidly access its own memory without interference, and memory is scalable (increasing the number of processors increases the total memory)

48
Q

What are the cons to the distributed memory architecture?

A

The programmer is responsible for data communication, and memory access times are non-uniform

49
Q

What is the GPU?

A

Designed for manipulating computer graphics and image processing; highly parallel, and more efficient than CPUs when processing large blocks of visual data in parallel

50
Q

What are the different realisations of the GPU?

A

Dedicated expansion (video) card, integrated into the CPU, or embedded on the motherboard

51
Q

What is GPGPU?

A

General-purpose computing on the GPU: using the GPU for applications other than graphics, typically those with large datasets and complex computations

52
Q

When might you use GPGPU?

A

Physics simulation, AI, weather forecasting

53
Q

What is latency?

A

Time to solution; a latency-oriented design minimises time at the expense of power

54
Q

What is throughput?

A

The quantity of tasks processed per unit of time; a throughput-oriented design minimises energy per operation
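
As a hedged sketch of how the two measures relate (the workload and numbers are arbitrary stand-ins): latency here is the average time for one task, and throughput is tasks completed per second.

```c
/* Latency vs throughput sketch: timing an arbitrary batch of identical tasks (POSIX clock_gettime). */
#include <stdio.h>
#include <time.h>

static double work(long n) {                 /* arbitrary stand-in for a single task */
    double s = 0.0;
    for (long i = 1; i <= n; i++) s += 1.0 / (double)i;
    return s;
}

int main(void) {
    const int tasks = 1000;
    double sink = 0.0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < tasks; i++)
        sink += work(100000);                /* accumulate so the work is not optimised away */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("latency per task: %g s\n", elapsed / tasks);   /* time to solution for one task */
    printf("throughput:       %g tasks/s\n", tasks / elapsed);
    printf("(checksum %g)\n", sink);
    return 0;
}
```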

55
Q

What is heterogeneous computing?

A

Combining a latency-oriented processor (CPU) with a throughput-oriented processor (GPU)