EXAM QUESTIONS Flashcards

Question 1

Q

Explain the difference between the CPU and GPU architecture

Describe what types of applications can benefit more from each one of
these two systems

Which architecture targets the latency and
which targets the throughput and why?

Question 2

Q

Describe parallel implementations of binary and p-ary search

List the advantages and limitations of these two methods.

Compare the step complexity between P-ary and binary search.

Question 3

Q

What is the difference between the speedup and efficiency in the context of
parallel computing?

Question 4

Q

What values correspond to the ideal case of parallelisation and which factors
affect these ideal figures in real parallel applications?

Question 5

Q

Calculate the total speedup for a system running a program consisting of
serial and parallel parts of the code. Consider two cases where the parallel
part is executed on 2 and 20 parallel processors. Assume that the serial part
occupies 20% of the entire code.

Question 6

Q

What is the goal of parallelisation?

Question 7

Q

Explain Moore’s Law and its consequences on the development of parallel hardware.

Answer

A

The number of chips on a transistor doubles approximately every 2 years

Drove optimisation through increasing the speed and power of serial processors

Much easier/cheaper to wait a few years for technology to catch up rather than invest in complex and expensive architectures

More transistors means deeper instruction pipelining, more operations per time period and more complicated instructions

Question 8

Q

Explain the fact that while the transistor count in the processors is still rising, the clock rate trend has flattened since 2004/2005.

Question 9

Q

Can we still improve the performance of our processors using frequency-scaling? Justify your answer.

Question 10

Q

Provide two examples of applications which benefit from parallel computation. Could these problems be solved with standard serial hardware?

Question 11

Q

Name the two independent dimensions used in Flynn’s taxonomy to classify multi-processor computer architectures.

Answer

A

Instruction Steams and Data Streams

Question 12

Q

Provide a brief definition of SIMD and explain what types of problems would benefit the most from SIMD implementations.

Question 13

Q

Compare the pros and cons of shared and distributed memory architectures

Question 14

Q

Which category of Flynn’s taxonomy do GPUs fall in?

Question 15

Q

Is the GPU architecture optimised for low-latency applications or high-throughput applications, and why?

Question 16

Q

What is heterogeneous computing?

Answer

A

Combination of both a latency processor (CPU) and a throughput processor (GPU)

Question 17

Q

What is a parallel pattern?

Question 18

Q

What is a map pattern and its main characteristics? What is the theoretical and
practical complexity of the map pattern?

Question 19

Q

What is the strategy to perform parallel computation on large data inputs with a
limited number of processing units?

Question 20

Q

What is a stencil pattern and how it can be optimised?

Question 21

Q

Explain the reduce pattern and provide two example combiner functions for it.

Question 22

Q

What are the requirements for the combiner functions in the reduce pattern?

Question 23

Q

What is the span and work complexity of the reduce pattern?

Question 24

Q

Why do we need thread synchronisation in the reduce pattern?

Question 25

Q

Explain multi-pass reduction.

Question 26

Q

Explain strategies for combining partial results from reduce operations performed by a number of work groups.

Question 27

Q

What is the difference between exclusive and inclusive scan?

Question 28

Q

Explain the work efficient implementation of the parallel scan pattern

Question 29

Q

Explain the step-efficient implementation of the parallel scan pattern.

Question 30

Q

What is the step and work complexity of both of these scan implementations?

What
problem size is most suitable for each of these implementations?

Question 31

Q

Describe a general approach for scan which can deal with large data sets.

Question 32

Q

Provide an example application of the scan pattern.

Question 33

Q

Describe, how data reorganisation can help with parallelisation.

Question 34

Q

What is the gather pattern?

Question 35

Q

What is the scatter pattern? Why does it require collision resolution?

Question 36

Q

Describe how to implement histogram calculation using atomic functions.

Question 37

Q

Describe the memory hierarchy used in modern parallel hardware. What are the
benefits of using local memory? How is private memory being used?

Question 38

Q

Describe typical thread synchronisation mechanisms used in modern parallel
hardware. What is the difference between local and global synchronisation? What
are the memory barriers for?

Question 39

Q

What are the atomic functions, their applications and limitations?

Question 40

Q

What is the benefit of exploiting coalesced memory access?

Question 41

Q

What is a parallel equivalent of the bubble-sort procedure? How does it work? What
is the step and work complexity of that algorithm?

Question 42

Q

What is a bitonic sequence and how can it be constructed in parallel from an
unordered input sequence?

Question 43

Q

What is a bitonic split? Describe the sorting procedure for bitonic sequences.

Question 44

Q

What is the complexity of bitonic sort?

Question 45

Q

What is a histogram and how its calculation can be parallelised using atomic
functions?

Question 46

Q

How does the input data distribution affect the performance of a parallel histogram
implementation based on atomics?

Question 47

Q

Describe a privatisation algorithm for calculating histograms. What is the difference
between implementations using “per work item” and “per work group” local
histograms? What are the limitations of both approaches?

Question 48

Q

How does the sort-search algorithm for building histograms work? Which
applications can benefit from using this approach?

Question 49

Q

The binary search requires the input data to be sorted. What applications might
justify that additional step?

Question 50

Q

Describe the basic strategies for parallelisation of the search algorithms. How is the memory access pattern in parallelised linear and binary search affecting the performance?

Question 51

Q

How does P-ary search exploit the parallelism?

Question 52

Q

Compare the step complexity between P-ary and binary search.

Question 53

Q

What is the difference between latency and throughput?

Question 54

Q

What is the difference between speedup and efficiency in parallel programs?

Question 55

Q

Which factors influence performance when parallelising sequential problems?

Question 56

Q

Describe how to apply the Amdahl’s Law to parallelisation of computer programs.
What are the consequences of this law on the speedups arising from parallelisation?