EXAM QUESTIONS Flashcards

1
Q

Explain the difference between the CPU and GPU architecture

Describe what types of applications can benefit more from each one of
these two systems

Which architecture targets the latency and
which targets the throughput and why?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Describe parallel implementations of binary and p-ary search

List the advantages and limitations of these two methods.

Compare the step complexity between P-ary and binary search.

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the difference between the speedup and efficiency in the context of
parallel computing?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What values correspond to the ideal case of parallelisation and which factors
affect these ideal figures in real parallel applications?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Calculate the total speedup for a system running a program consisting of
serial and parallel parts of the code. Consider two cases where the parallel
part is executed on 2 and 20 parallel processors. Assume that the serial part
occupies 20% of the entire code.

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the goal of parallelisation?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Explain Moore’s Law and its consequences on the development of parallel hardware.

A

The number of chips on a transistor doubles approximately every 2 years

Drove optimisation through increasing the speed and power of serial processors

Much easier/cheaper to wait a few years for technology to catch up rather than invest in complex and expensive architectures

More transistors means deeper instruction pipelining, more operations per time period and more complicated instructions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Explain the fact that while the transistor count in the processors is still rising, the clock rate trend has flattened since 2004/2005.

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Can we still improve the performance of our processors using frequency-scaling? Justify your answer.

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Provide two examples of applications which benefit from parallel computation. Could these problems be solved with standard serial hardware?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Name the two independent dimensions used in Flynn’s taxonomy to classify multi-processor computer architectures.

A

Instruction Steams and Data Streams

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Provide a brief definition of SIMD and explain what types of problems would benefit the most from SIMD implementations.

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Compare the pros and cons of shared and distributed memory architectures

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Which category of Flynn’s taxonomy do GPUs fall in?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Is the GPU architecture optimised for low-latency applications or high-throughput applications, and why?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is heterogeneous computing?

A

Combination of both a latency processor (CPU) and a throughput processor (GPU)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is a parallel pattern?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is a map pattern and its main characteristics? What is the theoretical and
practical complexity of the map pattern?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is the strategy to perform parallel computation on large data inputs with a
limited number of processing units?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a stencil pattern and how it can be optimised?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Explain the reduce pattern and provide two example combiner functions for it.

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are the requirements for the combiner functions in the reduce pattern?

A

x

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is the span and work complexity of the reduce pattern?

A

x

24
Q

Why do we need thread synchronisation in the reduce pattern?

A

x

25
Q

Explain multi-pass reduction.

A

x

26
Q

Explain strategies for combining partial results from reduce operations performed by a number of work groups.

A

x

27
Q

What is the difference between exclusive and inclusive scan?

A

x

28
Q

Explain the work efficient implementation of the parallel scan pattern

A

x

29
Q

Explain the step-efficient implementation of the parallel scan pattern.

A

x

30
Q

What is the step and work complexity of both of these scan implementations?

What
problem size is most suitable for each of these implementations?

A

x

31
Q

Describe a general approach for scan which can deal with large data sets.

A

x

32
Q

Provide an example application of the scan pattern.

A

x

33
Q

Describe, how data reorganisation can help with parallelisation.

A

x

34
Q

What is the gather pattern?

A

x

35
Q

What is the scatter pattern? Why does it require collision resolution?

A

x

36
Q

Describe how to implement histogram calculation using atomic functions.

A

x

37
Q

Describe the memory hierarchy used in modern parallel hardware. What are the
benefits of using local memory? How is private memory being used?

A

x

38
Q

Describe typical thread synchronisation mechanisms used in modern parallel
hardware. What is the difference between local and global synchronisation? What
are the memory barriers for?

A

x

39
Q

What are the atomic functions, their applications and limitations?

A

x

40
Q

What is the benefit of exploiting coalesced memory access?

A

x

41
Q

What is a parallel equivalent of the bubble-sort procedure? How does it work? What
is the step and work complexity of that algorithm?

A

x

42
Q

What is a bitonic sequence and how can it be constructed in parallel from an
unordered input sequence?

A

x

43
Q

What is a bitonic split? Describe the sorting procedure for bitonic sequences.

A

x

44
Q

What is the complexity of bitonic sort?

A

x

45
Q

What is a histogram and how its calculation can be parallelised using atomic
functions?

A

x

46
Q

How does the input data distribution affect the performance of a parallel histogram
implementation based on atomics?

A

x

47
Q

Describe a privatisation algorithm for calculating histograms. What is the difference
between implementations using “per work item” and “per work group” local
histograms? What are the limitations of both approaches?

A

x

48
Q

How does the sort-search algorithm for building histograms work? Which
applications can benefit from using this approach?

A

x

49
Q

The binary search requires the input data to be sorted. What applications might
justify that additional step?

A

x

50
Q

Describe the basic strategies for parallelisation of the search algorithms. How is the memory access pattern in parallelised linear and binary search affecting the performance?

A

x

51
Q

How does P-ary search exploit the parallelism?

A

x

52
Q

Compare the step complexity between P-ary and binary search.

A

x

53
Q

What is the difference between latency and throughput?

A

x

54
Q

What is the difference between speedup and efficiency in parallel programs?

A

x

55
Q

Which factors influence performance when parallelising sequential problems?

A

x

56
Q

Describe how to apply the Amdahl’s Law to parallelisation of computer programs.
What are the consequences of this law on the speedups arising from parallelisation?

A

x