Performance, Objectives Flashcards
Tseq, Tpar
Tseq(n): Time for 1 processor to solve P(n) using Seq
- Theory: Number of instructions (or other critical cost measure) executed in the worst case for inputs of size n
- The number of instructions carried out is the required work
- Practice: Measured time (or other parameter) of execution over some inputs (experiment design)
Tpar(p,n): Time for p processors to solve P(n) using Par, i.e., the time at which the last (slowest) processor finishes
Speed-up
The gain in moving from sequential computation with algorithm Seq to parallel computation with algorithm Par is expressed as the speed-up of Par over Seq:
Sp(n) = Tseq(n)/Tpar(p,n)
Goal: Achieve as large a speed-up as possible (for some n, or for all n) for as many processors as possible
Absolute speed-up: uses worst-case times of the best (known) sequential algorithm; the best possible absolute speed-up is linear in p
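A small worked example with hypothetical numbers: if Tseq(n) = 120 s and Tpar(8,n) = 20 s, then S8(n) = 120/20 = 6, i.e., a speed-up of 6 on p = 8 processors (below the perfect speed-up of 8).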
absolute Speed-up
Absolute speed-up measures the gain of Par over the best (known/possible) sequential algorithm Seq, using worst-case times:
Sp(n) = Tseq(n)/Tpar(p,n)
The best possible absolute speed-up is linear in p: perfect speed-up Sp(n) = p cannot be exceeded
Work
•The work of a sequential algorithm Seq on an input of size n is the total number of operations (integer, FLOP, memory, …; whatever matters most according to the model) carried out when Seq runs to completion on the input
•The work of a parallel algorithm Par on an input of size n is the total number of operations carried out by all assigned processors, not including idle time
•The work required for some problem P is the work of a best possible algorithm (the complexity of P)
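As an illustration (a standard example, not stated on the card): summing n numbers sequentially takes n-1 additions, so the sequential work is Θ(n). A parallel sum in which each of p processors adds its n/p local numbers and the p partial sums are then combined performs roughly (n-p) + (p-1) = n-1 additions in total, so Wpar(p,n) = Θ(n) as well.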
work optimal
A parallel algorithm is called work-optimal if Wpar(p,n) = O(Tseq(n)). A work-optimal algorithm has the potential for linear speed-up (for some number of processors)
For work-optimal algorithms, absolute and relative speed-up coincide (asymptotically), since Tpar(1,n) = O(Tseq(n))
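Continuing the summing sketch from above: Wpar(p,n) = Θ(n) = O(Tseq(n)), so the parallel sum is work-optimal; with a tree-wise combination of the partial sums, Tpar(p,n) = O(n/p + log p), which gives speed-up Θ(p) as long as p log p = O(n).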
cost
The product C(n) = p Tpar(p,n) is the cost of the parallel algorithm: the total time for which the p processors are reserved
A parallel algorithm is called cost-optimal if C(n) = O(Tseq(n)). A cost-optimal algorithm has linear (perhaps perfect) speed-up
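For the same summing sketch: C(n) = p Tpar(p,n) = p·O(n/p + log p) = O(n + p log p), which is O(Tseq(n)) = O(n), i.e., cost-optimal, whenever p log p = O(n).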
Break-even
How many processors are needed for the parallel algorithm to be faster than the sequential algorithm?
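A minimal break-even sketch in Python, assuming a simple hypothetical cost model Tpar(p,n) = Tseq(n)/p + α·p (perfectly divisible work plus a per-processor coordination overhead α); the model and the numbers are assumptions for illustration, not measurements.

# Break-even sketch: smallest p for which the modeled parallel time
# beats the sequential time, under the assumed model above.
def t_seq(n):
    return float(n)                # hypothetical: one time unit per element

def t_par(p, n, alpha=50.0):       # alpha: assumed per-processor overhead
    return t_seq(n) / p + alpha * p

def break_even(n, max_p=1024):
    # Return the smallest p > 1 with t_par(p, n) < t_seq(n), or None.
    for p in range(2, max_p + 1):
        if t_par(p, n) < t_seq(n):
            return p
    return None

print(break_even(1000))            # 2 for these hypothetical parameters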
relative speed-up
Definition (Relative speed-up): The ratio
SRelp(n) = Tpar(1,n)/Tpar(p,n) is the relative speed-up of algorithm Par. Relative speed-up expresses how well Par utilizes p processors (scalability)
Relative speed-up is not to be confused with absolute speed-up. Absolute speed-up expresses how much can be gained over the best (known/possible) sequential implementation by parallelization.
An algorithm has good scalability and relative speed-up if Tpar(1,n)/Tpar(p,n) = Θ(p)
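A hypothetical example of the difference: if Tpar(1,n) = 2 Tseq(n) (the parallel algorithm on one processor is twice as slow as the best sequential algorithm) and Tpar(p,n) = Tpar(1,n)/p, then the relative speed-up is SRelp(n) = p (perfect scalability), while the absolute speed-up is only Sp(n) = Tseq(n)/Tpar(p,n) = p/2.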
parallelism of the algorithm
Definition:
T∞(n): The smallest possible running time of parallel algorithm Par, given arbitrarily many processors. By definition T∞(n) ≤ Tpar(p,n) for all p. Relative speed-up is limited by
SRelp(n) = Tpar(1,n)/Tpar(p,n) ≤ Tpar(1,n)/T∞(n)
Definition:
The ratio Tpar(1,n)/T∞(n) is called the parallelism of the parallel algorithm Par
The parallelism is the largest number of processors that can be employed and still give linear, relative speed-up: assume p > Tpar(1,n)/T∞(n); then SRelp(n) ≤ Tpar(1,n)/T∞(n) < p, so the relative speed-up cannot reach p, and it falls further behind the more p exceeds the parallelism
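A standard illustration (the summing sketch again): with Tpar(1,n) = Θ(n) and T∞(n) = Θ(log n) for a tree-wise sum, the parallelism is Θ(n/log n); employing substantially more than n/log n processors cannot give linear relative speed-up.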
parallel overhead
Parallel overhead is the work that does not have to be done by a sequential algorithm
• Communication: Exchanging data, keeping data consistent
• Synchronization: Ensuring that processors have reached the same point in the computation (typically SPMD programs)
• Algorithmic: Extra or redundant computations
(Communication) overheads for processor i are often modeled as Toverhead(p,ni) = α(p) + βni
where α(p) is the latency (dependent on p), β is the cost per data item, and ni is the number of data items that processor i needs to exchange. For synchronization operations, ni = 0
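A hypothetical numeric instance of this model (values chosen for illustration only): with α(p) = 10 μs and β = 2 ns per item, exchanging ni = 10^6 items costs about 10 μs + 2 ms ≈ 2 ms, so the latency term dominates for small messages and the βni term for large ones.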
Granularity
The (smallest) time between coordination periods is called the granularity of the parallel computation
A computation is coarse- or fine-grained depending on the number of instructions between coordination intervals
load balancing
The difference between the maximum and the minimum over all processors i of (Ti(n) + Oi(n)) is the load imbalance, where Ti(n) is the computation time and Oi(n) the overhead of processor i
Load balancing: achieving, for all processors i, j, an even amount of work: Tpar(i,n) ≈ Tpar(j,n)
- Static, oblivious: splitting the problem into p pieces, regardless of the input (except its size n)
- Static, problem dependent, adaptive: splitting the problem into p pieces, using the (structure of) the input
- Dynamic: dynamically readjusting the work assigned to processors. Entails overheads
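A minimal sketch in Python of the static, oblivious case, assuming the only information used about the input is its size n: split n items into p consecutive blocks whose sizes differ by at most one.

# Static, oblivious partitioning sketch: block-partition n items over p
# processors so that block sizes differ by at most one element.
def block_range(i, p, n):
    # Half-open index range [start, end) of the items assigned to processor i.
    base, rem = divmod(n, p)
    start = i * base + min(i, rem)
    end = start + base + (1 if i < rem else 0)
    return start, end

print([block_range(i, 4, 10) for i in range(4)])
# [(0, 3), (3, 6), (6, 8), (8, 10)] -> block sizes 3, 3, 2, 2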
Amdahl’s Law (parallel version)
Let a program A contain a fraction r that can be "perfectly" parallelized, and a fraction s = 1-r that is "purely sequential", i.e., cannot be parallelized at all (s and r independent of n). The maximum achievable speed-up is 1/s, independent of n
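The bound follows directly from the definition: the parallel time is at best s Tseq(n) + r Tseq(n)/p, so
Sp(n) ≤ Tseq(n) / (s Tseq(n) + r Tseq(n)/p) = 1/(s + r/p) ≤ 1/s
For instance, with s = 0.05 (5% purely sequential) the speed-up can never exceed 20, regardless of p and n.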
scaled speed-up
Speed-up as a function of both p and n, with purely sequential time t(n) and parallelizable time T(n), is termed scaled speed-up
Depending on how fast t(n)/T(n) converges (toward 0), linear speed-up can be achieved by increasing the problem size n accordingly
Small t(n) relative to T(n) means large parallelism
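Written out under these assumptions (purely sequential part t(n), perfectly parallelizable part T(n)):
Sp(n) = (t(n) + T(n)) / (t(n) + T(n)/p)
which approaches p as t(n)/T(n) approaches 0; keeping t(n)/T(n) small by growing the problem size n is what makes the speed-up "scaled".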
efficiency
The efficiency of parallel algorithm Par is the ratio of best possible parallel time to actual parallel time for given p and n
E(p,n) = (Tseq(n)/p) / Tpar(p,n) = Sp(n)/p = Tseq(n) / (p Tpar(p,n))
- E(p,n) ≤ 1, since Sp(n) = Tseq(n)/Tpar(p,n) ≤ p
- E(p,n) = c (constant, ≤1): linear speed-up
- Cost-optimal algorithms have constant efficiency
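A minimal Python sketch tying the quantities of this section together for given (hypothetical) times; it only evaluates the formulas above, it is not a measurement harness.

# Speed-up, cost, and efficiency computed from given (hypothetical) times.
def speed_up(t_seq, t_par):
    return t_seq / t_par

def cost(p, t_par):
    return p * t_par

def efficiency(p, t_seq, t_par):
    return speed_up(t_seq, t_par) / p      # = t_seq / (p * t_par)

t_seq, p, t_par = 120.0, 8, 20.0           # hypothetical timings in seconds
print(speed_up(t_seq, t_par))              # 6.0
print(cost(p, t_par))                      # 160.0
print(efficiency(p, t_seq, t_par))         # 0.75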