Intro To The Work-Span Model Flashcards
Dynamic Multithreading Model
Two parts:
- A computation can be represented by a directed acyclic graph, in which each node is an operation and each edge a dependence.
- Pseudocode notation, or programming model, for writing down the algorithm. Designed so that “when you execute one of these algorithms, it generates a DAG, at least conceptually.”
From the point of view of exploiting parallelism, what is the key property of a “good” DAG?
It has few dependencies relative to its work, so that many operations can run concurrently; equivalently, its span is small compared to its work.
What separation does the pseudocode notation provide?
It separates how to produce work (algorithm) from how to schedule and execute it (hardware).
What is a scheduling problem?
The problem of how to take free units of work and assign them to processors.
Given a complete binary tree with n leaves, how many levels does it have?
log2(n) (counting depth; including the leaf level itself, log2(n) + 1 levels)
What is work?
The total number of nodes/vertices in the multithreaded DAG model.
Notated W(n)
What is span?
The length of the longest path through the multithreaded DAG model (from start to end). Also called “depth”.
Notated D(n)
What is another name for the longest path through a DAG?
Critical path
How much time does it take to execute all operations in a DAG with one processor, assuming all ops have unit cost?
W(n), i.e., total work
How much time does it take to execute all operations in a DAG with infinitely many processors, assuming all ops have unit cost?
D(n)
Average available parallelism
W(n)/D(n)
Average work per critical path vertex. This is the amount of parallelism we can take advantage of at each critical path vertex, and is a good starting point for choosing an appropriate number of processors.
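As a concrete example, consider a binary reduction tree over n inputs. A minimal Python sketch (function name mine), assuming each internal node is one unit-cost operation and the leaves are free inputs:

```python
import math

def reduce_tree_stats(n):
    """Work, span, and average available parallelism for a binary
    reduction tree over n leaves (n a power of two). Assumes each
    internal node is one unit-cost op and leaves cost nothing."""
    work = n - 1              # W(n): number of internal nodes
    span = int(math.log2(n))  # D(n): longest chain of ops, leaf to root
    return work, span, work / span

W, D, avg_par = reduce_tree_stats(1024)  # (1023, 10, 102.3)
```

So for a 1024-element reduction, roughly 100 processors is a sensible starting point.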
Span Law
T_p(n) >= D(n)
i.e., the span is a lower bound on the execution time given p processors.
Work Law
T_p(n) >= ceiling(W(n)/p)
Another lower bound on the execution time given p processors. This time it’s total work divided by number of processors.
Work-Span Law
Both the Work Law and Span Law must be true simultaneously, so we can combine them.
T_p(n) >= max{D(n), ceiling(W(n)/p)}
This gives a lower bound on execution time.
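The combined lower bound is easy to compute; a minimal Python sketch (function name and example values mine):

```python
import math

def time_lower_bound(work, span, p):
    """Work-Span Law: T_p(n) >= max(D(n), ceil(W(n)/p))."""
    return max(span, math.ceil(work / p))

# With W = 1000 and D = 20:
time_lower_bound(1000, 20, 4)    # work-limited: ceil(1000/4) = 250
time_lower_bound(1000, 20, 100)  # span-limited: D = 20 > ceil(1000/100) = 10
```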
How do you derive Brent’s Theorem (which provides an upper bound on execution time of a DAG)?
1) Break up the DAG into k phases, where each phase meets three conditions:
- Each phase has EXACTLY one critical path vertex.
- All vertices within a phase that are NOT on the critical path must be independent, i.e., have no dependencies between them.
- Each vertex must belong to some phase.
2) Each phase k has a number of vertices in it, W_k (including critical path vertex)
3) By the third phase condition, the sum of W_k over all phases equals W, the total work of the DAG.
4) How long does it take to execute phase k? t_k = ceiling(W_k/p)
5) Step 4 implies that the total time T_p = sum from k=1 to D of ceiling(W_k/p). (There are exactly D phases, since each phase contains exactly one critical path vertex.)
6) Using the identity ceiling(a/b) = floor((a-1)/b) + 1 (for positive integers a, b), we can rewrite the summand in 5: total time T_p = sum from k=1 to D of [floor((W_k-1)/p) + 1]
7) floor(x) is always <= x, so T_p <= sum from k=1 to D of [(W_k-1)/p + 1]
8) Evaluating the sum in 7 gives T_p <= (W - D)/p + D. This is Brent's Theorem!
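The bound can be sanity-checked by simulating a greedy scheduler (each step runs up to p ready vertices) on a small hand-built DAG; greedy schedules also satisfy Brent's bound. This is an illustrative sketch of mine, not part of the original derivation:

```python
import math

def greedy_schedule_time(succs, indeg, p):
    """Number of time steps a greedy scheduler takes: each step
    executes up to p ready (all predecessors done) unit-cost vertices."""
    indeg = dict(indeg)  # copy so the caller's dict is untouched
    ready = [v for v, d in indeg.items() if d == 0]
    steps = 0
    while ready:
        batch, ready = ready[:p], ready[p:]
        steps += 1
        for v in batch:
            for w in succs.get(v, []):
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)
    return steps

# Small DAG: a fans out to b, c, d, which all feed e.
succs = {"a": ["b", "c", "d"], "b": ["e"], "c": ["e"], "d": ["e"]}
indeg = {"a": 0, "b": 1, "c": 1, "d": 1, "e": 3}
W, D, p = 5, 3, 2
T = greedy_schedule_time(succs, indeg, p)  # 4 steps
assert max(D, math.ceil(W / p)) <= T <= (W - D) / p + D  # 3 <= 4 <= 4
```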
What is Brent’s Theorem?
T_p <= (W - D)/p + D
T_p := total execution time of DAG given p processors
W := work of DAG, i.e., number of vertices or operations
D := span of DAG, i.e., depth
p := number of processors
What is the intuitive interpretation of Brent’s Theorem?
Total time to execute the DAG is less than or equal to the time needed to execute the critical path (D) plus the time needed to execute all other nodes using p processors.
Note that the time to execute the critical path does not depend on p because you can’t execute nodes of the critical path in parallel.
What’s the equation for speedup of parallel time over best sequential time?
S_p(n) := T_*(n)/T_p(n)
What does T_p(n) depend on?
T_p(n) = f(W, D; n, P)
work, span, problem size, and number of processors
What does T_*(n) depend on?
T_*(n) = W_*(n)
Work done by best sequential algorithm
What is the ideal speedup of a parallel algorithm over the best sequential algorithm?
Ideal speedup is linear in P, also called linear speed up, linear scaling, or ideal scaling.
S_p(n) = Theta(P)
How does big-Theta notation work?
big-Theta depends on both big-O and big-Omega.
big-O provides an upper bound on the performance of an algorithm. f(n) is O(g(n)) if there exist some constants C_1 and n_0 s.t. 0 <= f(n) <= C_1*g(n) for all n >= n_0.
big-Omega provides a lower bound on the performance of an algorithm. f(n) is Omega(g(n)) if there exist some constants C_2 and n_0 s.t. 0 <= C_2*g(n) <= f(n) for all n >= n_0.
big-Theta provides the tightest bound on execution time. f(n) is Theta(g(n)) if there exist some constants C_1, C_2, and n_0 such that 0 <= C_2g(n) <= f(n) <= C_1g(n) for all n >= n_0.
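A concrete (made-up) instance: f(n) = 3n^2 + 5n is Theta(n^2). One valid choice of witness constants is C_1 = 4, C_2 = 3, n_0 = 5; the snippet below spot-checks the inequalities over a finite range (a sanity check, not a proof):

```python
f = lambda n: 3 * n**2 + 5 * n
g = lambda n: n**2
C1, C2, n0 = 4, 3, 5  # witness constants; many other choices work

# 0 <= C2*g(n) <= f(n) <= C1*g(n) for all n >= n0 (checked up to 10^4)
assert all(0 <= C2 * g(n) <= f(n) <= C1 * g(n) for n in range(n0, 10**4))
```

Note that n_0 matters: at n = 4, f(4) = 68 > C_1*g(4) = 64, so the upper inequality only kicks in once n >= 5.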
How do you use Brent’s Theorem to get a lower bound on speedup (and then make it easier to analyze)? Using the resulting equation, what has to be true to get ideal scaling?
S_p(n) = T_*(n)/T_p(n) = W_*(n)/T_p(n), because the execution time of the best sequential algorithm is just the work it performs.
Brent's Theorem gives us an upper bound on execution time. Plugging it into this equation gives a lower bound on speedup: S_p(n) = W_*(n)/T_p(n) >= W_*(n)/[(W-D)/P + D] = (via a little algebra) P/[(W/W_*) + (P-1)/(W_*/D)]
To get ideal scaling, the denominator must be constant (since P is the numerator and ideal scaling means linear scaling in P). First, W(n) must be proportional to W_*(n), so that W/W_* = O(1). This is called "work-optimality".
Second, we need P to be proportional to W_*/D, i.e., P = O(W_*/D). This can be rewritten as W_*/P = Omega(D), which says the work per processor must grow with the span, which in turn grows with n. This is called "weak scalability".
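A sketch of this bound in Python (function name and example values mine): plugging in a work-optimal case with P = W_*/D recovers the classic "at least half of ideal" guarantee.

```python
def speedup_lower_bound(W, W_star, D, P):
    """Lower bound on speedup obtained by substituting Brent's
    Theorem into S_P = W_*(n)/T_P(n):
        S_P >= P / (W/W_* + (P-1)/(W_*/D))
    """
    return P / (W / W_star + (P - 1) / (W_star / D))

# Work-optimal (W == W_*) with P chosen as W_*/D = 100:
bound = speedup_lower_bound(1000, 1000, 10, 100)  # ~50.3, i.e., at least P/2
```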
What is work-optimality?
W(n) = O(W_*(n)), i.e., the work of our parallel algorithm must be proportional to the work of the best sequential algorithm. We use this to get ideal scaling.