Scans and List Ranking Flashcards
What is the list ranking problem?
Given a linked list and a pointer to the head, compute the position of each node in the list.
The sequential version is trivial, but the trick is to do it faster with parallelism.
What is a prefix sum?
Given an array and an index i, the prefix sum is the sum of all elements from the beginning of the array through i.
What is the prefix max?
Given an array and an index i, the prefix max is the max of all elements from the beginning of the array through i.
What is a scan?
A scan is a generalization of the prefix sum and max, where we can use any operator. So Prefix sum is a scan with the add operator, and prefix max is a scan with the max operator.
Can you parallelize a scan by replacing “for i: A_i = scan_op(A_(i-1), A_i)” with parallel for?
No! Parallel for requires that all iterations are independent, but in a scan, A_i is dependent on A_(i-1)
We can parallelize a reduce operation using a reduction tree. Can we apply this to scan?
Yes, BUT…
We can apply this to scan, but we have to do it like this: “par-for i: B[i] = reduce(A[1:i])”. Which means the work at each iteration is O(i), and total work is O(n^2)! Much worse than sequential algorithm.
Span is only O(log n) though, so there’s that.
How do we parallelize scan?
Assume associativity of our scan operation: (a + b) + c = a + (b + c)
Intuition: First, we scan on each pair of elements in our array, indexes 1 and 2, 3 and 4, etc. We keep only the last element of the result and store in place at the even indexes, so A[2] = scan(A[1], A[2]), A[4] = scan(A[3], A[4]), etc.
Then we recursively run our scan on just the even array indexes and again store them in place.
To get the odd elements starting at index 3, we simply run scan on the previous (even) array index and the odd element itself, storing in place at the odd index. The element at index 1 stays the same.
What is the work of parallel scan?
W(n) = O(n)
To find this, observe that we do scan operations on n/2 pairs of elements in the first step, and n/2 - 1 operations again at the end (when we leave out the first element). In between we have a recursive call on n/2 elements.
This means we first do n/2 + n/2 - 1 + W(n/2) scan operations = W(n/2) + n - 1. If you write out the steps of this recursion you’ll see that work is always O(n). i.e.,
W(n) = W(n/2) + n - 1
W(n) = W(n/4) + n/2 + n - 1
W(n) = W(n/8) + n/4 + n/2 + n - 1 etc etc etc until you get to the base case.
What is the span of parallel scan?
D(n) = O(log^2 n)
NB: log^2(n) = log(n)*log(n), NOT log^2(n) = log(log(n))!!!
To find this, observe that our recursive calls are on half the array each time, leading to log(n) levels to the recursion tree. At each level, we solve the problem using parallel for loops, which individually have log(n) levels. So, log^2(n) span in total. The recursion looks like…
D(n) = O(log n) + D(n/2),
where D(1) = D(0) = O(1).
Describe the Quicksort algorithm
We’re going to sort a given array of length n.
- Pick a random element of the array
- Partition the array so all elements <= pivot value on left, others on right
- Recursively call Quicksort on the partitions
- Stick the partitions back to together
Because the partitions are independent, they can be spawned as independent tasks. But how do you create your partitions in parallel?
How do we use “Conditional Gathers” to do the partitioning step in Quicksort in parallel?
- Allocate a new array the size of the one we’re partitioning
- In parallel, compare each array element to the pivot value and write boolean flags to the new array, 1 for “<= pivot” 0 for otherwise
- Run addScan on the boolean array. This will result in an array that is monotonically nondecreasing. It will increase by one at each array location where a boolean 1 can be found. Importantly, the value at each of these steps can map to the indexes of the output array, and the last value in the scanned array tells us how long the output array should be!
- Allocate an array of size [last value in scan array]
- In a par-for loop, assign each True-flagged element to a location in the output array using the corresponding index in the scan array
What is the “Conditional Gather”/gatherIf(A[1:n], F[1:n])?
It’s the part of the parallel Quicksort partitioning algorithm that takes an array of values A[1:n] and an array of boolean flags F[1:n] and returns a new array with only the flagged values.
What is the work of the parallel Quicksort partition?
W(n) = O(n)
This is because the work happens in a par-for (in the partitioning step), which has W(n) = O(n).
What is the span of the parallel Quicksort partition?
D(n) = O(log n)
This is because the bulk of the work happens in par-for, and par-for has D(n) = O(log n).
Describe a Segmented Scan
In a Segmented Scan, we want to run a scan on subsets of the original array A. To do so, we indicate the subsets using a second boolean array F in which True values tell us that that index is the beginning of a new segment.
Define x_i := (a_i, f_i), so pairs of corresponding array values and flags,
Define op(x_i, x_j) := if not f_j return (a_i + a_j, f_i OR f_j), else return x_j. This means that if f_j == True, return x_j, if f_j == False, we combine x_i and x_j as specified.
Now the actual algorithm:
let X[0:n] := array of n+1 pairs
// Build array X
X[0] = (0, False) //identity operators
for i in 1 to n: X[i] = (A[i], F[i])
// With X set up, we can do our scan
for i in 1 to n: X[i] = op(X[i], X[i-1])
// Extract left value from each tuple in X
for i in 1 to n: A[i] = left(X[i])
return A