Scans and List Ranking Flashcards by Robert Lindgren

What is the list ranking problem?

Given a linked list and a pointer to the head, compute the position of each node in the list.

The sequential version is trivial, but the trick is to do it faster with parallelism.

How well did you know this?

Not at all

Perfectly

What is a prefix sum?

Given an array and an index i, the prefix sum is the sum of all elements from the beginning of the array through i.

How well did you know this?

Not at all

Perfectly

What is the prefix max?

Given an array and an index i, the prefix max is the max of all elements from the beginning of the array through i.

How well did you know this?

Not at all

Perfectly

What is a scan?

A scan is a generalization of the prefix sum and max, where we can use any operator. So Prefix sum is a scan with the add operator, and prefix max is a scan with the max operator.

How well did you know this?

Not at all

Perfectly

Can you parallelize a scan by replacing “for i: A_i = scan_op(A_(i-1), A_i)” with parallel for?

No! Parallel for requires that all iterations are independent, but in a scan, A_i is dependent on A_(i-1)

How well did you know this?

Not at all

Perfectly

We can parallelize a reduce operation using a reduction tree. Can we apply this to scan?

Yes, BUT…

We can apply this to scan, but we have to do it like this: “par-for i: B[i] = reduce(A[1:i])”. Which means the work at each iteration is O(i), and total work is O(n^2)! Much worse than sequential algorithm.

Span is only O(log n) though, so there’s that.

How well did you know this?

Not at all

Perfectly

How do we parallelize scan?

Assume associativity of our scan operation: (a + b) + c = a + (b + c)

Intuition: First, we scan on each pair of elements in our array, indexes 1 and 2, 3 and 4, etc. We keep only the last element of the result and store in place at the even indexes, so A[2] = scan(A[1], A[2]), A[4] = scan(A[3], A[4]), etc.

Then we recursively run our scan on just the even array indexes and again store them in place.

To get the odd elements starting at index 3, we simply run scan on the previous (even) array index and the odd element itself, storing in place at the odd index. The element at index 1 stays the same.

How well did you know this?

Not at all

Perfectly

What is the work of parallel scan?

W(n) = O(n)

To find this, observe that we do scan operations on n/2 pairs of elements in the first step, and n/2 - 1 operations again at the end (when we leave out the first element). In between we have a recursive call on n/2 elements.

This means we first do n/2 + n/2 - 1 + W(n/2) scan operations = W(n/2) + n - 1. If you write out the steps of this recursion you’ll see that work is always O(n). i.e.,

W(n) = W(n/2) + n - 1
W(n) = W(n/4) + n/2 + n - 1
W(n) = W(n/8) + n/4 + n/2 + n - 1 etc etc etc until you get to the base case.

How well did you know this?

Not at all

Perfectly

What is the span of parallel scan?

D(n) = O(log^2 n)

NB: log^2(n) = log(n)*log(n), NOT log^2(n) = log(log(n))!!!

To find this, observe that our recursive calls are on half the array each time, leading to log(n) levels to the recursion tree. At each level, we solve the problem using parallel for loops, which individually have log(n) levels. So, log^2(n) span in total. The recursion looks like…

D(n) = O(log n) + D(n/2),
where D(1) = D(0) = O(1).

How well did you know this?

Not at all

Perfectly

Describe the Quicksort algorithm

We’re going to sort a given array of length n.

Pick a random element of the array
Partition the array so all elements <= pivot value on left, others on right
Recursively call Quicksort on the partitions
Stick the partitions back to together

Because the partitions are independent, they can be spawned as independent tasks. But how do you create your partitions in parallel?

How well did you know this?

Not at all

Perfectly

How do we use “Conditional Gathers” to do the partitioning step in Quicksort in parallel?

Allocate a new array the size of the one we’re partitioning
In parallel, compare each array element to the pivot value and write boolean flags to the new array, 1 for “<= pivot” 0 for otherwise
Run addScan on the boolean array. This will result in an array that is monotonically nondecreasing. It will increase by one at each array location where a boolean 1 can be found. Importantly, the value at each of these steps can map to the indexes of the output array, and the last value in the scanned array tells us how long the output array should be!
Allocate an array of size [last value in scan array]
In a par-for loop, assign each True-flagged element to a location in the output array using the corresponding index in the scan array

How well did you know this?

Not at all

Perfectly

What is the “Conditional Gather”/gatherIf(A[1:n], F[1:n])?

It’s the part of the parallel Quicksort partitioning algorithm that takes an array of values A[1:n] and an array of boolean flags F[1:n] and returns a new array with only the flagged values.

How well did you know this?

Not at all

Perfectly

What is the work of the parallel Quicksort partition?

W(n) = O(n)

This is because the work happens in a par-for (in the partitioning step), which has W(n) = O(n).

How well did you know this?

Not at all

Perfectly

What is the span of the parallel Quicksort partition?

D(n) = O(log n)

This is because the bulk of the work happens in par-for, and par-for has D(n) = O(log n).

How well did you know this?

Not at all

Perfectly

Describe a Segmented Scan

In a Segmented Scan, we want to run a scan on subsets of the original array A. To do so, we indicate the subsets using a second boolean array F in which True values tell us that that index is the beginning of a new segment.

Define x_i := (a_i, f_i), so pairs of corresponding array values and flags,

Define op(x_i, x_j) := if not f_j return (a_i + a_j, f_i OR f_j), else return x_j. This means that if f_j == True, return x_j, if f_j == False, we combine x_i and x_j as specified.

Now the actual algorithm:

let X[0:n] := array of n+1 pairs

// Build array X
X[0] = (0, False) //identity operators
for i in 1 to n: X[i] = (A[i], F[i])

// With X set up, we can do our scan
for i in 1 to n: X[i] = op(X[i], X[i-1])

// Extract left value from each tuple in X
for i in 1 to n: A[i] = left(X[i])

return A

How well did you know this?

Not at all

Perfectly

What assumption do we need to make about a scan operation to be able to parallelize it?

Study These Flashcards

The scan operation needs to be associative! Like multiplication, or addition: (a + b) + c = a = (b + c)

What is the goal of list ranking?

Study These Flashcards

List ranking takes a linked list and determines the position (measured as the distance from the head) of each item. This is very straightforward using sequential algorithms. In parallel, it’s tricky.

To formulate list ranking as an addScan problem, what values should we assign to each item in the list?

Study These Flashcards

0 for the first, 1 for everything else. Then addScan will return 0, 1, 2, 3, 4, etc.

What is the key problem in list ranking and how do we solve it?

Study These Flashcards

The linked list data structure! We get around this by turning the linked list into an “array pool”, i.e., one array holding the values of each item, and another array holding the index of the “next” item (replacing the next pointer).

What is an “array pool” (which we use for list ranking)?

Study These Flashcards

An array pool is a data structure we use to parallelize list ranking a linked list.

For a linked list L of size n, we create two new arrays, V and N, of at least size n (can be bigger).

We fill V with the values of the items in L. Note that the order of items in V does not have to correspond to the order in L.

Then for each item value in V, we fill the corresponding location in N with the index of the value that belongs to its “next” item, i.e., the item it points to.

For example, let the first item in L be 5 and the second be 8. Let’s say we put these items at V[4] and V[6], respectively. Then N[4] = 6 (because we V[6] = 8).

Let’s say we create a new array P of the same size as V and N in the array pool. What happens if for every i in range(len(V)), we say…

if N[i] is not null: P[N[i]] = i

Study These Flashcards

We can think of this two ways:

1) P represents the “next” values of the original array pool, but reversed (so a reversed linked list)
2) We use P and N together, making an array representation of a doubly-linked list

What is a “jump” in the context of writing a parallel list ranker?

Study These Flashcards

When you replace a “next” pointer with a pointer to THAT node’s neighbor (like node.next.next).

What happens when you a jump to every node in a linked list (probably represented as an array pool hint hint) at the same time?

Study These Flashcards

You divide the list! You wind up with one list of all the odds and another of all the evens. You can use this for divide and conquer.

How do you add a jump to every node in a list at once, in parallel?

Study These Flashcards

Given an input array of “next” pointers N_in[1:m], create an output array of the same size, N_out[1:m] to avoid collisions.

par-for i in 1 to m: if N_in[i] is not null, N_out[i] = N_in[N_in[i]]

How do we update the ranks in parallel, starting from the initial values we assigned to each item?

The initial values we assigned were 0, 1, 1, 1, etc. and the idea of parallel list ranking is that we can get the rank of each item by running addScan on these initial values. Now we need to do this addScan in parallel without losing any information. To do this we define an "invariance": for node i in sublist V... Sigma_(k=head)_(k=i) V[k] = rank(i), i.e. if we sum the values of the sublist from the head through i, we get the rank of node i in the original list. To maintain this invariance (and this is the real work) we update each node's value by adding the previous node's value to it. We do this before each jump step. After we complete all jumps and each node is in its own sublist, the value of each node is its rank.

For an array pool of size n, what is the max number of jump steps until each sublist is one item?

O(log n), because each jump splits the list in half.

Given a linked list represented as an array pool (V and N), describe the parallel list ranker algorithm

First, we create two arrays of node ranks (because we need a copy to write to to allow us to operate in parallel without overwriting anything). Initialize these both to the 0 1 1 1 etc pattern. Call these R1 and R2. Then we create two arrays of pointer indexes (same reason). Initialize these with the values of input N. Call these N1 and N2. par-for i in 1 to ceiling(log n): updateRanks(R1, R2, N1) jumpList(N1, N2) swap(R1, R2) swap(N1, N2) return R1 or R2, whichever was last updated

What is the work of the parallel list ranker algorithm?

O(n log n) Each of the updateRanks() and jumpList() steps are O(n), since they do roughly one addition per node. Then we do these steps log(n) times. So, n log n.

What is the span of the parallel list ranker algorithm?

O(log^2 n) There are log(n) sequential steps in the loop around updateRanks() and jumpList(), then each of those functions use par-for, which has a span of log(n). So it's O(log n * log n) = O(log^2 n)

Is the parallel list ranker work-optimal?

No! The sequential algorithm (where you just count the number of nodes from the head stepwise and keep track of each node's rank) is O(n), whereas parallel lists ranker is O(n log n). In addition, we need a bunch of extra memory for our two rank arrays and two pointer arrays. Ultimately, you need very long lists to see any speedup from this parallel algorithm.

Scans and List Ranking Flashcards

(30 cards)