5 - Scan Pattern Flashcards
What is a Cumulative Sum operations?
Add all elements and keep all the partial results, output is of the same size as the input
What are the two types of Scan?
Span efficient (Hillis-Steele) and work efficient (Blelloch)
What does the Scan pattern do?
Computes all partial reductions of a collection; every element of the output is a reduction of all the elements of the input up to the position of that output element
What does the Combiner function take as input?
The output of the previous loop iteration as well as the new input value
What is the complexity of a serial Scan?
Work = O(n) Span = O(n)
Discuss the requirements of the Scan operation
Exactly the same as the reduction pattern: function needs to be associative
What happens if the combiner function is non-associative?
The Scan cannot be parallelised and it is then called a general fold pattern
What are two categories of Scan?
Inclusive and exclusive
What is an Inclusive Scan?
Includes the current element in partial reduction
What is an Exclusive Scan?
Excludes the current element in partial reduction
What is the process of Span-Efficient Inclusive Scan?
Form a reduction tree for every output element, then merge the redundant elements of all those trees
What is the complexity of parallel Span-Efficient Scan (Hillis Steele)?
Work = O(n.log n) = worse than setial version Span = O(log n) = span efficient
What does the Span-Efficient scan perform badly on and why?
Large arrays, due to its work-inefficiency
What is the process of Work-Efficient Scan?
Since the combiner function is associative we can reorder operations to reduce span.
Two-phase reordering performing an up-sweep and a down-steep
What is up-sweeping?
Reducing on input/computing the reduction
What is down-sweeping?
Produces the intermediate results/values
How do you traverse in a work efficient scan?
Up-sweep = leaves to roots
Down-sweep = root to leaves
What is the complexity of Work Efficient Scan?
Requires 2 more steps than HS scan
Requires roughly 3 times more operations than serial scan
Span = O(log n) = less efficient than HS
Work = O(n) = less efficient than serial
Which Scan should you use when you have more work than processors?
Work Efficient
Which Scan should you use when you have more processors than work?
Span Efficient
Which Scan should you use with 512 elements and 512 processors?
Hillis/Steele
Which Scan should you use with 1M elements and 512 processors?
Blelloch
Which Scan should you use with 128k elements and 1 processor?
Serial!
How do you deal with large arrays?
The presented algorithms only work for a single block/work group
Use a two stage procedure to accomodate large arrays
What is the Pack Pattern?
Assign boolean keys to all elements in collection, false if an element needs to be filtered out
Perform exclusive scan on keys, result is an array with output indices