Retake exam Flashcards
Concurrency
Dealing with lots of things at once
* Can be processed on one core (interleaved)
Requirements of mutual exclusion (mutex)
- No deadlock
- No starvation
- A thread spends only a short time inside the mutex
IORef
Mutable variable
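A minimal sketch using GHC's Data.IORef: plain reads and writes are not synchronised, so concurrent updates need atomicModifyIORef'.

```haskell
import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- atomic read-modify-write; a plain readIORef/writeIORef pair
  -- could lose updates under concurrency
  atomicModifyIORef' ref (\n -> (n + 1, ()))
  readIORef ref >>= print  -- prints 1
```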
MVar
Synchronised mutable reference with a lock (empty is locked and full is unlocked)
- No starvation
- Mutual exclusion
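A sketch of an MVar used as a mutex (withLock is an illustrative name): takeMVar empties the MVar to acquire, putMVar refills it to release.

```haskell
import Control.Concurrent.MVar

-- Acquire, run the action, release. Real code would use the
-- library's withMVar (or bracket) for exception safety.
withLock :: MVar () -> IO a -> IO a
withLock lock action = do
  takeMVar lock        -- blocks while empty (lock held elsewhere)
  r <- action
  putMVar lock ()      -- full again = unlocked
  pure r

main :: IO ()
main = do
  lock <- newMVar ()   -- start full = unlocked
  withLock lock (putStrLn "in the critical section")
```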
compare-and-swap
Mutual exclusion with deadlock freedom
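A sketch of the classic CAS retry loop, assuming the Data.Atomics API from the atomic-primops package: read the current value, compute the new one, and retry if another thread changed the cell in between.

```haskell
import Data.Atomics (readForCAS, casIORef, peekTicket)
import Data.IORef

-- Deadlock-free: whenever a CAS fails, some other
-- thread's CAS must have succeeded.
casIncrement :: IORef Int -> IO ()
casIncrement ref = loop
  where
    loop = do
      old <- readForCAS ref
      (ok, _) <- casIORef ref old (peekTicket old + 1)
      if ok then pure () else loop
```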
Non-blocking progress guarantees
Blocking (B) < non-blocking (NB), where non-blocking splits into: obstruction-free (OF) < lock-free (LF) < wait-free (WF)
* Wait free = every thread makes progress regardless of external factors
* Lock free = system as a whole makes progress, but individual threads aren’t guaranteed to complete
* Obstruction free = Thread makes progress if it doesn’t encounter contention from other threads
STM (non-blocking)
Can solve the dining philosophers problem, but cannot prevent starvation or contention
TMVar
An MVar built on top of STM. Note: threads blocked on a plain MVar are woken up in FIFO order
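A sketch using the stm package (transfer is an illustrative function): the transaction commits atomically or retries on conflict.

```haskell
import Control.Concurrent.STM

-- Atomic transfer between two accounts. `check` blocks (retries the
-- transaction) until there are sufficient funds; no locks, hence no
-- deadlock, but heavy contention can still starve a transaction.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  balance <- readTVar from
  check (balance >= n)
  writeTVar from (balance - n)
  modifyTVar' to (+ n)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  readTVarIO b >>= print  -- prints 30
```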
Parallelism
Doing lots of things at once
* Processed on multiple cores
Array of structures / structure of arrays
AoS -> the most logical data organisation layout: one record per element
SoA -> a separate array for each field of the structure (sketch below)
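A sketch of the two layouts; the Particle type is illustrative.

```haskell
-- AoS: one array of whole records
data Particle = Particle { px, py, mass :: Double }
type ParticlesAoS = [Particle]

-- SoA: one array per field; each field is contiguous in memory,
-- which suits vectorised and coalesced access
data ParticlesSoA = ParticlesSoA
  { pxs    :: [Double]
  , pys    :: [Double]
  , masses :: [Double]
  }
```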
Task parallelism
Problem broken down into separate tasks, which can communicate and synchronise with each other
Scheduling
Fork-join, divide-and-conquer
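A fork-join divide-and-conquer sketch, assuming Control.Parallel from the parallel package:

```haskell
import Control.Parallel (par, pseq)

-- Fork the left recursive call as a spark, evaluate the right call,
-- then join the two results.
pfib :: Int -> Integer
pfib n
  | n < 2     = fromIntegral n
  | otherwise = l `par` (r `pseq` (l + r))
  where
    l = pfib (n - 1)
    r = pfib (n - 2)
```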
Data-parallelism
Same operation applied to subsets of data
GPU
Suited to inherently data-parallel tasks
* Threads run the same code
* Cheap to create threads
* Uses parallelism to hide latency
* Threads in the same block can cooperate
* Each thread block constitutes an independent data-parallel task
* Each streaming multiprocessor has its own shared memory
Map
Apply same function to each element of a set
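A parallel map sketch using Control.Parallel.Strategies from the parallel package:

```haskell
import Control.Parallel.Strategies (parMap, rseq)

-- Apply the same function to every element, one spark per element.
squares :: [Int] -> [Int]
squares = parMap rseq (\x -> x * x)
```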
Stencil
Map with access to neighbourhood around element
* Boundary handling = e.g. choose a fixed value
* Each thread keeps a copy of its neighbours' data
* Optimisation = use tiling (previously loaded elements are still in cache)
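A 1D three-point stencil sketch with a fixed boundary value, using the vector package:

```haskell
import qualified Data.Vector as V

-- Each output element averages its neighbourhood; out-of-range
-- neighbours take the fixed boundary value.
stencil1D :: Double -> V.Vector Double -> V.Vector Double
stencil1D boundary v = V.generate n step
  where
    n = V.length v
    at i | i < 0 || i >= n = boundary
         | otherwise       = v V.! i
    step i = (at (i - 1) + at i + at (i + 1)) / 3
```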
Gather
Performs independent random reads in parallel
* Requires a function from output index to input index
* Example: matrix transpose (sketch below)
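A gather sketch with the vector package (Data.Vector's backpermute is exactly this operation); the transpose index arithmetic assumes a row-major r x c matrix:

```haskell
import qualified Data.Vector as V

-- Output element i reads the input at index perm ! i.
gather :: V.Vector Int -> V.Vector a -> V.Vector a
gather perm xs = V.map (xs V.!) perm  -- same as V.backpermute xs perm

-- Matrix transpose expressed as a gather.
transposeG :: Int -> Int -> V.Vector a -> V.Vector a
transposeG r c xs = gather perm xs
  where
    perm = V.generate (r * c) $ \i ->
      let (row, col) = i `divMod` r  -- position in the transposed matrix
      in  col * c + row              -- where that element lives in the input
```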
Scatter
Independent random writes in parallel
* Requires a function from input index to output index
* Collisions = multiple values written to the same output index
* Scatter is generally more expensive than gather (sketch below)
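A scatter sketch with the vector package; Data.Vector's update applies the index/value pairs in order, so on a collision the last write wins (a parallel scatter would instead need atomics or a combining rule):

```haskell
import qualified Data.Vector as V

-- Input element i is written to index perm ! i of the output;
-- positions never written keep their value from dflt.
scatter :: V.Vector Int -> V.Vector a -> V.Vector a -> V.Vector a
scatter perm dflt xs = V.update dflt (V.zip perm xs)
```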
Embarrassingly parallel
Each operation is completely independent from the computation in other threads
Fold
Combine a collection of elements into a single value (maximum, sum, minimum)
* To run the fold in parallel, the combining function must be associative (sketch below)
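A parallel tree-reduction sketch, again assuming the parallel package; associativity is required because the grouping of operations differs from a sequential left fold:

```haskell
import Control.Parallel (par, pseq)

-- Combine the two halves in parallel; f must be associative
-- (e.g. (+), max, min), with z as its identity element.
preduce :: (a -> a -> a) -> a -> [a] -> a
preduce _ z []  = z
preduce _ _ [x] = x
preduce f z xs  = l `par` (r `pseq` f l r)
  where
    (ls, rs) = splitAt (length xs `div` 2) xs
    l = preduce f z ls
    r = preduce f z rs
```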
Scan or prefix
Computes partial reductions of an array
* Inclusive = each output includes the current element
* Exclusive = each output includes only the elements before the current one
* Scan-then-propagate = each tile yields its scanned result plus a total, which becomes the offset of later tiles
A parallel scan splits the array into tiles and runs in three phases: scan each tile, scan the per-tile totals, then add each tile's offset to its elements (sketch below)
Chained scans = use only one kernel and synchronise within the kernel
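A sequential sketch of the three-phase tiled scan (scan-then-propagate); in a real GPU version phases 1 and 3 run one tile per thread block, and z is assumed to be the identity of f:

```haskell
tiledScan :: Int -> (a -> a -> a) -> a -> [a] -> [a]
tiledScan tile f z xs = concat (zipWith (map . f) offsets scanned)
  where
    chunk [] = []
    chunk ys = let (h, t) = splitAt tile ys in h : chunk t
    tiles   = chunk xs                -- split into tiles
    scanned = map (scanl1 f) tiles    -- phase 1: inclusive scan per tile
    totals  = map last scanned        -- per-tile totals
    offsets = scanl f z (init totals) -- phase 2: scan the totals
    -- phase 3: the zipWith above adds each tile's offset to its elements
```

For example, tiledScan 4 (+) 0 [1..10] reproduces scanl1 (+) [1..10].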
Work
How long it takes to execute on a single processor (T1)
Span
How long it takes to execute on an infinite number of processors (T∞)
Efficient / Optimal
Efficient -> span is poly-logarithmic and the overhead (work relative to the best sequential algorithm) is poly-logarithmic
Optimal -> span is poly-logarithmic and the overhead is constant
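A standard worked example: tree reduction of n elements has work T1 = Θ(n) and span T∞ = Θ(log n); the work matches the sequential fold up to a constant factor, so the overhead is constant and the algorithm is optimal.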