11 - Performance & Optimisation Flashcards
What is implicit in the goal of parallelism?
Optimisation
What are 2 ways to decide which parallelisation strategy is better?
Use theoretical measures
Measure the performance and compare
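Measuring and comparing can be as simple as timing each candidate several times and keeping the fastest run, which filters out noise from other processes. A minimal sketch using Python's standard time.perf_counter; best_time, strategy_loop and strategy_builtin are hypothetical names, and the two strategies are serial stand-ins for whatever parallel implementations you actually want to compare.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-ins for two strategies that solve the same problem.
def strategy_loop(n: int = 200_000) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


def strategy_builtin(n: int = 200_000) -> int:
    return sum(range(n))


if __name__ == "__main__":
    t_a = best_time(strategy_loop)
    t_b = best_time(strategy_builtin)
    print(f"loop: {t_a:.4f} s, builtin: {t_b:.4f} s")
```

Both strategies must produce the same result before their timings are worth comparing.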
What is theoretical performance?
Span/step and work complexity
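Work and span can be counted directly. The sketch below simulates a pairwise tree reduction (a parallel sum) on one processor, counting work as the total number of additions and span as the number of dependent levels; tree_reduce is a hypothetical helper for illustration. For n values it performs n - 1 additions (the same work as a serial loop) but needs only ceil(log2 n) levels, whereas a serial loop needs n - 1 dependent steps.

```python
def tree_reduce(values: list[int]) -> tuple[int, int, int]:
    """Sum values pairwise, level by level, as a parallel tree reduction would.

    Returns (result, work, span): work counts every addition, span counts the
    levels (all additions within one level are independent of each other).
    """
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    work = 0
    span = 0
    while len(level) > 1:
        next_level = []
        # Each pair in this loop could be added simultaneously by a separate processor.
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] + level[i + 1])
            work += 1
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element is carried up unchanged
        level = next_level
        span += 1
    return level[0], work, span
```

For example, tree_reduce(list(range(8))) returns (28, 7, 3): seven additions spread over three levels.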
What is the most critical part when parallelising code?
The theoretical performance (work and span of the chosen algorithm)
Improving it can give huge speedup gains
What is latency?
The time it takes to complete a single task
What is throughput?
The rate at which tasks can be completed
What is better, higher or lower latency?
Lower
What is better, higher or lower throughput?
Higher
What does optimising for latency minimise?
The time taken to complete each task, at the expense of power
What does optimising for throughput maximise?
The number of tasks processed per unit of time
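Latency and throughput can move in opposite directions. A minimal sketch, assuming a processor that completes a batch of tasks together, so every task in the batch waits for the whole batch; the function name and the figures (1 task in 1 ms versus 1000 tasks in 10 ms) are hypothetical and chosen only for illustration.

```python
def latency_and_throughput(batch_size: int, batch_time_s: float) -> tuple[float, float]:
    """Return (latency per task in seconds, throughput in tasks per second)
    for a processor that finishes `batch_size` tasks together in `batch_time_s`."""
    latency = batch_time_s                 # each task waits for the whole batch
    throughput = batch_size / batch_time_s
    return latency, throughput


# Hypothetical figures for illustration only.
cpu_like = latency_and_throughput(batch_size=1, batch_time_s=0.001)     # one task at a time
gpu_like = latency_and_throughput(batch_size=1000, batch_time_s=0.010)  # many tasks at once
```

Here the batch processor has 10 times the latency (10 ms versus 1 ms) but 100 times the throughput (100,000 versus 1,000 tasks per second), which is the trade-off that separates latency-oriented from throughput-oriented designs.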
Which is optimised for low-latency computations, CPU or GPU?
CPU
Which is optimised for data-parallel, high-throughput computations, CPU or GPU?
GPU
Which has the larger cache, CPU or GPU?
CPU
What is speedup?
Compares the time T for solving the identical problem on one processor versus on p processors
What is the ideal speedup?
Linear (S = p: running on p processors is p times faster)
What is the formula for speedup?
S = Ts / Tp (Ts = time on one processor, Tp = time on p processors)
What is efficiency?
Measures the utilisation of hardware resources (return on hardware investment): E = S / p
What are 2 parallelisation metrics?
Speedup and efficiency
What is the ideal efficiency?
1 (or 100%)
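Speedup and efficiency follow directly from measured times, with efficiency defined as speedup divided by the number of processors (E = S / p). A minimal sketch; the function names and the timings in the usage note are hypothetical.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S = Ts / Tp: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """E = S / p: 1.0 means all p processors were fully used."""
    return speedup(t_serial, t_parallel) / p
```

For example, a job taking 100 s serially and 16 s on 8 processors gives S = 6.25 and E = 0.78125 (about 78%); only a parallel time of 12.5 s would reach the ideal linear speedup of 8 and efficiency of 1.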
What are 3 sources of performance loss?
Non-parallelisable computation (there is almost always a small part that must run serially)
Overhead (extra effort for communication between processors)
Under-utilisation (idle processors, slow memory)
What is Amdahl’s Law?
The improvement to be gained from using a faster mode of execution is limited by the fraction of time that this mode is used
What is the formula for Amdahl’s law?
S = 1 / ((1 - f) + f / p)
What does Amdahl’s Law provide?
A theoretical upper limit on parallel speedup assuming that there are no costs for parallelism
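Amdahl's law gives the speedup bound S = 1 / ((1 - f) + f / p), where f is the parallelisable fraction of execution time and p the number of processors. A minimal sketch; amdahl_speedup is a hypothetical helper name.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup for parallel fraction f on p processors,
    ignoring all parallelisation overheads: S = 1 / ((1 - f) + f / p)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / ((1.0 - f) + f / p)
```

With f = 0.9, 10 processors give a speedup of only about 5.26, and even an unlimited number of processors cannot exceed 1 / (1 - f) = 10, because the serial 10% still has to run.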
What overheads can be an extra cost of parallelisation?
Communication (e.g. exchanging data through shared memory)
Synchronisation (waiting for work to complete, barriers)
Computation (assignment of tasks to processors)
Memory requirements (parallel algorithms typically require more memory)
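Overheads mean that adding processors eventually makes things slower. The sketch below uses a hypothetical cost model, assumed here purely for illustration: Amdahl's ideal parallel time plus a fixed overhead per extra processor standing in for communication and synchronisation costs. The function names and parameter values are also hypothetical.

```python
def parallel_time(t_serial: float, f: float, p: int, overhead_per_proc: float) -> float:
    """Hypothetical model: Amdahl's ideal parallel time plus an overhead
    that grows linearly with the number of extra processors."""
    ideal = t_serial * (1.0 - f) + t_serial * f / p
    return ideal + overhead_per_proc * (p - 1)


def best_processor_count(t_serial: float, f: float, overhead_per_proc: float, max_p: int) -> int:
    """Return the processor count in 1..max_p that minimises the modelled time."""
    return min(range(1, max_p + 1),
               key=lambda p: parallel_time(t_serial, f, p, overhead_per_proc))
```

With t_serial = 100 s, f = 0.95 and 0.5 s overhead per extra processor, the modelled time is lowest at 14 processors (about 18.3 s); at 64 processors the overhead alone is 31.5 s, so the run is slower than with 14.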
What is under-utilisation?
Each piece of hardware has a maximum capability
Not using hardware resources to their full potential decreases efficiency
What is load imbalance?
Uneven distribution of work to processors
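Because processors run concurrently, the parallel time is set by the most heavily loaded processor while the others sit idle. A minimal sketch, assuming no other overheads so that the serial time is simply the total work; imbalance_stats is a hypothetical helper name.

```python
def imbalance_stats(work_per_proc: list[float]) -> tuple[float, float]:
    """Given the work (e.g. seconds) assigned to each processor,
    return (parallel time, efficiency)."""
    p = len(work_per_proc)
    t_parallel = max(work_per_proc)   # the slowest processor finishes last
    t_serial = sum(work_per_proc)     # assumes no overhead: one processor does it all
    return t_parallel, t_serial / (p * t_parallel)
```

The same 100 units of work split evenly as [25, 25, 25, 25] finish in 25 units with efficiency 1.0, while [40, 20, 20, 20] take 40 units and the efficiency drops to 0.625.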
What is power consumption proportional to?
Approximately the cube of the processor frequency f (P ∝ f³)
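The cubic relationship is why several slower cores can beat one fast core on power. A minimal sketch, assuming power per core scales exactly with f³ and that the work parallelises perfectly across cores; relative_power and relative_clock_rate are hypothetical helper names.

```python
def relative_power(freq_ratio: float, cores: int = 1) -> float:
    """Power relative to one core at the base frequency, assuming P ∝ f³ per core."""
    return cores * freq_ratio ** 3


def relative_clock_rate(freq_ratio: float, cores: int = 1) -> float:
    """Aggregate cycles per second relative to one core at the base frequency."""
    return cores * freq_ratio
```

Two cores at half the frequency deliver the same aggregate clock rate as one full-speed core (2 × 0.5 = 1) for a quarter of the power (2 × 0.5³ = 0.25), whereas doubling a single core's frequency multiplies its power by 8.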
What are 4 strategies to gain the best performance and optimisation?
Selection of the right algorithm
Following basic principles for writing efficient code
Architecture specific optimisation
Micro-level optimisation
What are (1-f), f and p in Amdahl’s law?
(1-f) = fraction of execution time that is serial (0-1)
f = fraction of execution time that can be parallelised (0-1)
p = number of processors