Parallel Performance Flashcards
What are the four main causes of performance degradation in parallel computing?
- Starvation – Not enough parallel work to keep processors busy
- Latency – Delay in transferring data between system components
- Overhead – Extra work needed for parallel execution (e.g., thread management)
- Waiting – Processes competing for shared resources
Acronym: SLOW
What is parallel speed-up? How is it defined mathematically?
How much faster the parallel program is compared to the serial version.
S_N = T_0/T_N, where T_0 is the serial execution time and T_N is the parallel execution time on N processors.
What is parallel efficiency? How is it defined mathematically?
Whether speed-up represents efficient use of the resources.
E_N = S_N/N, where S_N is the speed-up on N processors.
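A quick worked example with illustrative numbers (not drawn from any particular code): a program taking T_0 = 100 s serially and T_4 = 30 s on 4 processors gives

```latex
S_4 = \frac{T_0}{T_4} = \frac{100}{30} \approx 3.3,
\qquad
E_4 = \frac{S_4}{4} \approx 0.83
```

i.e., about 83% of the ideal 4-fold speed-up is realised.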
What is strong scaling?
Keeping the total problem size fixed and increasing the number of processors to reduce execution time.
What does Amdahl’s Law state about parallel performance?
The maximum achievable speed-up is limited by the fraction of the program that cannot be parallelised.
What is the formula for Amdahl’s Law?
S_N = 1/(s + p/N), where:
- s is the serial fraction
- p is the parallel fraction (s + p = 1)
- N is the number of processors
What is the theoretical maximum speed-up if infinite processors were available?
S_max = 1/s = 1/(1 − p)
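A sketch with assumed fractions: if 10% of the runtime is inherently serial (s = 0.1, p = 0.9), then on 16 processors

```latex
S_{16} = \frac{1}{0.1 + 0.9/16} = 6.4,
\qquad
S_{\max} = \frac{1}{0.1} = 10
```

so no number of processors can push the speed-up past 10.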
How does Gustafson’s Law challenge Amdahl’s Law?
It assumes that problem size increases with the number of processors, leading to better scaling than predicted by Amdahl’s Law.
What is the formula for Gustafson’s Law?
S_N = s + pN, where:
- s is the serial fraction
- p is the parallel fraction
- N is the number of processors
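Using the same assumed fractions as the Amdahl example above (s = 0.1, p = 0.9), Gustafson's scaled speed-up on 16 processors is

```latex
S_{16} = 0.1 + 0.9 \times 16 = 14.5
```

far above Amdahl's 6.4, because the parallel part of the (now larger) problem dominates the runtime.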
What is weak scaling?
Keeping the workload per processor constant while increasing the number of processors, leading to increased total problem size.
Why are barriers used in parallel programming?
To ensure all threads have finished their work before any of them proceeds past a synchronisation point (e.g., before results are written out).
How can implied barriers be removed to improve performance?
By adding the nowait clause to a worksharing construct (e.g., #pragma omp for nowait), removing unnecessary waiting wherever correctness does not depend on the barrier.
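A minimal sketch of the idea (the arrays and sizes are illustrative, not from the source): the first loop drops its implied barrier with nowait, which is safe because the second loop writes to an independent array, and the barrier at the end of the parallel region still guarantees both loops finish before the output is printed.

```c
#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    #pragma omp parallel
    {
        /* nowait removes the implied barrier at the end of this
           worksharing loop: threads move straight on to the next
           loop instead of waiting for the slowest one. */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        /* Safe despite nowait: this loop touches only b, never a. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = (double)i * i;
    } /* implied barrier at the end of the parallel region */

    printf("a[N-1] = %.1f, b[N-1] = %.1f\n", a[N - 1], b[N - 1]);
    return 0;
}
```

Built with an OpenMP flag such as gcc -fopenmp; without it the pragmas are ignored and the code simply runs serially.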
What are the three main loop scheduling strategies in OpenMP?
- schedule(static, chunk) – Assigns equal chunks to threads in a round-robin manner.
- schedule(dynamic, chunk) – Assigns chunks to threads on demand as they finish their work.
- schedule(guided, chunk) – Starts with large chunks that decrease in size towards chunk.
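A sketch showing the three clauses side by side (the chunk size of 4 and the trivial loop body are illustrative; in practice one schedule is chosen per loop):

```c
#include <stdio.h>

#define N 100

int main(void) {
    double sum = 0.0;

    /* static: chunks of 4 iterations dealt out round-robin, fixed at
       loop entry -- lowest overhead, best when iterations cost the same. */
    #pragma omp parallel for schedule(static, 4) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += i;

    /* dynamic: chunks of 4 handed out on demand as threads finish --
       copes with uneven iteration costs, at higher scheduling overhead. */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += i;

    /* guided: chunk sizes start large and shrink towards a minimum
       of 4 -- a compromise between static and dynamic. */
    #pragma omp parallel for schedule(guided, 4) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += i;

    printf("sum = %.0f\n", sum);
    return 0;
}
```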
Why is load balancing important in MPI programs?
To ensure all processors are utilised effectively and prevent idle time due to uneven work distribution.
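One common remedy is a block distribution that differs by at most one item between ranks; a minimal sketch (the item count of 1000 is an assumed placeholder):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    const int total = 1000;   /* illustrative number of work items */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank gets total/size items; the first (total % size) ranks
       take one extra, so no rank holds more than one item above any
       other -- idle time from uneven distribution is minimised. */
    int base  = total / size;
    int extra = total % size;
    int count = base + (rank < extra ? 1 : 0);
    int start = rank * base + (rank < extra ? rank : extra);

    printf("rank %d handles items [%d, %d)\n", rank, start, start + count);

    MPI_Finalize();
    return 0;
}
```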
What is the role of an interconnect in HPC clusters?
It facilitates communication between compute nodes by carrying MPI messages.
What are common interconnect technologies used in HPC clusters?
- Gigabit Ethernet
- InfiniBand
What are three key factors affecting MPI message transmission time?
- Number of hops between nodes
- Blocking factor of the network
- Other network traffic
How is message transmission time modelled?
t = L + M/B, where:
- L is the latency
- M is the message size
- B is the bandwidth
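A worked example with assumed figures (L = 1 µs, B = 10 GB/s): a 1 MB message costs

```latex
t = L + \frac{M}{B}
  = 1\,\mu\mathrm{s} + \frac{10^{6}\,\mathrm{B}}{10^{10}\,\mathrm{B/s}}
  = 101\,\mu\mathrm{s}
```

which is bandwidth-dominated, whereas a 1 kB message under the same figures costs only 1.1 µs and is latency-dominated.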
When is latency more important than bandwidth? When is bandwidth more important than latency?
- Latency matters more when sending many small messages.
- Bandwidth matters more when sending large messages.
How does communication overhead scale in 2D and 3D domain decomposition?
- 2D: R_2D = 4/N
- 3D: R_3D = 6/N
where R is the ratio of communication to computation and N is the number of grid points along each subdomain edge. As subdomains get smaller, communication overhead increases.
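These ratios follow from a surface-to-volume argument: communication is proportional to a subdomain's boundary and computation to its interior, so for an edge length of N points

```latex
R_{2D} = \frac{4N}{N^{2}} = \frac{4}{N},
\qquad
R_{3D} = \frac{6N^{2}}{N^{3}} = \frac{6}{N}
```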
What are examples of parallel overheads in MPI and OpenMP?
- MPI: Extra code for message passing and process synchronisation.
- OpenMP: Thread management, loop scheduling, and synchronisation overhead.
How does Amdahl’s Law change when considering parallel overheads?
The speed-up formula gains an extra overhead term:
S_N = 1/(s + p/N + n_p v/T_0), where n_p v is the total time spent on parallel overheads, expressed as a fraction of the serial runtime T_0.
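Continuing the assumed numbers from the Amdahl example (s = 0.1, p = 0.9, N = 16), and supposing the overhead term amounts to 2% of the serial runtime:

```latex
S_{16} = \frac{1}{0.1 + 0.9/16 + 0.02} \approx 5.7
```

down from 6.4 with no overhead.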
What is super-linear speed-up, and when does it occur?
When the measured speed-up exceeds N. It typically occurs because splitting the problem across processors shrinks each one's working set until it fits in cache, so each processor runs faster than the serial version did.