Lecture 11 Flashcards
Communication cost?
The effect that communication operations have on program execution time (or on some other metric, e.g. power).
Total communication cost = num_messages * (communication time - overlap)
Overlap: portion of communication performed concurrently with other work.
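A worked instance of the formula above, as a minimal Python sketch. The function name and the numbers (1000 messages, 5.0 microseconds per message, 3.0 microseconds overlapped) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def total_communication_cost(num_messages, communication_time, overlap):
    """Total cost = num_messages * (communication_time - overlap)."""
    if overlap > communication_time:
        # Overlap is a portion of the communication, so it cannot exceed it.
        raise ValueError("overlap cannot exceed communication time")
    return num_messages * (communication_time - overlap)


# Hypothetical values: 1000 messages, 5.0 us each, 3.0 us hidden behind other work.
cost_us = total_communication_cost(1000, 5.0, 3.0)
print(cost_us)  # 2000.0
```

With full overlap (overlap equal to communication time) the cost under this model drops to zero, which is why increasing overlap appears among the ways to reduce communication cost.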
C2CR?
Communication to computation ratio:
Amount of communication / amount of computation
Need low ratio to efficiently utilise modern parallel processors since the ratio of compute capability to available bandwidth is high.
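To make the ratio concrete, here is a rough sketch for a hypothetical case that is not part of the card: an n x n grid divided among p processors, where each processor must exchange the elements on the boundary of its region. The two assignments (row strips and square blocks), the function names, and the element counts are illustrative assumptions.

```python
import math


def c2cr_row_strips(n, p):
    # Assumed assignment: each processor owns n/p contiguous rows.
    communication = 2 * n        # up to two boundary rows of n elements each
    computation = n * n / p      # elements updated by this processor
    return communication / computation


def c2cr_square_blocks(n, p):
    # Assumed assignment: each processor owns a square block of side n/sqrt(p).
    side = n / math.sqrt(p)
    communication = 4 * side     # four block edges
    computation = side * side    # elements updated by this processor
    return communication / computation


print(c2cr_row_strips(1024, 16))     # 0.03125
print(c2cr_square_blocks(1024, 16))  # 0.015625
```

Under these assumptions the blocked assignment halves the ratio for the same amount of computation, which shows how the choice of assignment changes the communication to computation ratio.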
Inherent vs artifactual communication?
Inherent communication: information that fundamentally must be moved between processors to carry out the algorithm given the specified assignment (assumes unlimited capacity caches, minimum granularity transfers, etc…)
Artifactual communication: all other communication (artifactual communication results from practical details of system implementation).
How to improve temporal locality?
A program exhibits temporal locality if it tends to access the same memory location repeatedly in a short time-frame.
Goal: structure an algorithm so that its working sets map well to the sizes of the different levels of the memory hierarchy. Keep working sets small without losing performance for other reasons.
To keep working sets small, assign tasks that tend to access the same data to the same process.
Once assignment is done, a process’s assigned computation can be organised so that tasks that access the same data are scheduled close to one another in time. We reuse a set of data as much as possible, before moving to other data.
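The scheduling idea above can be sketched as follows. This is only an illustration of access ordering, assuming two hypothetical processing steps (scale, then offset); Python lists do not model a cache faithfully, so no speedup is claimed.

```python
def two_passes(data):
    # Pass 1 touches every element, then pass 2 touches every element again.
    # With a large input, an element may have left the cache before it is reused.
    scaled = [x * 2 for x in data]
    return [x + 1 for x in scaled]


def chunked(data, chunk_size=4):
    # Apply both steps to one small chunk before moving on to the next chunk,
    # so each piece of data is reused while it is still recently accessed.
    result = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        scaled = [x * 2 for x in chunk]
        result.extend(x + 1 for x in scaled)
    return result


print(chunked([1, 2, 3]))  # [3, 5, 7]
```

Both functions compute the same result; only the order in which data is touched differs. The chunk size here is arbitrary and would in practice be chosen to fit the relevant level of the hierarchy.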
How to improve C2CR?
Exploit sharing: co-locate tasks that operate on the same data, which reduces inherent communication.
Schedule threads working on the same data structure at the same time on the same processor.
Contention?
A resource can perform operations at a given throughput (number of transactions per unit time). Examples: memory, communication links, servers, etc.
Contention occurs when many requests to a resource are made within a small window of time (the resource is a “hot spot”).
Can make a contention vs latency trade-off by introducing redundancy.
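One way to picture a hot spot and its relief through redundancy is a counter updated by several threads. The sketch below is a hypothetical example using Python's threading module: the first version sends every update through one lock, while the second gives each thread a private copy and combines the copies at the end.

```python
import threading


def count_with_shared_lock(num_threads, increments):
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:          # every increment contends for the same lock
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_local_copies(num_threads, increments):
    partials = [0] * num_threads    # one slot per thread (the redundant copies)

    def worker(i):
        local = 0
        for _ in range(increments):
            local += 1              # private counter, no lock needed
        partials[i] = local         # a single write to this thread's own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)            # combining step: the total is only known here


print(count_with_local_copies(4, 1000))  # 4000
```

The second version removes the contended lock, but the up-to-date total is not available until the copies are combined, which is one form the contention vs latency trade-off can take.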
How to reduce communication cost?
Reduce contention. Replicate contended resources (local copies, fine-grained locks). Stagger access to contended resources.
Increase overlap. The application writer can use asynchronous communication. The hardware implementor can use pipelining, multi-threading, pre-fetching, out-of-order execution.
Increasing overlap requires additional concurrency in the application (more concurrency than the number of execution units).
Reduce overhead to sender/receiver. Send fewer messages, make messages larger. Coalesce many small messages into large ones.
Reduce delay. The application writer can restructure code to exploit locality. The hardware implementor can improve the communication architecture.
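The message coalescing point can be illustrated with a simple cost model. Everything here is an assumption made for the sketch: the helper names, a fixed per-message overhead of 1000 ns, and a per-byte cost of 1 ns.

```python
def send_cost(num_messages, bytes_per_message, overhead_per_message, cost_per_byte):
    # Assumed model: each message pays a fixed overhead plus a per-byte cost.
    return num_messages * (overhead_per_message + bytes_per_message * cost_per_byte)


def coalesce(messages, batch_size):
    # Join every batch_size consecutive small messages into one larger message.
    return [
        b"".join(messages[i:i + batch_size])
        for i in range(0, len(messages), batch_size)
    ]


small = [b"x" * 8 for _ in range(1000)]   # 1000 messages of 8 bytes
batched = coalesce(small, 100)            # 10 messages of 800 bytes

# Hypothetical costs in nanoseconds: 1000 ns overhead per message, 1 ns per byte.
print(send_cost(len(small), 8, 1000, 1))      # 1008000
print(send_cost(len(batched), 800, 1000, 1))  # 18000
```

The same 8000 bytes are sent in both cases; under this model the saving comes entirely from paying the fixed per-message overhead 10 times instead of 1000 times.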