L17 Flashcards
What law limits the speedup that can be achieved by adding more cores?
Amdahl’s law. It bounds the speedup achievable by parallelizing a computation: the serial fraction of the program caps the speedup no matter how many cores are added.
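As a worked form of the law: with parallel fraction p and N cores, speedup(N) = 1 / ((1 - p) + p / N), which approaches 1 / (1 - p) as N grows. The sketch below evaluates it; the function names and the 0.9 parallel fraction are illustrative choices, not from the lecture.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given number of cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the same speed; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the core count grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative run: a program that is 90% parallel.
    for n in (1, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
    print("limit", round(amdahl_limit(0.9), 2))
```

With 90% of the work parallel, the speedup stays below roughly 10x however many cores are used.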
What does dark silicon refer to?
Dark silicon is the portion of a chip’s transistors that has to stay unused (powered off) at any given time.
Powering the whole chip at once would exceed its power and thermal limits and could damage it.
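A rough back-of-the-envelope sketch of the idea: given a chip power budget and the power one core draws, estimate how many cores can be on and what fraction must stay dark. The model (identical cores, fixed per-core power) and all numbers are simplifying assumptions for illustration only.

```python
def max_active_cores(power_budget_w: float, core_power_w: float, total_cores: int) -> int:
    """Number of cores that can be powered without exceeding the budget."""
    if core_power_w <= 0:
        raise ValueError("core_power_w must be positive")
    return min(total_cores, int(power_budget_w // core_power_w))


def dark_fraction(power_budget_w: float, core_power_w: float, total_cores: int) -> float:
    """Fraction of cores that must stay powered off (the 'dark' part)."""
    active = max_active_cores(power_budget_w, core_power_w, total_cores)
    return 1.0 - active / total_cores


if __name__ == "__main__":
    # Hypothetical chip: 16 identical cores, 5 W each, 60 W budget.
    print(max_active_cores(60.0, 5.0, 16))
    print(dark_fraction(60.0, 5.0, 16))
```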
What are the 3 categories of heterogeneous CMPs?
Domain specific accelerators
* Accelerating one very specific domain/type of computation
* Uses specialized hardware
General purpose accelerators
* Accelerating a general class of workloads
* Fully programmable
Asymmetric multi-cores and many-cores
* Cores of different capabilities (heterogeneity in the CPU itself)
* Tightly coupled
What is an example of a domain-specific accelerator in heterogeneous CMPs?
Snapdragon 8 Gen 2 SoC
What are examples of general-purpose accelerators in heterogeneous CMPs?
* On-chip GPGPUs
* IBM Cell SPE
* On-chip FPGAs
* Workloads
* Project Catapult: a reconfigurable fabric for accelerating large-scale datacenter services
What are examples of asymmetric multi-cores and many-cores in heterogeneous CMPs?
ARM big.LITTLE (static asymmetry)
Clusters of big and small cores
OS sends high-load tasks to big cores through cluster migration, CPU migration, or global task scheduling
Snapdragon 8 Gen 2: 3 clusters (prime, perf, efficiency)
Intel Alder Lake: high-end P-cores and low-end E-cores, with Thread Director giving hints to the OS scheduler
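The global task scheduling idea above can be sketched as follows: the scheduler sees every core and places each task individually, sending heavy tasks to big cores and light ones to little cores. This is a toy model, not how any real OS scheduler is implemented; the `Task` type, the `load` metric and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float  # hypothetical recent CPU utilisation in [0, 1]


def place_tasks(tasks, big_cores, little_cores, threshold=0.6):
    """Map each task name to a core name: heavy tasks go to big cores."""
    if not big_cores or not little_cores:
        raise ValueError("need at least one big and one little core")
    placement = {}
    big_i = little_i = 0
    # Consider the heaviest tasks first so they get the first pick of big cores.
    for task in sorted(tasks, key=lambda t: t.load, reverse=True):
        if task.load >= threshold:
            placement[task.name] = big_cores[big_i % len(big_cores)]
            big_i += 1
        else:
            placement[task.name] = little_cores[little_i % len(little_cores)]
            little_i += 1
    return placement


if __name__ == "__main__":
    tasks = [Task("game", 0.9), Task("sync", 0.1), Task("video", 0.7)]
    print(place_tasks(tasks, ["big0", "big1"], ["little0"]))
```

Cluster migration and CPU migration are coarser: they move work between whole clusters or between paired big/little cores rather than placing each task on any core.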
What is the difference between dynamic and static asymmetry in asymmetric multi-cores and many-cores?
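Static asymmetry: the differences between cores are fixed at design time, so the resources given to each core do not change at runtime.
Dynamic asymmetry: more resources are given to some cores at runtime (for example, a higher frequency).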
Why are heterogeneous CMPs difficult to use?
Most properties are not known until runtime
Unknown capabilities, unknown availability, unknown relative benefit
Are heterogeneous CMPs functionally portable? How can functional portability be achieved for low-level and high-level programs?
No, because the devices use device-specific languages and device-specific optimisations.
Functional portability can be achieved for low-level programs through OpenCL. A program can be distributed as OpenCL source code, forming a standard layer of compatibility.
For high-level languages,
- C++ with SYCL can be automatically compiled into other languages for different devices.
- Java with TornadoVM uses parallel for/reduce annotations.
True or False.
SYCL, OpenCL and TornadoVM have some performance portability.
False.
OpenCL has no performance portability
How can performance portability be achieved in heterogeneous CMPs?
DSLs and library APIs as the new HW/SW contract
De Facto Standards
Intel oneAPI
What are the 4 factors that affect scheduling in heterogeneous CMPs?
Communication/Interference:
* Hard constraint: devices need to share memory
* Soft constraints: communicating tasks should run on devices that are close together, but tasks that compete for shared resources are suboptimal to schedule on the same cluster
Affinity:
* Different device types make it extremely difficult to predict scheduling requirements
* Different core types also make it difficult to decide which core should run what
Thread/task criticality:
* Not all tasks/threads are equally important for overall progress
Energy and power:
* Limited power budget means core frequencies have to be adjusted and some cores might need to be switched off
* Limited energy budget affects the power and runtime of a core
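A minimal sketch of how criticality and a power budget could interact: cores running the most critical tasks get the highest frequency first, and cores that no longer fit in the budget are switched off. The greedy policy, the frequency levels and their power figures are all hypothetical; a real scheduler would also weigh affinity and interference.

```python
# Hypothetical (frequency in GHz, power in watts) levels, highest first.
LEVELS = [(2.0, 4.0), (1.0, 1.5)]


def assign_frequencies(criticality, power_budget_w):
    """Pick a frequency per core under a power budget.

    criticality maps core name -> criticality of the task it runs
    (higher means more important). Returns core name -> frequency in GHz,
    where 0.0 means the core is powered off.
    """
    remaining = power_budget_w
    result = {}
    # Serve the most critical cores first (greedy, not guaranteed optimal).
    for core in sorted(criticality, key=criticality.get, reverse=True):
        result[core] = 0.0
        for freq, power in LEVELS:
            if power <= remaining:
                result[core] = freq
                remaining -= power
                break
    return result


if __name__ == "__main__":
    print(assign_frequencies({"c0": 3, "c1": 1, "c2": 2}, 6.0))
```

In the example run the most critical core gets the top frequency, the next one is throttled, and the least critical core stays off because the budget is exhausted.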