P3L1 - Scheduling on Multiprocessors Flashcards
Since the performance of processes/threads is highly dependent on the amount of execution state that is present in the _____ cache - as opposed to main memory - it makes sense that we would want to schedule tasks onto CPUs such that we can maximize how _____ we can keep our _____ caches.
To achieve this, we want to schedule our tasks back onto the same CPUs where they had been executing. This is known as cache affinity.
Since the performance of processes/threads is highly dependent on the amount of execution state that is present in the CPU cache - as opposed to main memory - it makes sense that we would want to schedule tasks onto CPUs such that we can maximize how “hot” we can keep our CPU caches.
To achieve this, we want to schedule our tasks back onto the same CPUs where they had been executing. This is known as cache affinity.
Define ‘cache affinity’
Since the performance of processes/threads is highly dependent on the amount of execution state that is present in the CPU cache - as opposed to main memory - it makes sense that we would want to schedule tasks onto CPUs such that we can maximize how “hot” we can keep our CPU caches. To achieve this, we want to schedule our tasks back onto the same CPUs where they had been executing. This is known as cache affinity.
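On Linux, affinity can also be set explicitly rather than just encouraged. A minimal sketch using the sched_setaffinity(2) system call to pin the calling process to a single CPU (CPU 2 here is an arbitrary example, not anything the course mandates):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t mask;

    /* Restrict this process to CPU 2 only, so the scheduler keeps
     * running it where its cached state lives. CPU 2 is an arbitrary
     * example; any online CPU number works. */
    CPU_ZERO(&mask);
    CPU_SET(2, &mask);

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("now pinned to CPU %d\n", sched_getcpu());
    return 0;
}
```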
To achieve cache affinity, we can have a hierarchical scheduling architecture that maintains a load-balancing component which is responsible for ________________________________________
Each CPU then has its own _________ with its own __________, and is responsible for scheduling tasks on that CPU exclusively.
To load balance across the CPUs, we can look at the ________ of each of the runqueues to ensure one is not too much longer than the others.
In addition, we can detect when a CPU is ______, and __________ some of the work from the other queues onto the queue associated with the idle CPU.
To achieve cache affinity, we can have a hierarchical scheduling architecture that maintains a load-balancing component which is responsible for dividing the tasks among the CPUs.
Each CPU then has its own scheduler with its own runqueue, and is responsible for scheduling tasks on that CPU exclusively.
To load balance across the CPUs, we can look at the length of each of the runqueues to ensure one is not too much longer than the others.
In addition, we can detect when a CPU is idle, and rebalance some of the work from the other queues onto the queue associated with the idle CPU.
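A toy sketch of this architecture (the names runqueue, enqueue, and balance_idle are hypothetical, not taken from any real kernel): each CPU owns its own queue, and an idle CPU pulls a task from the longest queue.

```c
#include <stdio.h>

#define NCPUS 4
#define QCAP  16

/* Hypothetical per-CPU runqueue: a fixed-size list of task IDs. */
struct runqueue {
    int tasks[QCAP];
    int len;
};

static struct runqueue rq[NCPUS];   /* one runqueue per CPU */

static void enqueue(int cpu, int task) {
    rq[cpu].tasks[rq[cpu].len++] = task;
}

/* Move one task from the longest queue to an idle (empty) CPU's
 * queue. A real balancer would also compare queue lengths
 * periodically, not only when a CPU goes idle. */
static void balance_idle(int idle_cpu) {
    int busiest = -1;
    for (int c = 0; c < NCPUS; c++)
        if (busiest < 0 || rq[c].len > rq[busiest].len)
            busiest = c;
    if (rq[busiest].len > 1) {   /* leave the busy CPU its current task */
        int task = rq[busiest].tasks[--rq[busiest].len];
        enqueue(idle_cpu, task);
        printf("moved task %d: CPU %d -> CPU %d\n", task, busiest, idle_cpu);
    }
}

int main(void) {
    /* Pile all the work onto CPU 0, leaving CPUs 1-3 idle. */
    for (int t = 1; t <= 6; t++)
        enqueue(0, t);

    /* Each idle CPU pulls work, as the load balancer would. */
    for (int c = 1; c < NCPUS; c++)
        if (rq[c].len == 0)
            balance_idle(c);

    for (int c = 0; c < NCPUS; c++)
        printf("CPU %d runqueue length: %d\n", c, rq[c].len);
    return 0;
}
```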
In addition to having multiple processors, it is possible to have multiple memory nodes. The CPUs and the memory nodes will be connected via some physical interconnect.
In most configurations, a memory node will be closer to a socket of multiple processors, which means that access to this memory node from those processors is ________ than accessing some remote memory node.
We call these platforms ____________________ platforms.
In addition to having multiple processors, it is possible to have multiple memory nodes. The CPUs and the memory nodes will be connected via some physical interconnect.
In most configurations, a memory node will be closer to a socket of multiple processors, which means that access to this memory node from those processors is faster than accessing some remote memory node.
We call these platforms non-uniform memory access (NUMA) platforms.
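This non-uniformity is visible from software. A short sketch using libnuma (compile with -lnuma); numa_distance() reports the relative cost of reaching a memory node, where 10 means local and larger values mean a slower path across the interconnect:

```c
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() == -1) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;

    /* Print the node-to-node distance matrix. Local access is
     * reported as 10; remote access shows up as a larger value
     * (e.g., 20), reflecting the extra hop over the interconnect. */
    for (int from = 0; from < nodes; from++) {
        for (int to = 0; to < nodes; to++)
            printf("%4d", numa_distance(from, to));
        printf("\n");
    }
    return 0;
}
```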
What is NUMA-aware scheduling?
From a scheduling perspective, it makes sense to keep tasks on the CPU closest to the memory node where their state resides, in order to maximize the speed of memory access. We refer to this as NUMA-aware scheduling.
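A sketch of the idea using libnuma (compile with -lnuma): allocate a task's state on one memory node, then keep the task executing on that node's CPUs so its accesses stay local. Node 0 and the 1 MiB buffer are arbitrary examples:

```c
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() == -1) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    int node = 0;            /* arbitrary example node */
    size_t size = 1 << 20;   /* 1 MiB of task state */

    /* Place this task's state on a specific memory node... */
    char *state = numa_alloc_onnode(size, node);
    if (!state) {
        perror("numa_alloc_onnode");
        return 1;
    }
    memset(state, 0, size);

    /* ...then restrict execution to CPUs on that same node, so
     * every access to the state is local rather than remote. */
    if (numa_run_on_node(node) == -1) {
        perror("numa_run_on_node");
        return 1;
    }

    printf("running on node %d, next to our memory\n", node);
    numa_free(state, size);
    return 0;
}
```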