P3L1 - Hyperthreading Flashcards
True or False?
The reason we have to context switch among threads is that the CPU only has one set of registers to describe an execution context.
True
Define hyperthreading
Give 3 other names for hyperthreading (these names help to explain what hyperthreading is …)
- hardware multithreading
- chip multithreading (CMT)
- simultaneous multithreading (SMT)
Define hyperthreading
Hardware architects have realized they can hide some of the latency associated with context switching.
One of the ways that this has been achieved is to have CPUs with multiple sets of registers.
Each set of registers can describe the context of a separate thread.
Define hyperthreading
Hardware architects have realized they can hide some of the latency associated with context switching.
One of the ways that this has been achieved is to have CPUs with multiple ____________________.
Each set of _________ can describe the _________ of a separate thread.
Hardware architects have realized they can hide some of the latency associated with context switching.
One of the ways that this has been achieved is to have CPUs with multiple sets of registers.
Each set of registers can describe the context of a separate thread.
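As a purely conceptual Python sketch (toy names and fields, not how real silicon is organised): the core holds one register set per hardware thread, so switching contexts means selecting a different set rather than saving and restoring state.

```python
# Conceptual sketch only -- a toy model, not real hardware.
# The core keeps one full register set per hardware thread, so "switching"
# is just selecting which set feeds the pipeline; nothing is saved to memory.
from dataclasses import dataclass, field

@dataclass
class RegisterSet:
    pc: int = 0                                            # program counter of this context
    gprs: list = field(default_factory=lambda: [0] * 16)   # general-purpose registers

@dataclass
class HyperthreadedCore:
    contexts: list          # one RegisterSet per hardware thread
    active: int = 0         # which context currently feeds the pipeline

    def switch_to(self, idx: int) -> None:
        # No registers are copied out; the core simply points at another set.
        self.active = idx

core = HyperthreadedCore(contexts=[RegisterSet(), RegisterSet()])
core.switch_to(1)           # effectively free compared with a software context switch
```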
True or False?
- Some operating systems support up to 8 hardware threads.
- Hyperthreading is always enabled as it saves context switch time.
- TRUE: Modern platforms often support two hardware threads, though some high performance platforms may support up to eight.
- FALSE: Modern systems allow for hyperthreading to be enabled/disabled at boot time, as there are tradeoffs to this approach.
True or False?
- If hyperthreading is enabled, each hardware context appears to the scheduler as an entity upon which it can schedule tasks.
- TRUE - each hardware context appears to the scheduler as an entity upon which it can schedule tasks.
One of the decisions that the scheduler will have to make is _______________________________________
If the amount of time a thread is _________ is greater than the amount of time to context switch twice, it makes sense to context switch.
Since a hardware context switch is on the order of cycles and DRAM access is on the order of _________ of cycles, hyperthreading can be used to hide ________________________
One of the decisions that the scheduler will have to make is which two threads to schedule on these hardware contexts.
If the amount of time a thread is idling is greater than the amount of time to context switch twice, it makes sense to context switch.
Since a hardware context switch is on the order of cycles and DRAM access is on the order of hundreds of cycles, hyperthreading can be used to hide memory access latency.
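A minimal sketch of that rule, with assumed cycle counts (the real numbers vary by platform; the point is the comparison, not the constants):

```python
# Illustrative numbers only -- the exact cycle counts are assumptions.
CTX_SWITCH_CYCLES = 2      # a hardware context switch is on the order of cycles
DRAM_ACCESS_CYCLES = 200   # a DRAM access is on the order of hundreds of cycles

def worth_switching(idle_cycles: int, switch_cost: int = CTX_SWITCH_CYCLES) -> bool:
    """Switch only if the idle time exceeds the cost of switching away and back."""
    return idle_cycles > 2 * switch_cost

print(worth_switching(DRAM_ACCESS_CYCLES))  # True: hide the memory stall behind another thread
print(worth_switching(1))                   # False: the stall is shorter than two switches
```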
Paper: Chip Multithreading Systems Need a New Operating System Scheduler, Alexandra Fedorova et al.
Abstract: “The unpredictable nature of modern workloads, characterized by frequent branches and control transfers, can result in processor pipeline utilization as low as _______
Chip multithreading (CMT), a processor architecture combining chip ___________ and hardware ________, is designed to address this issue.
Hardware vendors plan to ship CMT systems within the next two years; understanding how such systems will perform is crucial if we are to use them to full advantage. Our simulation experiments show that a CMT-savvy operating system scheduler could improve application performance by a factor of _____.
In this paper we describe our initial analysis of application ___________on CMT systems and propose a __________ for a scheduler tailored for the needs of a CMT system.”
Abstract: “The unpredictable nature of modern workloads, characterized by frequent branches and control transfers, can result in processor pipeline utilization as low as 19%.
Chip multithreading (CMT), a processor architecture combining chip multiprocessing and hardware multithreading, is designed to address this issue.
Hardware vendors plan to ship CMT systems within the next two years; understanding how such systems will perform is crucial if we are to use them to full advantage. Our simulation experiments show that a CMT-savvy operating system scheduler could improve application performance by a factor of two.
In this paper we describe our initial analysis of application performance on CMT systems and propose a design for a scheduler tailored for the needs of a CMT system.”
Some assumptions before analyzing a hyperthreaded platform
- A thread can __________________ on every CPU cycle.
- ________________ takes four cycles.
- _____________________________ is instantaneous.
- We have an SMT platform with two __________________________
- A thread can issue an instruction on every CPU cycle.
- Memory access takes four cycles.
- Hardware context switching is instantaneous.
- We have an SMT platform with two hardware threads.
Scheduling a ________ of memory and CPU intensive threads allows us to avoid or at least limit the ________ on the processor pipeline and helps to ensure _________ across both the CPU and the memory components.
Note that we will still experience some __________ due to the interference of these two threads, but it will be minimal relative to the co-scheduling of only memory- or CPU-bound threads.
Scheduling a mix of memory and CPU intensive threads allows us to avoid or at least limit the contention on the processor pipeline and helps to ensure utilization across both the CPU and the memory components.
Note that we will still experience some degradation due to the interference of these two threads, but it will be minimal relative to the co-scheduling of only memory- or CPU-bound threads.
Best way to schedule the two hardware threads?
- Co-schedule two CPU bound threads?
- Co-schedule two memory bound threads?
- Co-schedule one CPU bound and one memory bound thread?
Co-schedule one CPU bound and one memory bound thread
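A tiny simulation under the four assumptions listed above (an illustrative sketch, not the lecture's exact model) shows why the mix wins: two memory-bound threads leave the pipeline idle roughly half the time, two CPU-bound threads keep it full but fight over the single issue slot while memory sits idle, and the mix keeps the pipeline full while memory work proceeds in the gaps.

```python
# Toy simulation: one instruction issued per cycle, a 4-cycle memory access,
# free hardware context switches, two hardware threads sharing one pipeline.

def pipeline_utilization(latencies, total_cycles=1_000):
    """latencies[i] = cycles before hardware thread i can issue again
    (1 for a CPU-bound thread, 4 for a memory-bound one)."""
    ready_at = [0] * len(latencies)   # cycle at which each context can issue next
    last, issued = -1, 0
    for cycle in range(total_cycles):
        for off in range(1, len(latencies) + 1):      # round-robin over contexts
            i = (last + off) % len(latencies)
            if ready_at[i] <= cycle:
                issued += 1
                ready_at[i] = cycle + latencies[i]    # busy/stalled until then
                last = i
                break
        # If no context is ready, the pipeline idles this cycle.
    return issued / total_cycles

CPU, MEM = 1, 4
print(pipeline_utilization([MEM, MEM]))   # ~0.5: pipeline idles while both wait on memory
print(pipeline_utilization([CPU, CPU]))   # 1.0, but the threads contend for the one issue slot
print(pipeline_utilization([CPU, MEM]))   # 1.0: CPU work fills the memory thread's stalls
```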
How do we know if a thread is CPU bound or memory bound?
Use historic information: previous thread behaviour.
But not sleep time … (cycles spent waiting on memory do not count as sleeping)
So use hardware counters:
- L1, L2 … LLC cache misses
- Instructions Per Cycle (IPC) metrics
- Power/Energy usage data
For example, a thread scheduler can look at the number of LLC misses - a metric stored in a hardware counter - and determine that if this number is high enough then the thread is most likely memory bound.
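A minimal sketch of that heuristic; the threshold and counter values are made up for illustration, and a real scheduler would read them from hardware counters:

```python
# Hedged sketch: threshold and sample values are assumptions, not measured data.
LLC_MISS_RATE_THRESHOLD = 0.05   # assumed fraction of LLC accesses that miss

def looks_memory_bound(llc_misses: int, llc_accesses: int) -> bool:
    """Classify a thread from sampled last-level-cache counters."""
    if llc_accesses == 0:
        return False
    return llc_misses / llc_accesses > LLC_MISS_RATE_THRESHOLD

print(looks_memory_bound(llc_misses=9_000, llc_accesses=100_000))  # True: high miss rate
print(looks_memory_bound(llc_misses=200, llc_accesses=100_000))    # False: mostly cache hits
```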
If LLC (last level cache) misses are high the thread is most likely _______________
If LLC misses are high the thread is most likely memory bound
Fedorova speculates that a more concrete metric to help determine if a thread is CPU bound or memory bound is _____________________________
A memory bound thread will take a lot of cycles to complete an instruction; therefore, it has a _________________
A CPU bound thread completes an instruction every cycle, so by definition it has a __________________ of approximately 1.
Fedorova speculates that a more concrete metric to help determine if a thread is CPU bound or memory bound is cycles per instruction (CPI).
A memory bound thread will take a lot of cycles to complete an instruction; therefore, it has a high CPI.
A CPU bound thread completes an instruction every cycle, so by definition it has a low CPI of approximately 1.
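A minimal sketch of a CPI-based classifier; the cutoff is an assumed value for illustration, not a number from the paper:

```python
# Illustrative only: the cutoff value is an assumption.
def cpi(cycles: int, instructions: int) -> float:
    return cycles / instructions

def classify_by_cpi(cycles: int, instructions: int, cutoff: float = 2.0) -> str:
    # CPI near 1 -> an instruction retires almost every cycle -> CPU bound.
    # High CPI   -> many stalled cycles per instruction       -> memory bound.
    return "memory bound" if cpi(cycles, instructions) > cutoff else "CPU bound"

print(classify_by_cpi(cycles=1_000_000, instructions=950_000))  # ~1.05 CPI -> CPU bound
print(classify_by_cpi(cycles=1_000_000, instructions=120_000))  # ~8.3 CPI -> memory bound
```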
True or False?
Fedorova concludes that CPI is a good metric to determine how to co-schedule threads.
False…
Only in the experiment, where the workloads had distinct CPI values.
But in the real world, the CPI values of different tasks are not so distinct.
Better to use LLC miss rates.