Lecture 8 - Multithreaded Processors Flashcards
Sources of Thread Level Parallelism
Multiprogrammed workloads - many independent applications active simultaneously. No synchronisation necessary.
Commercial server workloads - multi-user and multi-process or native multithreaded applications.
Parallel applications - explicitly parallel applications.
Within a program - automatically identify threads in an attempt to boost the performance of a single program.
Hardware needed for fast context switch
Need to replicate the hardware context for each thread.
Coarse-grain multithreading
Switch to an alternative thread when the currently active thread stalls, such as on a cache miss.
Coarse-grain multithreading - process of switching threads
Context switch occurs in response to a dynamic event which may not be detected until late in the pipeline.
Instructions following the instruction that caused the thread switch must be flushed.
Instructions from the new thread are then fetched and executed, or obtained from a thread-switch buffer. A typical thread switch incurs a 3-cycle penalty.
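The switch-on-stall behaviour above can be sketched as a toy cycle model. This is a minimal illustration, not a description of any real pipeline; the thread workload, miss rate, and the 3-cycle figure (taken from the card above) are the only inputs, and all names are made up.

```python
# Toy model of coarse-grain multithreading: run one thread until it
# stalls (e.g. on a cache miss), then pay a fixed switch penalty for
# flushing the pipeline before executing the alternate thread.

SWITCH_PENALTY = 3  # cycles lost per thread switch (from the card)

def run(threads, total_cycles):
    """threads: list of generators yielding True (instruction retired
    this cycle) or False (a stall event such as a cache miss)."""
    current, cycle, retired, switches = 0, 0, 0, 0
    while cycle < total_cycles:
        if next(threads[current]):
            retired += 1              # instruction completes this cycle
            cycle += 1
        else:
            # Stall detected late in the pipeline: flush younger
            # instructions and switch to the next thread.
            switches += 1
            cycle += SWITCH_PENALTY
            current = (current + 1) % len(threads)
    return retired, switches

def thread(miss_every):
    """Hypothetical thread that misses in the cache every Nth instruction."""
    i = 0
    while True:
        i += 1
        yield i % miss_every != 0

retired, switches = run([thread(10), thread(10)], total_cycles=100)
```

With a miss every 10 instructions, each visit to a thread costs 9 useful cycles plus the 3-cycle penalty, which bounds the achievable utilisation.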
Methods of reducing the thread-switch penalty
Provide pipeline registers for each thread at each pipeline stage, so a thread can stall without the need to flush the pipeline. The alternate thread can then be switched back in without a pipeline bubble.
Fine-grained multithreading
A new thread is selected on every clock cycle.
Advantages of fine-grained multithreading
Can exploit static thread schedule.
Guarantees provided by a simple round-robin schedule can be exploited to simplify the implementation, for example by removing the need to detect and resolve inter-instruction dependences, and to hide memory latency.
Predictable performance provided.
Single-thread performance - severely limited, as we always switch to a new thread on every cycle.
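The round-robin guarantee mentioned above can be made concrete with a short sketch: with N threads interleaved one per cycle, successive instructions from the same thread are always N cycles apart, so a pipeline of depth at most N never holds two instructions from one thread at once and needs no interlock or forwarding logic. The schedule below is illustrative only.

```python
# Fine-grained (interleaved) multithreading: a new thread is selected
# every clock cycle by a fixed round-robin schedule.

def round_robin_schedule(n_threads, n_cycles):
    """Return which thread issues an instruction in each cycle."""
    return [cycle % n_threads for cycle in range(n_cycles)]

sched = round_robin_schedule(4, 12)

# Distance between successive issues of thread 0:
positions = [c for c, t in enumerate(sched) if t == 0]
gaps = [b - a for a, b in zip(positions, positions[1:])]
# Every gap equals the thread count, so a 4-stage pipeline holds at
# most one in-flight instruction per thread: no dependence checks.
```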
Dynamic thread selection policies
Avoid switching to a thread if it is currently stalled.
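A minimal sketch of such a policy, assuming a per-thread ready bit (e.g. cleared while a thread waits on a cache miss); the function name and interface are hypothetical.

```python
# Dynamic thread selection: pick the next ready thread in round-robin
# order, skipping any thread that is currently stalled.

def select_next(last, ready):
    """last: index of the thread issued last cycle.
    ready: per-thread booleans (False = stalled, e.g. on a cache miss).
    Returns the next ready thread, or None if every thread is stalled."""
    n = len(ready)
    for offset in range(1, n + 1):
        candidate = (last + offset) % n
        if ready[candidate]:
            return candidate
    return None  # all threads stalled: the pipeline bubbles this cycle

# Thread 1 is stalled, so after thread 0 the selector skips to thread 2.
choice = select_next(0, [True, False, True, True])
```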
Simultaneous Multithreading (SMT)
Permit instructions from many different threads to be in-flight simultaneously.
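A toy view of an SMT issue stage: each cycle, the issue slots are filled from the ready instructions of all threads, so a stall in one thread leaves slots the others can use. The width, queue structure, and round-robin pick are illustrative assumptions, not a real design.

```python
# Toy SMT issue stage: up to ISSUE_WIDTH instructions per cycle are
# chosen from the ready queues of *all* threads.

ISSUE_WIDTH = 4

def issue(queues):
    """queues: per-thread lists of ready instructions. Picks instructions
    round-robin across threads until the slots are full or queues empty."""
    issued = []
    while len(issued) < ISSUE_WIDTH and any(queues):
        for q in queues:
            if q and len(issued) < ISSUE_WIDTH:
                issued.append(q.pop(0))
    return issued

# Thread 0 has stalled (empty queue); threads 1 and 2 fill the width
# between them, so no issue slots are wasted this cycle.
slots = issue([[], ["t1.a", "t1.b"], ["t2.a", "t2.b", "t2.c"]])
```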
Multithreading - impact on memory system
Protection issues:
Add thread ID to TLB entry and cache tag
Threads may share common address space.
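The tagging idea above can be sketched with a TLB modelled as a lookup table: entries carry a thread (or address-space) ID, so identical virtual page numbers from different threads never alias, while threads sharing an address space can share one ID. The structure and names below are illustrative only.

```python
# Protection via tagging: each TLB entry is keyed by (ID, virtual page),
# so no flush is needed when switching between threads.

tlb = {}  # (asid, virtual_page) -> physical_frame

def tlb_fill(asid, vpage, pframe):
    tlb[(asid, vpage)] = pframe

def tlb_lookup(asid, vpage):
    return tlb.get((asid, vpage))  # None models a TLB miss

tlb_fill(asid=1, vpage=0x40, pframe=0x900)
tlb_fill(asid=2, vpage=0x40, pframe=0x1a0)  # same page number, other thread
# Lookups are disambiguated by the ID; threads sharing an address
# space would simply present the same ID and hit the same entries.
```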
Cache Pressures:
Many active threads require a larger combined working set.
Positive and negative cache interference
Load/store queues:
Hardware detects and resolves memory dependencies
Need to be thread-aware in an SMT processor
Simpler to duplicate queues