Lecture 4: Shared Memory Multiprocessors, Lecture 6: MESI and MOESI Cache Coherence, Lecture 7: Directory-Based Cache Coherence Flashcards
How does the structure of multiprocessors differ for shared and distributed memory?
Shared memory: all cpus have access to the same memory
Distributed memory: each cpu has its own local memory
Which is more prevalent in multiprocessors, shared or distributed memory, and why? Are there any drawbacks?
Shared memory, because it is easier to program. The drawback is that the hardware is more complex
What problem does hardware complexity of shared memory multiprocessors lead to?
Bus-based cache coherence systems, which do not scale well
What are caches?
Small, fast local memory holding recently used data and instructions. There can be several levels: L1 and L2 (local to a core) and L3 (shared)
They are needed because main memory cannot keep up with processor speed
What are two cache functions?
- Fetch data from RAM on cache misses
- Write modified data back to RAM
What is the cache coherency problem?
Inconsistency of shared data across multiple caches in multi-core systems. Can occur when one core updates a value in its cache, leaving outdated copies in other caches.
What is the solution to the cache coherency problem?
Cache-to-cache communication with bus snooping
This improves performance by avoiding the slow main memory
How does bus snooping work in maintaining cache coherency in multiprocessor systems?
Snooping hardware is attached to each core’s cache, and all caches sit on one bus. It observes every transaction on the bus and can modify the cache independently of the core, taking action whenever it sees a pertinent transaction
True or false
Cache has 2 control bits for each line it contains, indicating its state
True
How many cache states are there?
Three:
1. Modified
2. Invalid
3. Shared
- Modified state
- Invalid
- Shared
A. Implicit. A valid cache entry exists and the line has the same values as main memory. Several caches can have the same line in that state
B. There may be an address match on this line but the data is not valid. We must go to memory and fetch it or get it from another cache
C. The cache line is valid and has been written to but the latest values have not been updated in memory yet. A line can be in the modified state in at most 1 core
- C
- B
- A
Which of the following states are legal?
a. modified invalid
b. modified modified
c. shared shared
d. invalid shared
e. modified shared
f. invalid invalid
a, c, d, f
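The rule behind this answer can be checked mechanically. The following is a minimal sketch, assuming the only constraint is the one from the state definitions (a modified line is the sole valid copy); the name is_legal is hypothetical and not from the lecture.

```python
from itertools import combinations_with_replacement

STATES = ("modified", "shared", "invalid")

def is_legal(states):
    """Return True if these per-core states of one line can coexist under MSI."""
    modified = states.count("modified")
    if modified == 0:
        return True  # any mix of shared and invalid copies is fine
    # A modified line must be the only valid copy in the system.
    return modified == 1 and states.count("shared") == 0

# Enumerate the six unordered two-core combinations and keep the legal ones.
legal_pairs = [p for p in combinations_with_replacement(STATES, 2) if is_legal(p)]
print(legal_pairs)
```

The four pairs that survive correspond to options a, c, d and f.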
What are the aspects of state transitions?
- messages
- access made to main memory
- state changes
What are the messages sent between caches?
- Read messages: one core requests a cache line from another
- Invalidate messages: one core asks another to invalidate one of its cache lines
Describe the state transitions from the following state
modified invalid
Read on core 1: cache hit, served from cache
Write on core 1: cache hit, served from cache
Read on core 2: Overall change to state: shared/shared
Write on core 2: Overall state changes to (a’): invalid/modified
Describe the state transitions from the following state
invalid invalid
Read on core 1: Go to (c’) shared/invalid
Read on core 2: symmetry - goes to state (c): invalid/shared
Write on core 1: State goes to (a) modified/invalid
Write on core 2: symmetry - goes to state (a’): invalid/modified
Describe the state transitions from the following state
invalid shared
Read on core 1: Overall state goes to (d): shared/shared
Read on core 2: Cache hit, stays in (c): invalid/shared
Write on core 1: Overall state goes to (a): modified/invalid
Write on core 2: Overall state goes to (a’): invalid/modified
Describe the state transitions from the following state
shared shared
Read on core 1 or 2: cache hit, served from the cache
Write on core 1: Overall state goes to: modified/invalid
Write on core 2: symmetry, state goes to (a’): invalid/modified
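The four transition cards above can be condensed into one function. This is a rough sketch, assuming a write-invalidate MSI protocol and tracking only the state of a single line (no data, bus timing, or memory traffic); the function name access and the one-letter state codes are hypothetical. Cores are indexed from 0, so core 1 on the cards is index 0 here.

```python
def access(states, core, op):
    """Return the new per-core MSI states of one line after `core` does `op`.

    states: tuple of "M", "S" or "I", one entry per core.
    op: "read" or "write".
    """
    new = list(states)
    if op == "read":
        if new[core] == "I":
            # Read miss: a Modified holder supplies the line and drops to Shared.
            new = ["S" if s == "M" else s for s in new]
            new[core] = "S"
        # Read hit (M or S): served from the cache, no state change.
    elif op == "write":
        # The writer ends up with the only valid copy; all others are invalidated.
        new = ["I"] * len(new)
        new[core] = "M"
    else:
        raise ValueError(f"unknown operation: {op}")
    return tuple(new)

# Example: starting from modified/invalid, core 2 (index 1) reads the line.
print(access(("M", "I"), 1, "read"))
```

Starting from modified/invalid, a read on the second core gives shared/shared and a write gives invalid/modified, matching the first card.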
Describe state transitions beyond 2 cores
Snoopy bus messages are broadcast to all cores
1. Any core with a valid value can respond to a read request
2. Upon receiving an invalidate request:
Any core in S invalidates without writeback
A core in M writes back then invalidates
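The invalidate rule for any number of cores might be sketched as follows. This is an illustration only, assuming the writer ends in the modified state and counting writebacks instead of modelling memory; snoop_invalidate and broadcast_invalidate are hypothetical names.

```python
def snoop_invalidate(state):
    """What one snooping cache does on seeing an invalidate for a line it may hold.

    Returns (new_state, wrote_back).
    """
    if state == "M":
        return "I", True   # dirty copy: write back first, then invalidate
    return "I", False      # S or I: simply invalidate, nothing to write back

def broadcast_invalidate(states, writer):
    """Apply one broadcast invalidate from `writer` to every other core."""
    new_states, writebacks = [], 0
    for core, state in enumerate(states):
        if core == writer:
            new_states.append("M")  # assumed: the writer now owns the line
            continue
        new_state, wrote_back = snoop_invalidate(state)
        new_states.append(new_state)
        writebacks += wrote_back
    return tuple(new_states), writebacks

# Example: four cores, three of them sharing the line, core 2 writes.
print(broadcast_invalidate(("S", "S", "I", "S"), 2))
```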
What are the snooping protocols?
Write-Invalidate:
When a core updates a cache line, other copies of that line in other caches are invalidated
Future accesses on the other copies will require fetching the updated line from memory/other caches
Most widespread protocol, used in MSI (this lecture), MESI, MOESI (next video)
Write-update:
What is a major implication of cache coherence?
All cores must always see exactly the same state of a location in memory
If one core writes and broadcasts an invalidate, no other core may perform a read/write to that location
All cores must see the invalidate at the same time, i.e. within the same bus cycle
The coherence protocol is a major limitation to the number of cores that can be supported
True or false
Multiple cores can use the bus at a time
False
True or false
Invalidate always needs to be broadcast with a write
False
Sometimes other cores are already invalid
Describe MESI
Split the S state into:
E: exclusive
Switch to E after a read causing a fetch from memory
S: (truly) shared
Switch to S after a read that gets value from another cache
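The E/S choice on a read miss can be sketched like this. It is a simplified illustration, assuming standard MESI behaviour in which a write to a line held in E or M needs no bus broadcast because no other cache can hold a copy; the function names are hypothetical.

```python
def mesi_read_miss(states, core):
    """New MESI states of one line after `core` (currently "I") reads it."""
    others_valid = any(s != "I" for i, s in enumerate(states) if i != core)
    new = list(states)
    if others_valid:
        # Value comes from another cache: every valid copy becomes Shared.
        new = ["S" if s in ("M", "E") else s for s in new]
        new[core] = "S"
    else:
        # Value comes from memory and nobody else holds it: Exclusive.
        new[core] = "E"
    return tuple(new)

def write_needs_invalidate(state):
    """True if a write from this state must broadcast an invalidate (assumed rule)."""
    return state in ("S", "I")

# Example: the first reader gets E, a second reader turns both copies into S.
after_first = mesi_read_miss(("I", "I"), 0)
after_second = mesi_read_miss(after_first, 1)
print(after_first, after_second)
```

The benefit of E shows up on the next write: a core holding the line in E can move to M silently, which is one case where no invalidate needs to be broadcast.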
Describe MOESI
Split the M state into two:
Modified: not in sync with memory, and the only valid copy
Owned: not in sync with memory, other valid copies in S
The owner has exclusive rights to make changes
It broadcasts the changes to the shared copies, so no memory writeback is needed
Writeback only when data in O or M is evicted
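Two of these MOESI rules can be written down as small lookup functions. This is a sketch under the assumption of the usual MOESI convention, in which a modified line read by another core becomes owned instead of being written back; on_remote_read and writeback_on_evict are hypothetical names.

```python
def on_remote_read(state):
    """State of a holder's copy after another core reads the same line."""
    return {
        "M": "O",  # dirty sole copy becomes the owner; memory stays stale
        "O": "O",  # owner keeps supplying the line
        "E": "S",  # clean sole copy becomes shared
        "S": "S",
        "I": "I",
    }[state]

def writeback_on_evict(state):
    """Only dirty copies (M or O) are written to memory when evicted."""
    return state in ("M", "O")

# Example: a modified line is read by another core, then later evicted.
state = on_remote_read("M")
print(state, writeback_on_evict(state))
```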
True or false
Even with optimisations such as MESI and MOESI, bus-based systems can’t scale to large multiprocessor counts
True
Describe directory structure
Each directory entry has:
1. Present bitmap: which core has a copy
2. Dirty bit: only one owner
Each cache line also has valid and dirty bits
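A directory entry of this shape might look like the sketch below. It is a minimal illustration, assuming one entry per memory line and ignoring the actual messages and data transfers; the class DirectoryEntry and its methods are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    n_cores: int
    present: int = 0     # bitmap: bit i set means core i holds a copy
    dirty: bool = False  # set when exactly one core owns a modified copy

    def sharers(self):
        """List the cores whose present bit is set."""
        return [c for c in range(self.n_cores) if (self.present >> c) & 1]

    def read(self, core):
        """Record a read; a dirty owner would first supply the line (not modelled)."""
        self.dirty = False
        self.present |= 1 << core

    def write(self, core):
        """Record a write and return the cores that must be sent an invalidate."""
        to_invalidate = [c for c in self.sharers() if c != core]
        self.present = 1 << core  # the writer is now the only holder
        self.dirty = True
        return to_invalidate

# Example: cores 0 and 2 read the line, then core 2 writes it.
entry = DirectoryEntry(n_cores=4)
entry.read(0)
entry.read(2)
invalidated = entry.write(2)
print(invalidated, bin(entry.present), entry.dirty)
```

Unlike a snoopy bus, only the cores listed in the bitmap receive an invalidate, so no broadcast is needed.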
What is the bottleneck of directory based coherence? How can it be solved?
Central directory.
Distribute the directory and cache it, e.g. NUMA (Non-Uniform Memory Access): more than one directory, each with its own address space
NUMA drawbacks
- Slower communications
- Long/variable delays require handshakes
True or false
Directory-based coherency improves on MESI and MOESI and is an optimal solution.
False
Scaling to large numbers of cores is still a problem