Memory consistency Flashcards
What is memory consistency?
A problem that arises between programs (or threads) running concurrently on multiple cores.
A contract between the programmer and the hardware.
A set of rules that dictates the order in which observable memory operations are performed. In other words, it defines how the order of memory operations in the program you wrote translates into the order in which they are actually performed in memory.
The programmer must be aware of this set of rules when creating programs.
Hardware designers must also follow these rules when designing out-of-order cores.
Consistency is NOT optional, and is not an optimisation feature.
Consistency models differ between ISAs. The model is an innate part of the ISA and dictates how a program's memory operations can be reordered during execution.
What is the difference between cache coherence and memory consistency?
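Coherence deals with a single memory location: it makes sure every core sees the correct (most recent) value for that address. Memory consistency is a different multi-core problem: it deals with the order in which memory operations to different addresses become visible to other cores.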
Who must follow the memory consistency rules?
Hardware designers, when designing out-of-order (OoO) cores
Programmers, when creating programs
Why is memory consistency needed?
Ensures that programs executing in parallel can be synchronised. When programs execute across cores, we do not know exactly when each operation happens; the consistency model defines which orderings can be observed, which helps avoid problems such as race conditions.
Ensures the ability to execute atomic operations.
What would happen to programs if we did not have memory consistency?
Programs would suffer from more race conditions, and their results would be ambiguous.
What is observability in memory systems?
When executing a program on a core in a multicore system, some memory operations will be visible to other threads.
What memory operations are visible?
Shared cache line:
- reads from a shared cache line are visible
All writes are visible, since they change memory values, even if the cache line is not shared.
What is the difference between the goals of a SW designer and a HW designer?
A SW designer wants to write programs that are correct and optimised. A typical optimisation is introducing parallelism, which comes with new problem areas.
A HW designer wants to execute memory operations when these are ready, but not necessarily in order (long latency, etc.). This means that memory operations might not execute in the program code order.
What is memory order?
The order in which observable memory operations are actually performed by the core.
What is program order?
Order of instructions as they appear in the machine code.
Defines the programmer's intention for program execution. This means that if a programmer puts a load before a write, they expect the load to happen before the write.
What limits the reordering of memory operations?
Consistency rules and ready instructions.
An instruction will execute when it is ready, if it is legal according to the consistency rules.
What is the Sequential Consistency (SC) model?
The baseline
Memory operations are performed in program order.
When there are multiple processes, they are ordered with respect to each other, in some arbitrary order.
Within a program, all operations are executed in order, and all programs are ordered with respect to each other, in some arbitrary order.
No two processes perform memory operations at the same time: operations appear to be performed one at a time.
When considering a single program/thread, there is only one valid ordering (translation from program order to execution order), and that is the program order.
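To make the SC definition concrete, the following Python sketch enumerates every interleaving of two threads that keeps each thread in program order, and runs each one against a single shared memory. It uses the classic store-buffering (Dekker) litmus test; the test, the tuple encoding of operations and the helper names are illustrative assumptions, not part of the original notes.

```python
from itertools import combinations

# Store-buffering ("Dekker") litmus test, initially x = y = 0:
#   Thread 0: x = 1; r0 = y
#   Thread 1: y = 1; r1 = x
T0 = [("W", "x", 1), ("R", "y", "r0")]
T1 = [("W", "y", 1), ("R", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list in program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        merged, ia, ib = [], 0, 0
        for i in range(n):
            if i in positions:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged

def run(order):
    """Perform the operations one at a time against a single shared memory."""
    mem, regs = {"x": 0, "y": 0}, {}
    for kind, addr, arg in order:
        if kind == "W":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r0"], regs["r1"]

sc_outcomes = {run(order) for order in interleavings(T0, T1)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Under SC the outcome r0 = r1 = 0 never appears: the first operation in any interleaving is one of the writes, and the other thread's read always comes after it.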
What is sequential consistency at a local level?
Every operation executes as ordered in the program code
What is sequential consistency at a global level?
There is some (arbitrary) ordering of the processes with respect to each other
How can you quickly define a memory consistency model?
The translation from program order to execution order.
What are ordering rules?
When talking about consistency models, there are four ordering rules, one for each pair of operation types:
Read followed by read
Read followed by write
Write followed by read
Write followed by write
Notation:
R0, R1: 0 - first operation, 1 - second operation
<p: Before in program order
<m: Before in memory order
What are ordering rules for sequential consistency?
R0 <p R1 -> R0 <m R1
R0 <p W0 -> R0 <m W0
W0 <p R0 -> W0 <m R0
W0 <p W1 -> W0 <m W1
No reordering of execution: every pair of observable operations must be performed in program order.
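The four SC rules can be read as a checker: for every pair of operations in one thread, if the pair's type is a rule the model enforces, the memory order must keep the pair in program order. A minimal Python sketch, with hypothetical names and operations identified by strings such as "W0" and "R0":

```python
from itertools import product

# The four ordering rules as (earlier kind, later kind) pairs:
# (R, R), (R, W), (W, R), (W, W). SC enforces all four.
SC_PRESERVED = set(product("RW", repeat=2))

def is_legal(program, memory_order, preserved=SC_PRESERVED):
    """Check one thread's memory order against the ordering rules.

    program: list of (op_id, kind) in program order, kind is "R" or "W".
    memory_order: the same op_ids in the order they were performed.
    preserved: pairs (X, Y) for which X <p Y must imply X <m Y.
    """
    position = {op: i for i, op in enumerate(memory_order)}
    for i, (op_a, kind_a) in enumerate(program):
        for op_b, kind_b in program[i + 1:]:
            # op_a <p op_b: if this pair's rule is enforced, require op_a <m op_b
            if (kind_a, kind_b) in preserved and position[op_a] > position[op_b]:
                return False
    return True

program = [("W0", "W"), ("R0", "R")]      # W0 <p R0
print(is_legal(program, ["W0", "R0"]))    # True
print(is_legal(program, ["R0", "W0"]))    # False: breaks W0 <p R0 -> W0 <m R0
```

Passing a smaller `preserved` set, for example one without ("W", "R"), turns the same checker into one for a relaxed model such as TSO.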
Why is talking about SC important?
It is trivial to implement
- all mem-ops are in program order
- all updates become fully visible before a value is read
Very intuitive: it does what the programmer “expects” will happen
Very low performance: it strictly limits when mem-ops can execute, so they cannot simply execute as soon as they are ready.
When talking about consistency models, what are valid outcomes?
The possible outcomes of a program, based on the consistency model.
What happens if we relax one or more of the rules, as they are defined in the SC model?
Makes the mapping from program order to memory order less intuitive. Can cause unexpected outputs or race conditions.
Potentially improves performance. Introduces the ability to potentially issue operations earlier on (when ready).
What is a Total Store Order (TSO) consistency model?
Relaxes the Write-Read rule. Maintains the store order (Write-Write is still enforced).
W0 <p R0 -> W0 <m R0 # This no longer needs to apply
Allows later reads to be issued before earlier writes, assuming there is no dependence (the read and the write access different addresses).
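A common way to picture TSO is a FIFO store buffer per core: writes wait in the buffer while later reads go to memory (or take the value from the core's own buffer). The Python sketch below is a toy model under that assumption. It explores every possible timing of buffer drains for the store-buffering litmus test (thread 0: x = 1; r0 = y, thread 1: y = 1; r1 = x); the function name and the operation encoding are hypothetical.

```python
def tso_outcomes(threads, init):
    """Enumerate the final register values a toy TSO machine can produce.

    Each thread has a FIFO store buffer. A write goes into the buffer; a read
    takes the youngest matching buffered value (store forwarding) or else
    reads memory. Buffered writes drain to memory, oldest first, at any time.
    """
    results = set()

    def explore(pcs, bufs, mem, regs):
        if all(pc == len(ops) for pc, ops in zip(pcs, threads)) and not any(bufs):
            results.add(tuple(regs[r] for r in sorted(regs)))
            return
        for t, ops in enumerate(threads):
            if pcs[t] < len(ops):  # thread t performs its next operation
                kind, addr, arg = ops[pcs[t]]
                pcs2 = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "W":
                    bufs2 = bufs[:t] + (bufs[t] + ((addr, arg),),) + bufs[t + 1:]
                    explore(pcs2, bufs2, mem, regs)
                else:
                    hits = [v for a, v in bufs[t] if a == addr]
                    value = hits[-1] if hits else mem[addr]
                    explore(pcs2, bufs, mem, {**regs, arg: value})
            if bufs[t]:  # the oldest buffered write of thread t reaches memory
                (addr, val), rest = bufs[t][0], bufs[t][1:]
                bufs2 = bufs[:t] + (rest,) + bufs[t + 1:]
                explore(pcs, bufs2, {**mem, addr: val}, regs)

    explore(tuple(0 for _ in threads), tuple(() for _ in threads), dict(init), {})
    return results

# Store-buffering litmus test, initially x = y = 0.
sb = [
    [("W", "x", 1), ("R", "y", "r0")],  # thread 0
    [("W", "y", 1), ("R", "x", "r1")],  # thread 1
]
tso = tso_outcomes(sb, {"x": 0, "y": 0})
print(sorted(tso))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) is possible here but not under SC: both writes can still be sitting in their store buffers when the reads are performed. Draining each buffer in FIFO order is what maintains the store order.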
What is a Partial Store Order (PSO) consistency model?
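Relaxes both the Write-Read and the Write-Write rules. Writes are no longer guaranteed to be performed in order.
W0 <p R0 -> W0 <m R0 # This no longer needs to apply
W0 <p W1 -> W0 <m W1 # This no longer needs to apply
Race conditions can occur here, e.g. another core may observe a later write before an earlier one.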
What is a Weak Ordering (WO) / Release Consistency (RC) consistency model?
Relaxes all of the rules. Gives no guarantees about the execution order of memory operations.
Any ordering is legal, unless explicitly synchronised
There are multiple variations of WO, this is just the generic name
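The differences between SC, TSO, PSO and WO can be summarised as which of the four ordering rules each one still enforces. The sketch below encodes that table, enumerates every per-thread memory order a model allows, and checks the message-passing pattern (write data, then set a flag). It is a simplified model that assumes a single atomic shared memory and keeps accesses to the same address in order; "WO" here means the generic fully relaxed case, not any particular ISA, and all names are hypothetical.

```python
from itertools import permutations

# Which ordering rules each model still enforces, as (earlier, later) pairs.
PRESERVED = {
    "SC":  {("R", "R"), ("R", "W"), ("W", "R"), ("W", "W")},
    "TSO": {("R", "R"), ("R", "W"), ("W", "W")},  # W -> R relaxed
    "PSO": {("R", "R"), ("R", "W")},              # W -> R and W -> W relaxed
    "WO":  set(),                                 # everything relaxed
}

def local_orders(ops, preserved):
    """Yield the per-thread memory orders the model allows.

    Accesses to the same address always stay in program order (dependence).
    """
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {op: i for i, op in enumerate(perm)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(i + 1, n)
               if (ops[i][0], ops[j][0]) in preserved or ops[i][1] == ops[j][1]):
            yield [ops[i] for i in perm]

def merges(a, b):
    """Yield every interleaving of a and b that keeps each list's order."""
    if not a or not b:
        yield a + b
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

def outcomes(t0, t1, model):
    results = set()
    for o0 in local_orders(t0, PRESERVED[model]):
        for o1 in local_orders(t1, PRESERVED[model]):
            for order in merges(o0, o1):
                mem, regs = {"data": 0, "flag": 0}, {}
                for kind, addr, arg in order:
                    if kind == "W":
                        mem[addr] = arg
                    else:
                        regs[arg] = mem[addr]
                results.add((regs["r1"], regs["r2"]))
    return results

# Message passing: the producer writes data, then sets a flag.
producer = [("W", "data", 1), ("W", "flag", 1)]
consumer = [("R", "flag", "r1"), ("R", "data", "r2")]
for model in PRESERVED:
    print(model, "can see flag=1 but data=0:", (1, 0) in outcomes(producer, consumer, model))
# SC False, TSO False, PSO True, WO True
```

Because PSO relaxes Write-Write, the flag can become visible before the data, which is exactly the kind of race the notes warn about.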
What consistency model is most wide-spread in personal computers?
TSO, which is the consistency model of the x86 ISA
Why would we want to relax the Write-Read ordering?
Remove the W0 <p R0 -> W0 <m R0 rule
Allows reads to be issued, while writes to different addresses have not yet completed.
An effective way to hide write latency, as writes are not on the critical path. Reads, however, are on the critical path, because they bring new data into the system, whereas writes handle data that has already been produced.
Note: If W0 <p W1 is still enforced, not all write latency can be hidden.
Why would we want to relax Write-Write ordering?
Can hide more write latency as writes can be issued while writes to different addresses have not yet completed.
What is the advantage of PSO?
Both write-read and write-write are relaxed
Stores can now drain from the store buffer in any order
Must still generate the addresses of all older memory operations to detect aliasing (accesses to the same address)
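The aliasing requirement can be expressed as a small check: a memory operation may only be performed ahead of older stores once all of their addresses have been generated and none of them matches. This Python sketch is a simplified assumption (real load/store queues may also forward data or speculate), and `can_bypass` is a hypothetical name.

```python
from typing import Optional, Sequence

def can_bypass(addr: int, older_store_addrs: Sequence[Optional[int]]) -> bool:
    """Can an access to `addr` be performed before these older stores?

    None marks an older store whose address has not been generated yet:
    it might alias with `addr`, so the access has to wait.
    """
    for older in older_store_addrs:
        if older is None or older == addr:
            return False  # unknown address (possible alias) or a real alias
    return True

print(can_bypass(0x40, [0x10, 0x20]))  # True: addresses known, no alias
print(can_bypass(0x40, [0x10, None]))  # False: one older address still unknown
print(can_bypass(0x40, [0x40, 0x20]))  # False: aliases an older store
```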
Why would we want to relax read-write and read-read?
Allows for even more MLP
These are most effective when other rules are also abolished
What is necessary when we also relax read-write and read-read?
Explicit synchronisation is always necessary for coordinated program execution
Why is a weaker consistency model better for hardware design?
It allows more memory-level parallelism (MLP), but makes things harder for the programmer
What are memory barriers?
Special operations that enforce an ordering within the CPU
Serialises all memory instructions before and after the barrier
All mem-ops before the barrier complete before the barrier instruction completes. No mem-ops after the barrier are initiated before the barrier has completed
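The effect of a full barrier can be shown by adding a fence operation that nothing may be reordered across. This sketch uses a toy TSO reordering model (only Write-Read relaxed, same-address accesses kept in order) and runs the store-buffering litmus test (thread 0: x = 1; r0 = y, thread 1: y = 1; r1 = x) with and without a fence between each thread's write and read. The "F" encoding and the helper names are hypothetical.

```python
from itertools import permutations

TSO_PRESERVED = {("R", "R"), ("R", "W"), ("W", "W")}  # only W -> R is relaxed

def local_orders(ops):
    """Per-thread memory orders allowed by TSO plus full fences ("F").

    A fence is ordered with respect to every other operation, so nothing
    may be performed across it in either direction.
    """
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {op: i for i, op in enumerate(perm)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(i + 1, n)
               if "F" in (ops[i][0], ops[j][0])
               or (ops[i][0], ops[j][0]) in TSO_PRESERVED
               or ops[i][1] == ops[j][1]):
            yield [ops[i] for i in perm]

def merges(a, b):
    """Yield every interleaving of a and b that keeps each list's order."""
    if not a or not b:
        yield a + b
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

def outcomes(t0, t1):
    results = set()
    for o0 in local_orders(t0):
        for o1 in local_orders(t1):
            for order in merges(o0, o1):
                mem, regs = {"x": 0, "y": 0}, {}
                for kind, addr, arg in order:
                    if kind == "W":
                        mem[addr] = arg
                    elif kind == "R":
                        regs[arg] = mem[addr]
                    # a fence has no effect on memory; it only constrains order
                results.add((regs["r0"], regs["r1"]))
    return results

FENCE = ("F", None, None)
plain = outcomes([("W", "x", 1), ("R", "y", "r0")],
                 [("W", "y", 1), ("R", "x", "r1")])
fenced = outcomes([("W", "x", 1), FENCE, ("R", "y", "r0")],
                  [("W", "y", 1), FENCE, ("R", "x", "r1")])
print((0, 0) in plain)   # True: each read may be performed before the older write
print((0, 0) in fenced)  # False: the fence restores W -> R order
```

With a fence in every thread the allowed orders collapse to program order, which illustrates how heavy use of barriers pulls execution back towards SC.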
What are some downsides with introducing barriers?
Force synchronisation - don’t have as liberal ordering as before, so less possible performance benefits (MLP).
They typically force FULL serialisation, which can effectively reduce execution to the SC model
What is Release Consistency (RC)?
Part of the group of Weak Orderings (WOs)
Has more fine-grained primitives. Can synchronise around specific operations: the acquire and release operations.
These operations are barriers that enforce specific ordering.
With release and acquire, we get a new, bigger set of rules.
In RC what does Acquire do?
Getting permission from other processors for subsequent memory accesses
Previous memory accesses can be overlapped with the acquire, but subsequent memory accesses must wait for it, i.e. for the permission given by a release from another processor.
In RC what does Release do?
Giving permission to other processors for previous memory accesses.
Subsequent memory accesses can be overlapped
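Acquire and release can be modelled as one-directional barriers: nothing after an acquire may be performed before it, and nothing before a release may be performed after it. The sketch below applies those two rules, on top of an otherwise fully relaxed model (same-address accesses excepted), to the message-passing pattern (write data, then set a flag). The "ACQ"/"REL" encoding is a hypothetical simplification of RC, and keeping acquire/release operations ordered among themselves is an assumption of this sketch.

```python
from itertools import permutations

def must_stay_ordered(a, b):
    """Must op a (earlier in program order) stay before op b in memory order?

    Ordinary reads/writes ("R"/"W") may be reordered freely, except for
    accesses to the same address. "ACQ" is an acquire (a read), "REL" is a
    release (a write).
    """
    special = {"ACQ", "REL"}
    return (
        a[0] == "ACQ"                              # later ops may not move above an acquire
        or b[0] == "REL"                           # earlier ops may not move below a release
        or (a[0] in special and b[0] in special)   # sync ops stay ordered among themselves
        or a[1] == b[1]                            # same address: dependence
    )

def local_orders(ops):
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {op: i for i, op in enumerate(perm)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(i + 1, n)
               if must_stay_ordered(ops[i], ops[j])):
            yield [ops[i] for i in perm]

def merges(a, b):
    """Yield every interleaving of a and b that keeps each list's order."""
    if not a or not b:
        yield a + b
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

def outcomes(producer, consumer):
    results = set()
    for o0 in local_orders(producer):
        for o1 in local_orders(consumer):
            for order in merges(o0, o1):
                mem, regs = {"data": 0, "flag": 0}, {}
                for kind, addr, arg in order:
                    if kind in ("W", "REL"):
                        mem[addr] = arg
                    else:  # "R" or "ACQ"
                        regs[arg] = mem[addr]
                results.add((regs["r1"], regs["r2"]))
    return results

plain = outcomes([("W", "data", 1), ("W", "flag", 1)],
                 [("R", "flag", "r1"), ("R", "data", "r2")])
rc = outcomes([("W", "data", 1), ("REL", "flag", 1)],
              [("ACQ", "flag", "r1"), ("R", "data", "r2")])
print((1, 0) in plain)  # True: without acquire/release the stale read is allowed
print((1, 0) in rc)     # False: the data write is visible before the released flag
```

Only the two synchronising operations carry ordering constraints, so the surrounding ordinary accesses keep most of their freedom to overlap.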
What is a trend in programming models, in regards to correctness?
The programming model guarantees correct execution for data-race-free programs
However, if a program contains race conditions, there are no guarantees for execution. Meaning the programmer must implement explicit synchronisation themselves to achieve correctness.
Why do we need synchronisation?
It is central in all kinds of parallelism
- synchronise access to resources
- order events from cooperating processes correctly
- used for shared memory programming
How is synchronisation implemented in smaller multiprocessor systems?
Using uninterruptible instruction(s) that atomically access a value
This requires special hardware support
Simplifies construction of OS / parallel applications
What are the pros and cons of sequential consistency?
Pros:
- ensures program order and write atomicity
- intuitive and easy to use
Cons:
- No optimisations and bad performance
What are the pros and cons of relaxed consistency?
Pros:
- Enables more optimisations and better performance
- Wide variety of models offers maximum flexibility
Cons:
- Does not ensure program order
- Added complexity for programmers and compilers
What are atomic operations?
Special instructions that are used to guarantee execution semantics. This means that they differ from normal instructions by having this added guarantee about what is happening in the system.
Guarantees execution without interference from other programs.
What are two examples of atomic operations?
Load-link / store-conditional
Atomic swap
What is Load-link / Store-conditional?
Used in a sequence: first LL then SC
If the memory location accessed by LL is written to, SC fails
If there is a context switch between LL and SC, SC fails
How is LL / SC implemented?
Using a special link register.
This register contains the address used in LL.
This is reset (to zero) if matching cache block is invalidated or if we get an interrupt.
SC checks if the link register contains the same address. If so, we have atomic execution of LL & SC
The store is conditional on the linked location not being modified in the meantime
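The link-register mechanism can be sketched as a toy model in Python: one link register per core, cleared by any store to the linked address or by an interrupt, and an SC that only stores while its link is intact. The class and method names are hypothetical, and a store clearing the link stands in for the matching cache block being invalidated. A retry loop around LL/SC then gives an atomic increment.

```python
class LinkedMemory:
    """Toy memory with one link register per core (a hypothetical model).

    LL stores the address in the core's link register. A store to that
    address, or an interrupt/context switch on the core, clears the link.
    SC performs its store only while the link is still intact.
    """

    def __init__(self, cores):
        self.mem = {}
        self.link = [None] * cores  # None plays the role of "reset to zero"

    def store(self, addr, value):
        self.mem[addr] = value
        for c in range(len(self.link)):
            if self.link[c] == addr:  # stands in for invalidating the cache block
                self.link[c] = None

    def interrupt(self, core):
        self.link[core] = None

    def ll(self, core, addr):
        self.link[core] = addr
        return self.mem.get(addr, 0)

    def sc(self, core, addr, value):
        if self.link[core] != addr:
            return False  # location written or interrupt taken: SC fails
        self.store(addr, value)
        return True

def atomic_increment(m, core, addr):
    """Retry the LL/SC pair until the conditional store succeeds."""
    while True:
        old = m.ll(core, addr)
        if m.sc(core, addr, old + 1):
            return old

# Two cores race on the same counter.
m = LinkedMemory(cores=2)
a = m.ll(0, "counter")              # core 0 links and reads 0
b = m.ll(1, "counter")              # core 1 links and reads 0
first = m.sc(1, "counter", b + 1)   # True: core 1's link is intact
second = m.sc(0, "counter", a + 1)  # False: core 1's store cleared core 0's link
atomic_increment(m, 0, "counter")   # core 0 retries and succeeds
print(first, second, m.mem["counter"])  # True False 2
```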
What is atomic exchange (swap)?
Swaps value in register for value in memory
If mem = 0, not locked
If mem = 1, locked
Set the register to 1 -> means the processor wants the lock.
Then perform an exchange operation:
Exchange(register, mem)
If register ends up being 0 we have a success. Mem was 0 and is now 1.
If register ends up being 1 we failed. Mem was 1, so it was already locked, and it is still locked because mem is still 1.
The exchange must always be atomic.
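The lock protocol above can be sketched in Python. Python has no atomic swap instruction, so the `exchange` method uses a `threading.Lock` internally purely to stand in for the hardware guarantee that the swap is indivisible; the class and function names are hypothetical. The release step (writing 0 back) is an assumption, since the notes only describe acquiring the lock.

```python
import threading

class LockWord:
    """A memory word holding a lock: 0 = not locked, 1 = locked.

    An internal threading.Lock stands in for the hardware guarantee that
    the exchange is one indivisible operation.
    """

    def __init__(self):
        self.mem = 0
        self._hw_atomicity = threading.Lock()

    def exchange(self, register):
        """Atomically swap `register` with the memory value; return the old value."""
        with self._hw_atomicity:
            old = self.mem
            self.mem = register
            return old

def acquire(lock):
    while True:
        register = 1                        # 1 means "this processor wants the lock"
        register = lock.exchange(register)
        if register == 0:                   # mem was 0 and is now 1: success
            return
        # register == 1: mem was already 1 (locked) and is still 1, so retry

def release(lock):
    lock.exchange(0)                        # write 0 back so the lock is free again

shared = {"counter": 0}
lock = LockWord()

def worker(iterations):
    for _ in range(iterations):
        acquire(lock)
        shared["counter"] += 1              # critical section
        release(lock)

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["counter"])  # 8000
```

If the exchange were split into a separate read and write, two threads could both read 0, both believe they hold the lock, and updates in the critical section could be lost.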