Lecture 10: Hardware Support for Synchronisation, Lecture 11: Load-Linked and Store-Conditional Flashcards
Shared-memory programming requires synchronisation mechanisms to protect shared data. Their implementation usually requires hardware support. Give an example.
A lock known as a binary semaphore (on a processor with a snoopy cache)
Can be held by at most 1 thread
Waiting threads use busy-waiting
What is a binary semaphore? Describe its operations.
It’s a single shared “boolean” variable S whose value is used to protect a shared resource
1. S == 1 ➡ resource is free
2. S == 0 ➡ resource is in use
Semaphore operations (should be atomic)
wait(S): wait until S != 0 then set S = 0 (i.e. take the lock)
signal(S): set S = 1 (i.e. release the lock)
Define atomic
Once a thread starts to execute wait(), it must finish it before any other thread can start it — the operation is indivisible.
Requires special instructions to be supported in hardware: atomic instructions, with a compromise between complexity and performance. May need cache coherence operations.
Give an example of atomic instruction-level behaviour
Test-And-Set Instruction:
Cannot be interrupted
No other core can modify what is pointed by address in memory while the tas runs
Instruction-level behaviour is atomic.
True or false
When a shared semaphore is operated on with tas, processors are likely to end up with a copy of the variable in their caches
True
What is the consequence of the following?
An atomic read-modify-write instruction locks memory access from other processors.
It is likely to be expensive
True or false
the processor must ‘lock’ the snoopy bus for every multiprocessor tas operation
True
If a thread has the lock, then another wanting it will sit in a loop continually executing a tas until the variable becomes free. All this time it will be wasting bus cycles and slowing down cache coherence traffic from other cores. How can this issue be addressed?
test-and-test-and-set:
Most of the time we busy-wait with a standard load (ldr)
Only once S is seen to be free is a (costly) tas attempted
List and describe two read-modify-write operations other than tas
- fetch-and-add: returns the value of a memory location and increments it
- compare-and-swap: compare the value of a memory location with a value (in a register) and swap in another value (in a register) if they are equal
True or false
‘read-modify-write’ (RMW) instructions do not need to lock the snoopy bus during their execution
False
True or false
RMW instructions are desirable with all CPU designs
False. They don’t fit well with simple RISC pipelines, where an RMW is really a CISC-style instruction requiring a read, a test and a write
Atomic read-modify-write instructions can be used but they are inefficient in some situations. How are these issues addressed?
by breaking an atomic RMW operation into two instructions working together: load-linked and store-conditional
Describe load-linked and store-conditional
They differ slightly from traditional loads and stores through additional effects on CPU state. Acting as a pair, they achieve atomicity without holding the bus until completion.
How do load-link and store-conditional work?
The core records some state for the load-linked access (e.g. the address) and sets a load-link flag; the flag is cleared if another core writes to that location. The store-conditional succeeds only if the flag is still set, and the flag is cleared afterwards in either case.
What other events can clear the load-linked flag?
context switches, interrupts