Lecture 10: Hardware Support for Synchronisation, Lecture 11: Load-Linked and Store-Conditional Flashcards
Shared-memory programming requires synchronisation mechanisms to protect shared data. Their implementation usually requires hardware support. Give an example.
A lock known as a binary semaphore (on a processor with a snoopy cache)
Can be held by at most 1 thread
Waiting threads use busy-waiting
What is a binary semaphore? Describe its operations.
It’s a single shared “boolean” variable S whose value is used to protect a shared resource
1. S == 1 ➡ resource is free
2. S == 0 ➡ resource is in use
Semaphore operations (should be atomic)
wait(S): wait until S != 0 then set S = 0 (i.e. take the lock)
signal(S): set S = 1 (i.e. release the lock)
Define atomic
Once a thread starts to execute wait(), it must finish it before any other thread can start it — the operation is indivisible.
Requires special instructions to be supported in hardware: atomic instructions, with a compromise between complexity and performance. May need cache coherence operations.
Give an example of atomic instruction-level behaviour
Test-And-Set Instruction:
Cannot be interrupted
No other core can modify what is pointed by address in memory while the tas runs
Instruction-level behaviour is atomic.
True or false
When a shared semaphore is operated on with tas, processors are likely to end up with a copy of the variable in their caches
True
What is the consequence of the following?
An atomic read-modify-write instruction locks memory access from other processors.
It is likely to be expensive
True or false
the processor must ‘lock’ the snoopy bus for every multiprocessor tas operation
True
If a thread has the lock, then another wanting it will sit in a loop continually executing a tas until the variable becomes free. All this time it will be wasting bus cycles and slowing down cache coherence traffic from other cores. How can this issue be addressed?
test-and-test-and-set:
Most of the time we busy-wait with a standard load (ldr)
Only once S is seen to be free is a (costly) tas attempted
List and describe two read-modify-write operations other than tas
- fetch-and-add: returns the value of a memory location and increments it
- compare-and-swap: compare the value of a memory location with a value (in a register) and swap in another value (in a register) if they are equal
True or false
‘read-modify-write’ (RMW) instructions do not need to lock the snoopy bus during their execution
False
True or false
RMW instructions are desirable with all CPU designs
False. They don’t fit well with simple RISC pipelines, where an RMW is really a CISC-style instruction requiring a read, a test and a write
Atomic read-modify-write instructions can be used but they are inefficient in some situations. How are these issues addressed?
by breaking an atomic RMW operation into two instructions working together: load-linked and store-conditional
Describe load-linked and store-conditional
They differ slightly from traditional loads and stores through additional effects on CPU state. Acting as a pair, they achieve atomicity without holding the bus until completion.
How do load-link and store-conditional work?
The core records some state for the load-linked access (e.g. the address) and sets a load-link flag; the flag is cleared if another core writes to that location. The store-conditional succeeds only if the flag is still set, and the flag is cleared afterwards in either case.
What other events can clear the load-linked flag?
context switches, interrupts