Multicore Processors (2) Flashcards

1
Q

What is memory consistency?

A

The ordering a core sees between the reads and writes performed by another core that uses the same shared memory

2
Q

Look at consistency example

A

.

3
Q

Look at reordering memory ops example

A

.

4
Q

Why might loads and stores to different addresses get reordered?

A

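To get higher performance from the core: it does not have to stall on one slow memory access when later accesses to other addresses could proceed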

5
Q

What can reordering memory ops lead to?

A

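Non-intuitive situations: other cores can observe a core's memory operations in an order that differs from its program order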
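
A minimal sketch of one such situation, assuming a toy Python model rather than real hardware. The `Core` class and its store buffer are hypothetical names introduced for illustration. Each core buffers its stores, so a later load to a different address can reach shared memory first. Both cores write their own flag and then read the other's, yet both read 0.

```python
# Toy model (an assumption, not real hardware): stores wait in a private
# buffer, so a later load to a different address can overtake them.
class Core:
    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = []        # pending (address, value) stores

    def store(self, addr, value):
        self.store_buffer.append((addr, value))   # not yet visible to others

    def load(self, addr):
        # Forward from own buffered stores first (same-address order is kept).
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        # Buffered stores finally reach shared memory, oldest first.
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()


memory = {"X": 0, "Y": 0}
core0, core1 = Core(memory), Core(memory)

core0.store("X", 1)        # core 0: X = 1
r0 = core0.load("Y")       # core 0: r0 = Y (overtakes the buffered store)
core1.store("Y", 1)        # core 1: Y = 1
r1 = core1.load("X")       # core 1: r1 = X (core 0's store is still buffered)
core0.drain()
core1.drain()

print(r0, r1, memory)      # both loads returned 0, yet memory ends with X = Y = 1
```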

6
Q

What does coherence ensure?

A

That data written into one core's cache is seen by the other cores

7
Q

What does the memory consistency model determine?

A

The order in which operations from one core can be seen by the other cores

8
Q

What is relaxed consistency? When do we want to use this?

A

Some loads and stores can bypass each other
Relative ordering between operations to the same address is maintained
We want processors to use relaxed consistency, because the reordering gives higher performance

9
Q

What type of consistency do we want for shared data?

A

Sequential (strong) consistency
All reads and writes by a single processor are seen by the other processors in the order they occur in the program
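
A minimal sketch of what this rules out, assuming a two-core test program chosen for illustration (core 0 runs `X = 1; r0 = Y`, core 1 runs `Y = 1; r1 = X`). The names `PROGRAM` and `sc_outcomes` are hypothetical. The code enumerates every global interleaving that keeps each core's operations in program order and collects the resulting `(r0, r1)` pairs. The pair `(0, 0)` never appears.

```python
from itertools import combinations

# Two cores, each running two operations in program order.
PROGRAM = {
    0: [("store", "X"), ("load", "Y")],   # core 0: X = 1; r0 = Y
    1: [("store", "Y"), ("load", "X")],   # core 1: Y = 1; r1 = X
}

def sc_outcomes():
    """Return every (r0, r1) reachable when operations interleave but each
    core's own operations stay in program order (sequential consistency)."""
    outcomes = set()
    n0, n1 = len(PROGRAM[0]), len(PROGRAM[1])
    # Choose which slots of the global order belong to core 0.
    for slots in combinations(range(n0 + n1), n0):
        memory = {"X": 0, "Y": 0}
        regs = {}
        next_op = {0: 0, 1: 0}
        for step in range(n0 + n1):
            core = 0 if step in slots else 1
            kind, addr = PROGRAM[core][next_op[core]]
            next_op[core] += 1
            if kind == "store":
                memory[addr] = 1
            else:
                regs[core] = memory[addr]
        outcomes.add((regs[0], regs[1]))
    return outcomes

print(sorted(sc_outcomes()))
```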

10
Q

Describe a memory barrier

A

Guarantees the ordering of memory operations within a core; used to force sequential consistency where it is needed
All prior memory operations complete before the barrier finishes execution, i.e. a store after a barrier cannot overtake a load before it

11
Q

Look at example of memory barrier

A

.

12
Q

What are atomic operations?

A

Uninterruptible sequences of operations that appear to occur all as one
Used to create software synchronisation primitives, e.g. locks, thread barriers, etc.
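
A minimal sketch of why the sequence must be uninterruptible, assuming a toy replay of a fixed schedule (the `run` function and the step names are hypothetical). Two cores each increment a shared counter. When the load and the store are separate steps, another core can slip in between them and an update is lost.

```python
def run(schedule):
    """Replay a list of (core, step) pairs against one shared counter.
    'load' copies the counter into the core's register, 'store' writes
    register + 1 back, and 'rmw' does both as one indivisible step."""
    memory = {"counter": 0}
    reg = {}
    for core, step in schedule:
        if step == "load":
            reg[core] = memory["counter"]
        elif step == "store":
            memory["counter"] = reg[core] + 1
        elif step == "rmw":
            memory["counter"] = memory["counter"] + 1
    return memory["counter"]

# Non-atomic increments: core 1 loads before core 0 has stored.
interleaved = [(0, "load"), (1, "load"), (0, "store"), (1, "store")]
# Atomic increments: nothing can slip between the read and the write.
atomic = [(0, "rmw"), (1, "rmw")]

print(run(interleaved), run(atomic))   # prints "1 2"
```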

13
Q

Describe Read-Modify-Write (RMW)

A

Most basic class of atomic operation
Provides the ability to read a memory location and simultaneously write a new value back

14
Q

Give 2 examples of RMW

A
  1. Atomic exchange
  2. Fetch and add
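
A minimal sketch of what the two operations return, assuming a single-threaded toy in which a plain Python function stands in for the hardware's atomicity. The function names and the `memory` dict are hypothetical. Both return the old value, which is what makes them useful for building locks and counters.

```python
def atomic_exchange(memory, addr, new_value):
    """Swap new_value into memory[addr] and return the value that was there.
    In this single-threaded toy nothing can run between the read and the
    write, which stands in for the hardware's atomicity guarantee."""
    old = memory[addr]
    memory[addr] = new_value
    return old

def fetch_and_add(memory, addr, amount=1):
    """Add amount to memory[addr] and return the value from before the add."""
    old = memory[addr]
    memory[addr] = old + amount
    return old

memory = {"lock": 0, "tickets": 7}

# Exchange as test-and-set: whoever swaps in 1 and gets 0 back owns the lock.
first = atomic_exchange(memory, "lock", 1)    # lock was free
second = atomic_exchange(memory, "lock", 1)   # lock already taken

# Fetch-and-add hands out unique, increasing ticket numbers.
t0 = fetch_and_add(memory, "tickets")
t1 = fetch_and_add(memory, "tickets")

print(first, second, t0, t1, memory)
```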
15
Q

Draw a diagram of atomic exchange (between X in cache and R1 in core)

A

.

16
Q

What is atomic exchange? Why is it difficult in RISC machines?

A

Read and write in one uninterruptible instruction
But it would require a complex instruction (a load and a store combined), and in RISC we prefer simple load or store instructions

17
Q

What can we use instead of atomic exchange?

A

Load reserved / store conditional - instead of one instruction, provide two halves

18
Q

Describe load reserved / store conditional

A

The two instructions are linked together: the store conditional only succeeds if X has not been written since the load reserved; any write to X causes the store to fail
The store conditional's source register is overwritten with 1 on success, 0 on failure
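
A minimal sketch of the linkage, assuming a toy `Memory` class that tracks one reservation per core (the class and method names are hypothetical, and real hardware usually reserves a whole cache line rather than one address). A store conditional returns 1 when nothing wrote X in between, and 0 when another core's write broke the reservation.

```python
class Memory:
    """Toy shared memory with per-core reservations (one address per core)."""
    def __init__(self):
        self.data = {}
        self.reserved = {}            # core id -> reserved address

    def load_reserved(self, core, addr):
        self.reserved[core] = addr    # link this load to a later store conditional
        return self.data.get(addr, 0)

    def store(self, addr, value):
        # Any write to addr cancels every reservation on that address.
        self.data[addr] = value
        self.reserved = {c: a for c, a in self.reserved.items() if a != addr}

    def store_conditional(self, core, addr, value):
        if self.reserved.get(core) != addr:
            return 0                  # reservation lost: fail, memory unchanged
        self.store(addr, value)       # succeeds, and breaks other reservations
        return 1


mem = Memory()
mem.store("X", 5)

mem.load_reserved(0, "X")
ok_alone = mem.store_conditional(0, "X", 6)     # no intervening write

mem.load_reserved(0, "X")
mem.store("X", 9)                               # another core writes X in between
ok_raced = mem.store_conditional(0, "X", 7)

print(ok_alone, ok_raced, mem.data["X"])
```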

19
Q

Write the RISC instruction for atomic exchange between X in cache and R1 in core

A

atomic_exch X, r1

20
Q

Write the RISC instructions for load reserved / store conditional between X in cache and R1 in core

A

load_reserved r0, X
store_conditional r1, X

21
Q

Write out the instruction sequence for atomic exchange

A

xchg: mov r3, a0
lr r4, 0(r1)
sc r3, 0(r1)
beqz r3, xchg (fail and branch back to top of loop)
mov a0, r4

22
Q

Write out the instruction sequence for fetch-and-add

A

fadd: lr r4, 0(r1)
add r3, r4, 1
sc r3, 0(r1)
beqz r3, fadd (store conditional failed: branch back and retry; on success r4 still holds the fetched value)
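
A minimal sketch of the retry behaviour of this loop, assuming a toy `Memory` model with `lr` and `sc` methods. The `between` hook is a hypothetical device for injecting another core's write between the load reserved and the store conditional. The first attempt fails because of the injected write, and the second attempt succeeds against the updated value.

```python
class Memory:
    """Toy memory: a store conditional fails if the address was written
    since this core's load reserved."""
    def __init__(self, **initial):
        self.data = dict(initial)
        self.reserved = {}                 # core id -> reserved address

    def lr(self, core, addr):
        self.reserved[core] = addr
        return self.data[addr]

    def write(self, addr, value):
        self.data[addr] = value
        self.reserved = {c: a for c, a in self.reserved.items() if a != addr}

    def sc(self, core, addr, value):
        if self.reserved.get(core) != addr:
            return 0                       # failed
        self.write(addr, value)
        return 1                           # succeeded


def fetch_and_add(mem, core, addr, between=None):
    """Retry lr/sc until the store conditional succeeds; return the old value
    and the number of attempts. `between` is a hypothetical hook that lets a
    test inject another core's write between the lr and the sc."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.lr(core, addr)           # load reserved
        new = old + 1                      # add 1 in a register
        if between is not None:
            between(attempts)
        if mem.sc(core, addr, new):        # store conditional
            return old, attempts           # success: leave the loop
        # failure: go round again, like the branch back to the top


mem = Memory(X=10)

def interfere(attempt):
    if attempt == 1:                       # another core changes X once
        mem.write("X", 20)

old, attempts = fetch_and_add(mem, core=0, addr="X", between=interfere)
print(old, attempts, mem.data["X"])
```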

23
Q

Write out the instruction sequence for a more optimised spin lock

A

lock: lr r4, 0(r1)
bnez r4, lock
mov r3, #1
sc r3, 0(r1)
beqz r3, lock

24
Q

Write out the instruction sequence for a simple spin lock

A

lock: mov r3, #1 (take lock)
lr r4, 0(r1)
sc r3, 0(r1)
beqz r3, lock (check if store conditional was successful)
bnez r4, lock (if the lock value loaded into r4 was 1, start again because the lock had already been taken)

25
Give the advantage and disadvantage of the code for a more optimised spin lock
+ More efficient: it checks whether the lock has been taken before attempting the atomic exchange, which prevents unnecessary writes
- There are instructions between the lr and the sc, so the store conditional is more likely to fail
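A minimal sketch of the write-traffic difference, assuming a toy `LockWord` with `lr`/`sc` that counts writes. The `free_after` parameter is a hypothetical stand-in for the other core releasing the lock on a given poll. While the lock is held, the simple version writes on every poll and the optimised version only reads.

```python
class LockWord:
    """Toy lock word with lr/sc; counts writes that reach the memory system."""
    def __init__(self):
        self.value = 1          # 1 = held by some other core, 0 = free
        self.reserved = False
        self.writes = 0

    def lr(self):
        self.reserved = True
        return self.value

    def sc(self, value):
        if not self.reserved:
            return 0
        self.reserved = False
        self.value = value
        self.writes += 1        # each successful sc is a write other caches must see
        return 1

    def release(self):          # the holder's plain store of 0
        self.value = 0
        self.reserved = False


def simple_lock(word, free_after):
    polls = 0
    while True:
        polls += 1
        if polls == free_after:
            word.release()
        old = word.lr()
        if not word.sc(1):      # always attempts the write
            continue
        if old != 0:            # lock was already taken: start again
            continue
        return polls


def optimised_lock(word, free_after):
    polls = 0
    while True:
        polls += 1
        if polls == free_after:
            word.release()
        if word.lr() != 0:      # test first: no write while the lock is held
            continue
        if word.sc(1):
            return polls


simple, optimised = LockWord(), LockWord()
simple_lock(simple, free_after=5)
optimised_lock(optimised, free_after=5)
print(simple.writes, optimised.writes)   # prints "5 1"
```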
26
Look over naive spin lock example
.
27
What is the disadvantage of the naive spin lock?
Causes lots of bus traffic - lots of cache coherence work due to attempted writes
28
Look over spin lock with local caching example
.
29
Why might we add memory barriers to spin locks?
So that nothing can be reordered across the lock operations, i.e. otherwise writes to locked values could leak into memory before all cores realise they are locked
30
Write out the instruction sequence for locking a spin lock with barrier
lock: lr r4, 0(r1)
bnez r4, lock
mov r3, #1
sc r3, 0(r1)
beqz r3, lock
membar (no backwards propagation: operations after the lock cannot move ahead of taking it)
31
Write out the instruction sequence for unlocking a spin lock with barrier
unlock: membar (make sure all operations protected by the lock are finished before we unlock it)
st zero, 0(r1)
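A minimal sketch of why the unlock needs the barrier, assuming a toy `RelaxedCore` whose stores to different addresses may become visible in any order (the class, `make_visible`, and the `data` location are hypothetical). Without the barrier another core can find the lock free while the protected write is still pending.

```python
class RelaxedCore:
    """Toy core whose stores to different addresses may become visible
    in any order unless a barrier forces them out first."""
    def __init__(self, memory):
        self.memory = memory
        self.pending = []                     # stores not yet visible

    def store(self, addr, value):
        self.pending.append((addr, value))

    def membar(self):
        for addr, value in self.pending:      # all prior stores complete
            self.memory[addr] = value
        self.pending.clear()

    def make_visible(self, addr):
        # The memory system picks one pending store to addr and performs it.
        for i, (a, v) in enumerate(self.pending):
            if a == addr:
                self.memory[a] = v
                del self.pending[i]
                return


def unlock(use_barrier):
    memory = {"lock": 1, "data": 0}           # this core holds the lock
    core = RelaxedCore(memory)
    core.store("data", 42)                    # write protected by the lock
    if use_barrier:
        core.membar()                         # finish protected operations first
    core.store("lock", 0)                     # the unlocking store of zero
    core.make_visible("lock")                 # unlock store overtakes if it can
    # What another core sees the moment it finds the lock free:
    return memory["lock"], memory["data"]

print(unlock(use_barrier=False), unlock(use_barrier=True))
```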
32
In which 2 ways can a load reserved be implemented?
  1. Bring the data into the cache in S state
Now the store conditional must check that the state hasn't changed and issue a BusRdX to allow modification of the data
  2. Bring the data in in M state on the load reserved
Now the store conditional just needs to check that the state hasn't changed, but this causes contention if 2 or more cores want the lock
33
Look at original example with locks
.
34
Why do we need to consider TLB coherence?
Updates (e.g. from the OS) to page tables can leave stale entries in the TLBs
35
What are the 2 approaches to TLB coherence?
  1. TLB shoot-downs: the OS flushes TLB entries on every core
Relies on inter-processor interrupts to trigger updates on every core
  2. Hardware keeps the TLB coherent with the PTEs
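A minimal sketch of the shoot-down approach, assuming a toy model where each core's TLB is a dict caching page-table entries. The `Core`, `remap`, and `demo` names and the page and frame numbers are hypothetical. Without the shoot-down every core keeps translating with the stale entry after the OS changes the mapping.

```python
class Core:
    """Toy core with a private TLB caching page-table entries."""
    def __init__(self, page_table):
        self.page_table = page_table
        self.tlb = {}                         # virtual page -> physical frame

    def translate(self, vpage):
        if vpage not in self.tlb:             # TLB miss: walk the page table
            self.tlb[vpage] = self.page_table[vpage]
        return self.tlb[vpage]

    def invalidate(self, vpage):              # run from the shoot-down interrupt
        self.tlb.pop(vpage, None)


def remap(page_table, cores, vpage, new_frame, shoot_down):
    """OS changes a mapping; optionally interrupts every core to flush it."""
    page_table[vpage] = new_frame
    if shoot_down:
        for core in cores:                    # one inter-processor interrupt each
            core.invalidate(vpage)


def demo(shoot_down):
    page_table = {0x10: 0xA}
    cores = [Core(page_table) for _ in range(4)]
    for core in cores:
        core.translate(0x10)                  # every core caches the old entry
    remap(page_table, cores, 0x10, 0xB, shoot_down)
    return [core.translate(0x10) for core in cores]

print(demo(shoot_down=False), demo(shoot_down=True))
```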
36
What is a disadvantage of TLB shoot-downs?
Expensive: every core gets involved, and the flush causes lots of page table lookups as each TLB repopulates
37
What is a disadvantage of using hardware for TLB coherence?
Adds complexity to hardware