Multicore and cache coherency Flashcards
What are 2 types of coherency protocols?
Snooping coherence protocols
Directory-based coherency protocols
What is cache coherency?
How to keep the memory coherent across the different caches in a system.
How memory updates are propagated through the system.
What are some trends that occurred in uniprocessor design that motivate the use of multicore? (3)
Single-core designs started to become very complex, making them more difficult to verify
The speed of light places limits on wire length and on how far signals can travel in a cycle; larger cores require signals to travel farther
Diminishing returns from ILP: it is difficult to extract more ILP from a single thread
What are some advantages of a multiprocessor design? (4)
Increased performance through a different type of parallelism (task-based, thread-based instead of ILP)
Multichip: Put multiple CPUs into the same machine.
Multicore: Put multiple cores on one chip
The design of a single core can be kept simple and small, and then replicated across multiple cores. The cores must be interconnected, which does add some new complexity
How does the demand in server vs. desktop performance motivate multicore design?
More and more cloud computing, so less need for high performance on personal computers.
Graphics performance is off-loaded to GPUs.
Servers are able to exploit a lot of thread-level parallelism (TLP)
How do technology issues motivate multicore designs?
Increasing the complexity of a single core causes more problems with power and cooling.
Having multiple cores, with lower frequency, allows us to keep the throughput while lowering the power.
What are the two types of memory hierarchy structures used in multicore systems?
Centralized memory
Distributed memory
What is centralized memory?
Uniform memory access (UMA) - only one physical memory
Each core has its own L1 cache
L2 caches are shared between cores, or sets of cores.
L3 and main memory are shared by all cores.
Constant latency between the memory layers: the same latency between L1 and L2, between L2 and L3, and so on.
What are a pro and con with having constant latencies between memory layers in centralized memory?
Pro:
Every load and store takes the same, predictable amount of time, so the programmer does not need to optimize for data placement.
Con:
All accesses to main memory travel on the bus between L3 and main memory. This is more difficult to scale, as all traffic goes through a single point
What is distributed memory?
Each core has an L1 cache.
Two cores, or sets of cores, share an L2.
The L2 caches are connected to a network that distributes accesses to the different banks of L3, and possibly to a divided set of main memory pools
Can both have non-uniform (NUMA) and uniform (UMA) memory accesses:
Depending on which memory bank your data is placed in, the latency can vary with how close or far away it is. There are also many memory controllers, and the distance between these and the cores can vary.
What is a pro and con with distributed memory?
Pro:
Distributed accesses, less congestion on an individual resource.
Better scaling by using physically separate memories.
Con:
Network becomes more complex.
What type of address spaces does distributed memory have?
Can have either a shared address space or several separate address spaces.
Shared: Supports both the shared-memory and the message-passing programming model
Separate: Supports only the message-passing programming model. Works with multiple cores on the same die; easier to scale across devices (separate servers, cloud, etc.)
What programming models work for shared memory?
pthreads, OpenMP
Data exchange between threads happens via memory (ld, sw, atomics). In-memory synchronization primitives are used to coordinate access (locks, semaphores, etc.)
Main thing to take care of is synchronizing the threads.
What programming models work for distributed memory?
MPI
Common in supercomputers, or systems with multiple servers.
A core cannot directly access the data of another core.
Data is exchanged via messages (send, recv)
Synchronization primitives are implemented via messages (e.g., barriers)
What is the (bus) snooping cache coherency protocol?
Each cache maintains the local state of its lines
All caches monitor a broadcast medium, an interconnect network that is seen by all the caches.
There are write-invalidate and write-update variants of the protocol. Each cache keeps track of all updates happening in the system