Lectures 12 & 13 - Chip Multiprocessors Flashcards
Flynn’s Taxonomy
SISD - Uniprocessor
SIMD - Vector
MISD - Multiple instruction streams operating on a single data stream; rarely built in practice.
MIMD - Each processor runs its own program and operates on its own data.
Shared-address-space platforms
All processors have access to a common data space via a shared address space, and all communication takes place through this shared memory. Each processor may also have an area of memory that is private.
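A minimal sketch of the shared-address-space model using Python threads (the names `worker` and `counter` are illustrative, not from the lectures). Communication is implicit: threads simply read and write the same variables, with a lock for synchronisation.

```python
# "Processors" as threads sharing one address space (illustrative sketch).
import threading

counter = 0                    # shared data, visible to every thread
lock = threading.Lock()        # synchronisation through the shared space itself

def worker(n):
    global counter
    for _ in range(n):
        with lock:             # communication is implicit: just read/write
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                 # 4000: all updates land in the shared space
```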
Message-passing platforms
Each processing element has its own exclusive address space, and communication is achieved by sending explicit messages between processing elements. Sending and receiving messages is used both to communicate between and to synchronise the actions of multiple processing elements.
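A hedged sketch of the message-passing model: here the "processing elements" are threads that share nothing except an explicit channel (a `queue.Queue`), so all communication and synchronisation goes through send/receive. The names (`producer`, `consumer`, the `None` sentinel) are illustrative.

```python
# Message passing mimicked with threads plus an explicit channel.
import threading
import queue

def producer(out_ch):
    for i in range(5):
        out_ch.put(i)          # explicit send
    out_ch.put(None)           # sentinel message: also synchronises shutdown

def consumer(in_ch, results):
    while True:
        msg = in_ch.get()      # explicit (blocking) receive
        if msg is None:
            break
        results.append(msg * 2)

channel = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(channel,))
t2 = threading.Thread(target=consumer, args=(channel, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)                 # [0, 2, 4, 6, 8]
```

Note the blocking receive doubles as synchronisation: the consumer cannot run ahead of the producer.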
Why were uniprocessors difficult to develop further?
Exploiting greater levels of ILP became very expensive (transistor count, complexity and power).
Limits to pipelining
Designs limited by power consumption
Interconnects scale poorly compared to transistors.
Processor complexity limited by cost of design/verification and time to market constraints.
Bus Snooping
Exploit presence of shared bus:
Bus access is arbitrated, ensuring each bus transaction completes before the next starts. All bus transactions are broadcast and can be observed by all processors (in the same order). Coherence can be maintained by having every cache controller “snoop” on the bus and monitor the transactions. A cache controller may take action if a bus transaction involves a memory block of which it has a copy.
Write-Through Invalidation Protocol
Every write causes a write transaction on the bus.
For each write transaction:
Each snooping cache checks if it has a copy of the cache block associated with the write address.
If the cache has a copy of the block, invalidate it.
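The steps above can be sketched as a toy simulation (illustrative class and variable names, not the lectures' notation): every write goes through the bus, memory is updated (write-through), and every snooping cache drops its copy of the written block.

```python
# Toy write-through invalidation protocol: writes are broadcast on a
# "bus"; snooping caches invalidate their copy of the written block.
class Cache:
    def __init__(self):
        self.blocks = {}                   # address -> cached value

class Bus:
    def __init__(self, caches, memory):
        self.caches, self.memory = caches, memory

    def write(self, writer, addr, value):
        self.memory[addr] = value          # write-through: memory updated
        writer.blocks[addr] = value
        for c in self.caches:              # broadcast: everyone snoops
            if c is not writer and addr in c.blocks:
                del c.blocks[addr]         # invalidate the stale copy

memory = {}
c0, c1 = Cache(), Cache()
bus = Bus([c0, c1], memory)

c1.blocks[0x40] = 7                        # c1 holds a copy of block 0x40
bus.write(c0, 0x40, 9)                     # c0 writes the same block
print(0x40 in c1.blocks)                   # False: snooped and invalidated
print(memory[0x40])                        # 9
```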
Sequential Consistency
A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor occur in this sequence in the order specified by its program.
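One way to make this concrete is the classic store-buffering litmus test: P0 runs `x = 1; r1 = y` and P1 runs `y = 1; r2 = x`, with `x` and `y` initially 0. The sketch below (illustrative, not from the lectures) enumerates every interleaving that respects each processor's program order, i.e. every sequentially consistent execution, and shows that the outcome `(r1, r2) = (0, 0)` never occurs under SC, although real hardware with store buffers can produce it.

```python
# Enumerate all sequentially consistent executions of the
# store-buffering litmus test:
#   P0: x = 1; r1 = y        P1: y = 1; r2 = x
from itertools import permutations

P0 = [("st", "x"), ("ld", "y", "r1")]
P1 = [("st", "y"), ("ld", "x", "r2")]

outcomes = set()
for order in set(permutations([0, 0, 1, 1])):    # which processor goes next
    idx, mem, regs = [0, 0], {"x": 0, "y": 0}, {}
    for p in order:                              # program order is preserved
        prog = (P0, P1)[p]
        op = prog[idx[p]]
        idx[p] += 1
        if op[0] == "st":
            mem[op[1]] = 1
        else:
            regs[op[2]] = mem[op[1]]
    outcomes.add((regs["r1"], regs["r2"]))

print(sorted(outcomes))    # (0, 0) never appears under SC
```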
Relaxed Consistency
Reordering memory operations between synchronization operations does not typically affect correctness, because properly synchronized programs only observe shared state at synchronization points.
False sharing
The cache coherence mechanism may cause a block to be invalidated even when no communication is taking place: unrelated words just happen to be stored in the same unit of coherence (cache block).
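A toy model of the effect (illustrative names and a hypothetical 64-byte line size): two cores repeatedly write to different words that fall in the same cache line, so each write invalidates the other core's copy even though no data is actually shared.

```python
# Count coherence invalidations when cores write to the same cache line.
LINE = 64                            # assumed cache-line size in bytes

def line_of(addr):
    return addr // LINE

def count_invalidations(writes):
    """writes: list of (core, byte_address) pairs, in order."""
    holder = {}                      # line -> core holding it Modified
    invalidations = 0
    for core, addr in writes:
        line = line_of(addr)
        if line in holder and holder[line] != core:
            invalidations += 1       # coherence kicks the other core out
        holder[line] = core
    return invalidations

# Core 0 updates the word at byte 0, core 1 the word at byte 8:
# unrelated data, but the same unit of coherence -> the line ping-pongs.
false_shared = [(0, 0), (1, 8)] * 1000
print(count_invalidations(false_shared))       # 1999

# Pad the data onto separate lines and the invalidations vanish.
padded = [(0, 0), (1, 64)] * 1000
print(count_invalidations(padded))             # 0
```

The usual fix, as the second case shows, is to pad or align per-core data so each core's hot words live on their own cache line.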
Private L2 Caches
Low hit latency but low capacity. Cache capacity for shared data is reduced as each private L2 must keep its own copy of the shared data. Constrained by fixed partitioning of cache resources.
Cache Exclusion Policy
A block may be in the L1 but not the L2 (or vice versa), so independent snooping hardware is needed for each level of the cache hierarchy.
Cache Inclusion Policy
If a block is in the L1 then it is also in the L2, so it is sufficient to snoop only the L2.
Shared L2 Cache
Greater capacity but higher hit latency. No replication of data when data is shared between cores. The shared L2's larger capacity helps reduce capacity misses, and there is no need to worry about cache coherence at the L2 level. However, average hit latency is higher, as data may need to be retrieved from a remote L2 bank, and greater associativity is needed to control the conflict miss rate.
MSI - Cache Line States
Shared - Block is present in unmodified state in this cache and main memory is up-to-date. Copies may also exist in other caches.
Modified - Only this cache has a valid copy and copy in memory is stale.
Invalid - Self-explanatory: the block holds no valid data.
MESI Protocol
Adds an Exclusive state for a block that is held by only one cache and is unmodified (matches memory). This allows a silent transition to Modified (no bus transaction) when the same processor that read the block then writes it.
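A simplified sketch of the processor-side MESI transitions (bus-side snoop transitions omitted; event and transaction names like `BusRd`/`BusRdX` follow common textbook usage, and the function name is illustrative). It highlights the key MESI benefit: a write to an Exclusive block upgrades to Modified with no bus traffic, whereas a write to a Shared or Invalid block must broadcast an invalidating transaction.

```python
# Per-line MESI transitions for processor-side read/write events.
M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def on_processor_event(state, event, others_have_copy=False):
    """Return (next_state, bus_transaction_or_None)."""
    if event == "read":
        if state == I:
            # Read miss: snoop responses reveal whether another cache
            # holds the block, selecting Shared vs Exclusive.
            return (S, "BusRd") if others_have_copy else (E, "BusRd")
        return (state, None)           # M/E/S read hits are silent
    if event == "write":
        if state == E:
            return (M, None)           # the silent E -> M upgrade
        if state in (S, I):
            return (M, "BusRdX")       # must invalidate other copies
        return (M, None)               # already Modified: write hit
    raise ValueError(f"unknown event: {event}")

print(on_processor_event(E, "write"))  # ('Modified', None): no bus traffic
print(on_processor_event(S, "write"))  # ('Modified', 'BusRdX')
```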