Parallel Architecture Flashcards
Hypercube bisection width
p/2
Square Toroidal Mesh Bisection Width
2*sqrt(p)
Mesh Bisection Width
min(n, m) for an n x m mesh
Fully Connected Bisection Width
p^2/4
Crossbar Bisection Width
p
Omega Bisection Width
p/2
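The bisection widths above can be collected into a small reference sketch. Function names are my own, chosen for illustration; each formula matches the flashcard it comes from.

```python
import math

def hypercube_bw(p):
    # Hypercube: p/2
    return p // 2

def square_torus_bw(p):
    # Square toroidal mesh: 2*sqrt(p); p assumed to be a perfect square
    return 2 * math.isqrt(p)

def mesh_bw(n, m):
    # n x m mesh: min(n, m)
    return min(n, m)

def fully_connected_bw(p):
    # Fully connected: p^2/4
    return p * p // 4

def crossbar_bw(p):
    # Crossbar: p
    return p

def omega_bw(p):
    # Omega network: p/2
    return p // 2

print(hypercube_bw(64))        # 32
print(square_torus_bw(64))     # 16
print(fully_connected_bw(64))  # 1024
```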
Message transmission time
l + n/b, where l is the latency (seconds), n is the message size (bytes), and b is the bandwidth (bytes/second)
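As a quick worked example of the formula, here is a sketch with assumed values (1 microsecond latency, a 1 MB message, 1 GB/s bandwidth):

```python
def message_time(latency_s, n_bytes, bandwidth_bytes_per_s):
    # total time = latency + transfer time (n/b)
    return latency_s + n_bytes / bandwidth_bytes_per_s

# assumed values: l = 1e-6 s, n = 1 MB, b = 1 GB/s
t = message_time(1e-6, 1_000_000, 1_000_000_000)
print(t)  # 0.001001 seconds -- the transfer term dominates the latency
```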
Snooping Cache Coherence
The idea behind snooping comes from bus-based systems: When the cores share a bus, any signal transmitted on the bus can be “seen” by all the cores connected to the bus. Thus when core 0 updates the copy of x stored in its cache, if it also broadcasts this information across the bus, and if core 1 is “snooping” the bus, it will see that x has been updated, and it can mark its copy of x as invalid. This is more or less how snooping cache coherence works.
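The broadcast-and-invalidate behavior described above can be sketched as a toy simulation. The class names and structure here are illustrative only, not a real protocol: every write is broadcast on the "bus", and every other cache snoops it and marks its copy invalid.

```python
class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, writer, var):
        # All caches on the bus "see" the write and snoop it.
        for c in self.caches:
            if c is not writer:
                c.snoop(var)

class Cache:
    def __init__(self, bus):
        self.data = {}  # var -> (value, valid_flag)
        bus.attach(self)

    def read(self, var, memory):
        entry = self.data.get(var)
        if entry is None or not entry[1]:
            entry = (memory[var], True)  # miss: fetch from memory
            self.data[var] = entry
        return entry[0]

    def write(self, var, value, memory, bus):
        memory[var] = value
        self.data[var] = (value, True)
        bus.broadcast(self, var)  # other caches snoop this write

    def snoop(self, var):
        if var in self.data:
            value, _ = self.data[var]
            self.data[var] = (value, False)  # mark our copy invalid

memory = {"x": 0}
bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c1.read("x", memory)           # core 1 caches x = 0
c0.write("x", 5, memory, bus)  # core 0 updates x; core 1's copy invalidated
print(c1.read("x", memory))    # 5 -- the invalid copy forces a re-fetch
```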
Directory Based Cache Coherence
- Directory-based cache coherence protocols attempt to solve the scalability problems of snooping (broadcasting every write to every core) through the use of a data structure called a directory.
- The directory stores the status of each cache line.
- Typically, this data structure is distributed; in our example, each core/memory pair might be responsible for storing the part of the structure that specifies the status of the cache lines in its local memory.
- Thus when a line is read into, say, core 0’s cache, the directory entry corresponding to that line would be updated, indicating that core 0 has a copy of the line.
- When a variable is updated, the directory is consulted, and the cache controllers of the cores that have that variable’s cache line in their caches will invalidate those lines.
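The steps above can be sketched as a toy directory. Names are illustrative: the directory records which cores hold each line, and a write invalidates only those cores rather than broadcasting to everyone.

```python
class Directory:
    def __init__(self):
        self.sharers = {}  # line -> set of core ids holding a copy

    def record_read(self, line, core):
        # Reading a line into a cache updates its directory entry.
        self.sharers.setdefault(line, set()).add(core)

    def invalidate_others(self, line, writer, caches):
        # On a write, consult the directory and invalidate only the
        # cores that actually hold the line.
        for core in self.sharers.get(line, set()) - {writer}:
            caches[core].pop(line, None)  # drop the stale copy
        self.sharers[line] = {writer}

directory = Directory()
caches = {0: {}, 1: {}}
memory = {"x": 0}

# core 1 reads x: its cache and the directory entry are updated
caches[1]["x"] = memory["x"]
directory.record_read("x", 1)

# core 0 writes x: the directory says core 1 must be invalidated
memory["x"] = 5
caches[0]["x"] = 5
directory.record_read("x", 0)
directory.invalidate_others("x", 0, caches)

print(caches[1])  # {} -- core 1's stale copy was invalidated
```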
Shared Memory vs Distributed Memory
Shared Memory:
Pros:
1. Implicit coordination of processors through shared data structures.
2. Appealing programming model for many programmers.
3. Generally suitable for systems with a small number of processors.
Cons:
1. Scaling interconnect can be costly.
2. Conflicts over access to the bus increase dramatically with more processors.
3. Large crossbars, while efficient, are expensive.
Distributed Memory:
Pros:
1. Relatively inexpensive interconnects like hypercube and toroidal mesh.
2. Well-suited for systems with thousands of processors.
3. Better for problems requiring vast amounts of data or computation.
Cons:
1. Requires explicit message passing for coordination.
2. More complex programming model for many programmers.
3. Not as suitable for small-scale systems with few processors.
MIMD
- MIMD (Multiple Instruction, Multiple Data) systems support multiple simultaneous instruction streams operating on multiple data streams.
- MIMD systems consist of fully independent processing units or cores, each with its own control unit and datapath.
- MIMD systems are asynchronous, meaning processors can operate at their own pace.
- Many MIMD systems lack a global clock and may have no relation between system times on different processors.
- Without synchronization imposed by the programmer, even if processors execute the same sequence of instructions, they may execute different statements at any given instant.
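The last point can be seen in miniature with threads: each thread is an independent instruction stream, and even though both run the same function, nothing fixes their relative pace unless the programmer synchronizes them.

```python
import threading

def worker(name, log, lock):
    # Both threads execute this same sequence of instructions.
    for i in range(3):
        with lock:  # the lock only protects appends to the shared list
            log.append((name, i))

log = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(n, log, lock))
           for n in ("t0", "t1")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 6 entries total, but their interleaving can differ from run to run.
print(len(log))
```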
SIMD
- SIMD (Single Instruction, Multiple Data) systems operate on multiple data streams by applying the same instruction to multiple data items simultaneously.
- Abstract SIMD systems have a single control unit and multiple datapaths.
- Instructions are broadcast from the control unit to the datapaths, where each datapath applies the instruction to a data item or remains idle.
- An example application is “vector addition,” where two arrays with n elements each are added element-wise.
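The vector-addition example can be written with NumPy, where one logical "add" is applied across all n elements at once (and on most CPUs is carried out by SIMD instructions under the hood):

```python
import numpy as np

n = 8
x = np.arange(n, dtype=np.float64)  # [0, 1, ..., 7]
y = np.ones(n, dtype=np.float64)    # [1, 1, ..., 1]
z = x + y                           # one instruction, n data items
print(z)  # [1. 2. 3. 4. 5. 6. 7. 8.]
```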
Modified State
- cache block has been updated and contains the current value
- all other copies are invalid (out of date)
Shared State
- cache block has not been updated
- cache block contains the current value
- all other copies also contain the current value
Invalid State
- cache block does not contain the most recent value of the memory block
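The three states above can be summarized as a simplified transition sketch. The rules here are deliberately minimal for illustration (no bus transactions or write-backs are modeled):

```python
MODIFIED, SHARED, INVALID = "M", "S", "I"

def on_local_write(state):
    # A local write always leaves this cache's copy Modified.
    return MODIFIED

def on_remote_write(state):
    # Another core wrote the line: our copy no longer holds
    # the most recent value, so it becomes Invalid.
    return INVALID

def on_local_read(state):
    # Reading an Invalid line re-fetches it into Shared;
    # Modified/Shared copies can be read in place.
    return SHARED if state == INVALID else state

s = SHARED
s = on_remote_write(s)  # another core updates the line -> "I"
s = on_local_read(s)    # re-fetch the current value -> "S"
s = on_local_write(s)   # local update -> "M"
print(s)                # M
```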