Multiprocessors Flashcards
What are multiprocessors
Multiprocessor refers to a set of tightly coupled processors whose coordination and usage are controlled by a single operating system and that usually share memory through a shared address space
Multicore: all the cores are on the same chip (called manycore when there are more than 32 cores)
Multiple Instruction Multiple Data (MIMD)
How to connect processors?
Single bus vs. interconnection network:
Single-bus approach imposes constraints on the number of processors connected to it (up to now, 36 is the largest number of processors connected in a commercial single bus system) => saturation.
To connect many processors with high bandwidth, the system needs to use more than a single bus => introduction of an interconnection network
What are the cost and performance tradeoffs between the ways to connect processors
The network-connected machine has a smaller initial cost, but its costs scale up more quickly than those of the bus-connected machine.
Performance for both machines scales linearly until the bus reaches its limit, then performance is flat no matter how many processors are used.
When these two effects are combined => the network-connected machine has consistent performance per unit cost, while the bus-connected machine has a ‘sweet spot’ plateau (8 to 16 processors).
Network-connected MPs have better cost/performance on the left of the plateau (because they are less expensive), and on the right of the plateau (because they have higher performance).
See picture 30
What are the different network topologies?
Single bus
Ring
Mesh
N-cube
Crossbar Network
What are the single-bus and ring network topologies
Single-bus: not capable of simultaneous transactions
Ring: capable of many simultaneous transfers (like a segmented bus). Some nodes are not directly connected => communication between those nodes must pass through intermediate nodes to reach the final destination (multiple hops).
See picture 31
How do we analyze network performance, and what are the metrics for the single-bus and ring topologies
P = number of nodes
M = number of links
b = bandwidth of a single link
Total Network Bandwidth (best case): M x b, i.e. the number of links multiplied by the bandwidth of each link
For the single-bus topology, the total network bandwidth is just the bandwidth of the bus: (1 x b)
For the ring topology P = M and the total network bandwidth is P times the bandwidth of one link: (P x b)
Bisection Bandwidth (worst case): This is calculated by dividing the machine into two parts, each with half the nodes. Then you sum up the bandwidth of the links that cross that imaginary dividing line.
For the ring topology it is two times the link bandwidth: (2 x b).
For the single-bus topology it is just the bus bandwidth: (1 x b).
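The two formulas above can be checked with a short Python sketch (the function names are illustrative, not from the source):

```python
def bus_metrics(b):
    """A single bus is one shared link (M = 1): both the total
    and the bisection bandwidth are just the bus bandwidth b."""
    return {"total": 1 * b, "bisection": 1 * b}

def ring_metrics(p, b):
    """A ring of p nodes has as many links as nodes (M = P), so
    the best-case total bandwidth is P x b; any cut that splits
    the ring into two halves severs exactly two links, so the
    bisection bandwidth is 2 x b."""
    return {"total": p * b, "bisection": 2 * b}

print(bus_metrics(1))       # {'total': 1, 'bisection': 1}
print(ring_metrics(8, 1))   # {'total': 8, 'bisection': 2}
```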
See picture 32
What is the crossbar network and its performance metrics?
Crossbar Network or fully connected network: every processor has a bidirectional dedicated communication link to every other processor
very high cost
Total Bandwidth: (P x (P - 1) / 2) x b
Bisection Bandwidth: (P/2)^2 x b
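The crossbar formulas can be verified the same way (again, the function name is illustrative):

```python
def crossbar_metrics(p, b):
    """Fully connected network of p nodes, per-link bandwidth b.

    Every pair of nodes has a dedicated link, giving P*(P-1)/2
    links in total. Splitting the machine into two halves of P/2
    nodes leaves (P/2)^2 links crossing the cut, since each node
    in one half is linked to every node in the other half."""
    total = (p * (p - 1) // 2) * b
    bisection = (p // 2) ** 2 * b
    return total, bisection

print(crossbar_metrics(8, 1))  # (28, 16)
```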
What is the bidimensional mesh and its performance metrics?
See picture 33
What is the hypercube and its performance metrics?
See picture 34
What are the possible memory space models
1) Single logically shared address space: A memory reference can be made by any processor to any memory location through loads/stores => Shared Memory Architectures.
The address space is shared among processors: The same physical address on 2 processors refers to the same location in memory.
2) Multiple and private address spaces: The processors communicate among them through send/receive primitives => Message Passing Architectures.
The address space is logically disjoint and cannot be addressed by different processors: the same physical address on 2 processors refers to 2 different locations in 2 different memories.
How is the communication managed in shared addresses?
Implicit management of the communication through load/store operations to access any memory locations.
Shared memory model imposes the cache coherence problem among processors.
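The implicit load/store style of communication can be sketched with Python threads, which share one address space just like processors in a shared-memory machine (the lock here stands in for the exclusive write access that the hardware coherence protocol must guarantee; the variable names are illustrative):

```python
import threading

counter = 0                # one shared location, visible to every thread
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10000):
        with lock:         # exclusive access for the read-modify-write
            counter += 1   # communication is just a load and a store

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: all updates to the shared location are visible
```

No explicit messages are exchanged: every thread reads and writes the same memory location directly.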
How is the communication managed in multiple private addresses?
The processors communicate among them through sending/receiving messages: message passing protocol
The memory of one processor cannot be accessed by another processor without the assistance of software protocols.
No cache coherence problem among processors
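By contrast, the send/receive style can be sketched with Python's multiprocessing module, where each process has its own private address space and data moves only through explicit messages (the pipe and function names are illustrative):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    x = conn.recv()        # explicit receive from the other address space
    conn.send(x * 2)       # explicit send of the result back
    conn.close()

def run():
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(21)    # the child cannot read our memory directly;
    result = parent_end.recv()  # the value must travel in a message
    p.join()
    return result

if __name__ == "__main__":
    print(run())
```

Because the address spaces are disjoint, no copy of the data is ever silently shared, which is why no cache coherence problem arises.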
What are the possibilities for physical memory organization?
Centralized Memory:
UMA (Uniform Memory Access): The access time to a memory location is uniform for all the processors: no matter which processor requests it and no matter which word is asked.
Distributed Memory:
The physical memory is divided into memory modules distributed on each single processor.
NUMA (Non Uniform Memory Access): The access time to a memory location is non uniform for all the processors: it depends on the location of the data word in memory and the processor location.
What is the relation between address space and the physical memory organization
See picture 35
What is the problem of cache coherence?
When shared data are cached, the shared values may be replicated in multiple caches.
The use of multiple copies of same data introduces a new problem: cache coherence.
Multiple copies are not a problem when reading, but a processor must have exclusive access to write a word.
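A toy Python model makes the stale-copy problem concrete (purely illustrative; in a real machine coherence is enforced in hardware by a coherence protocol, which this sketch deliberately omits):

```python
# Toy model: two private caches over one shared memory, with no
# invalidation on write, so the caches can become incoherent.
memory = {"x": 0}

class Cache:
    def __init__(self):
        self.lines = {}
    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from shared memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]           # hit: use the private copy
    def write(self, addr, value):         # update own copy and memory,
        self.lines[addr] = value          # but never invalidate the
        memory[addr] = value              # copies held by other caches

c1, c2 = Cache(), Cache()
c1.read("x"); c2.read("x")   # both caches now hold a copy of x = 0
c1.write("x", 42)            # P1 writes x; P2's copy is never invalidated
print(c1.read("x"), c2.read("x"))  # 42 0 -- P2 reads a stale value
```

This is exactly why a writing processor needs exclusive access: without it, the other cache keeps serving the old value.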