Multiprocessors Flashcards

1
Q

What are multiprocessors?

A

Multiprocessors are tightly coupled processors whose coordination and usage are controlled by a single operating system, and that usually share memory through a shared address space.

- Multicores: all the cores are on the same chip (called manycores when there are more than 32 cores).

- Multiple Instruction Multiple Data (MIMD).

2
Q

How to connect processors?

A

Single bus vs. interconnection network:

- The single-bus approach imposes constraints on the number of processors connected to it (up to now, 36 is the largest number of processors connected in a commercial single-bus system) => saturation.

- To connect many processors with high bandwidth, the system needs to use more than a single bus => introduction of an interconnection network.

3
Q

What are the cost and performance tradeoffs between the ways to connect processors?

A

- The network-connected machine has a smaller initial cost, but its cost then scales up more quickly than that of the bus-connected machine.

- Performance for both machines scales linearly until the bus reaches its limit; beyond that point, the performance of the bus-connected machine stays flat no matter how many processors are used.

- When these two effects are combined, the network-connected machine has consistent performance per unit cost, while the bus-connected machine has a 'sweet spot' plateau (8 to 16 processors).

- Network-connected MPs have better cost/performance to the left of the plateau (because they are less expensive) and to the right of the plateau (because they have higher performance).

See picture 30

4
Q

What are the different network topologies?

A

Single bus

Ring

Mesh

N-cube

Crossbar Network

5
Q

What are the single-bus and ring network topologies?

A

Single-bus: not capable of simultaneous transactions

Ring: capable of many simultaneous transfers (like a segmented bus). Since some nodes are not directly connected, communication between them must pass through intermediate nodes to reach the final destination (multiple hops).

See picture 31

6
Q

How do we analyze network performance, and what is it for the single-bus and ring topologies?

A

P = number of nodes
M = number of links
b = bandwidth of a single link

Total Network Bandwidth (best case): M x b, i.e., the number of links multiplied by the bandwidth of each link.
- For the single-bus topology, the total network bandwidth is just the bandwidth of the bus: (1 x b).
- For the ring topology, M = P and the total network bandwidth is P times the bandwidth of one link: (P x b).

Bisection Bandwidth (worst case): calculated by dividing the machine into two parts, each with half the nodes, and summing up the bandwidth of the links that cross that imaginary dividing line.
- For the ring topology, it is two times the link bandwidth: (2 x b).
- For the single-bus topology, it is just the bus bandwidth: (1 x b).

See picture 32
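
A minimal Python sketch of these two metrics for the bus and ring topologies (not from the source; the function names and the P = 16, b = 1.0 values are illustrative):

```python
# Sketch: total and bisection bandwidth following the definitions above
# (P = number of nodes, M = number of links, b = bandwidth of a single link).

def bus_bandwidths(P, b):
    # Single bus: one shared link (M = 1), so both metrics equal the bus bandwidth.
    return {"total": 1 * b, "bisection": 1 * b}

def ring_bandwidths(P, b):
    # Ring: M = P links in total; any cut into two halves crosses exactly 2 links.
    return {"total": P * b, "bisection": 2 * b}

print(bus_bandwidths(P=16, b=1.0))   # {'total': 1.0, 'bisection': 1.0}
print(ring_bandwidths(P=16, b=1.0))  # {'total': 16.0, 'bisection': 2.0}
```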

7
Q

What is the crossbar network and what are its performance metrics?

A

Crossbar Network (or fully connected network): every processor has a dedicated bidirectional communication link to every other processor.
- Very high cost.

Total Bandwidth: (P x (P - 1) / 2) x b
Bisection Bandwidth: (P/2)^2 x b
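
A short worked example of the two crossbar formulas (the P = 8, b = 1 values are chosen for illustration, not from the source):

```python
# Crossbar (fully connected) network with P = 8 and b = 1 (illustrative values).
P, b = 8, 1
total = (P * (P - 1) // 2) * b   # one link per processor pair: 28 x b
bisection = (P // 2) ** 2 * b    # links crossing the half-and-half cut: 16 x b
print(total, bisection)          # 28 16
```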

8
Q

What is the two-dimensional mesh and what are its performance metrics?

A

See picture 33

9
Q

What is the hypercube and what are its performance metrics?

A

See picture 34

10
Q

What are the possible address space models?

A

1) Single logically shared address space: a memory reference can be made by any processor to any memory location through loads/stores => Shared Memory Architectures.
- The address space is shared among processors: the same physical address on 2 processors refers to the same location in memory.

2) Multiple and private address spaces: the processors communicate with each other through send/receive primitives => Message Passing Architectures.
- The address space is logically disjoint and cannot be addressed by different processors: the same physical address on 2 processors refers to 2 different locations in 2 different memories.

11
Q

How is communication managed with a shared address space?

A

Implicit management of communication through load/store operations to access any memory location.

The shared-memory model introduces the cache coherence problem among processors.

12
Q

How is communication managed with multiple private address spaces?

A

The processors communicate with each other by sending/receiving messages: message-passing protocol.

The memory of one processor cannot be accessed by another processor without the assistance of software protocols.

No cache coherence problem among processors
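
A minimal Python sketch (a software analogy, not from the source; the function and variable names are illustrative) contrasting the two communication styles: threads sharing one address space versus processes exchanging explicit messages.

```python
import threading
import multiprocessing

counter = {"value": 0}

def bump():
    # Shared address space: the thread writes the very location the main
    # thread later reads -- implicit communication through loads/stores.
    counter["value"] += 1

def worker(q):
    # Private address spaces: the child process cannot write the parent's
    # memory; it must perform an explicit send (the message-passing style).
    q.put("hello via message passing")

if __name__ == "__main__":
    t = threading.Thread(target=bump)
    t.start(); t.join()
    print(counter["value"])              # 1 -- same memory, no message needed

    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())                       # explicit receive from the other process
    p.join()
```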

13
Q

What are the possibilities for physical memory organization?

A

Centralized Memory:
- UMA (Uniform Memory Access): the access time to a memory location is uniform for all the processors, no matter which processor requests it and no matter which word is requested.

Distributed Memory:
The physical memory is divided into memory modules distributed across the processors (one module per processor).
- NUMA (Non-Uniform Memory Access): the access time to a memory location is not uniform across processors: it depends on the location of the data word in memory and on the processor's location.

14
Q

What is the relation between the address space and the physical memory organization?

A

See picture 35

15
Q

What is the problem of cache coherence?

A

When shared data are cached, the shared values may be replicated in multiple caches.

The use of multiple copies of the same data introduces a new problem: cache coherence.

Multiple copies are not a problem when reading, but a processor must have exclusive access to write a word.

16
Q

What is the solution to the cache coherence problem?

A

HW-based solutions to maintain coherence: Cache-Coherence Protocols

The key issue in implementing a cache coherence protocol in multiprocessors is tracking the state of any sharing of a data block.

Two classes of protocols:
- Snooping Protocols
- Directory-Based Protocols

17
Q

What is the snooping protocol?

A

All cache controllers monitor (snoop) on the bus to determine whether or not they have a copy of the block requested on the bus and respond accordingly.

Every cache that has a copy of the shared block also has a copy of the sharing state of the block, and no centralized state is kept.

Send all requests for shared data to all processors.

18
Q

What are the types of snooping protocols?

A

- Write-Invalidate Protocol

- Write-Update (or Write-Broadcast) Protocol

19
Q

What is the write-invalidate protocol?

A

The writing processor issues an invalidation signal over the bus to cause all copies in other caches to be invalidated before changing its local copy.

This scheme uses the bus only on the first write to invalidate the other copies.

20
Q

What is the write-update protocol?

A

The writing processor broadcasts the new data over the bus; all caches check if they have a copy of the data and, if so, all copies are updated with the new value.

21
Q

What combinations of horizontal and vertical cache coherence policies are generally used?

A

Write invalidate + write back

Write update + write through

22
Q

What is MSI and what are the states it can have?

A

It is a write-invalidate snooping protocol for write-back caches.

Each cache block can be in one of three states:
- Modified (or Dirty): the cache has the only copy; it is writable and dirty (the block cannot be shared anymore)
- Shared (or Clean) (read only): the block is clean (not modified) and can be read
- Invalid: the block contains no valid data

Each block of memory is in one of three states:
- Shared in all caches and up-to-date in memory (Clean)
- Modified in exactly one cache (Dirty)
- Uncached when not in any cache

23
Q

What are the possible consequences of a read miss on a cache block?

A

IF (other caches have the block tagged SHARED) => the blocks stay SHARED;

IF (another cache has the block tagged MODIFIED) =>
WRITE-BACK of the block to memory;
the block becomes SHARED in the previous cache;
LOAD from memory into the requesting cache as SHARED

24
Q

What are the possible consequences of a write miss in a cache block?

A

IF (other caches have the requested block tagged SHARED) => the block is INVALIDATED in the other caches;

IF (another cache has the requested block tagged MODIFIED) =>
WRITE-BACK of the block to memory;
the block is INVALIDATED in the previous cache;
LOAD of the block from memory into the requesting cache as MODIFIED;
WRITE in the requesting cache;

25
Q

If a given block in the cache is tagged Shared, what are the possible operations?

A

Read hit: cache block unchanged

Read miss: place a read miss for the new block's tag on the bus

Write hit: send a write-invalidate on the bus for the block's tag

Write miss: place a write miss for the new desired tag on the bus

26
Q

If a given block in the cache is tagged Modified, what are the possible operations?

A

Read hit: cache block unchanged

Read miss: write back the current block, then place a read miss for the new tag on the bus

Write hit: cache block unchanged

Write miss: write back the current block, then place a write miss for the new tag on the bus
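
The transitions in the two cards above can be summarized in a small Python sketch (an assumed encoding, not the textbook's code; the resulting states follow standard MSI semantics, e.g., a write hit on a Shared block leaves the writer's copy Modified):

```python
from enum import Enum

class MSI(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    MODIFIED = "modified"

def cpu_request(state, op, hit):
    """Return (next state, bus action, write-back needed) for a CPU read/write
    on a block currently tagged Shared or Modified."""
    if state is MSI.SHARED:
        if op == "read":
            return (MSI.SHARED, None if hit else "read miss", False)
        # write hit: invalidate the other copies; write miss: request the new block
        return (MSI.MODIFIED, "write invalidate" if hit else "write miss", False)
    if state is MSI.MODIFIED:
        if hit:
            return (MSI.MODIFIED, None, False)       # read/write hit: unchanged
        # miss: the dirty block must be written back before being replaced
        next_state = MSI.SHARED if op == "read" else MSI.MODIFIED
        return (next_state, f"{op} miss", True)
    raise ValueError("the Invalid state is not covered by these two cards")

print(cpu_request(MSI.SHARED, "write", hit=True))
# (<MSI.MODIFIED: 'modified'>, 'write invalidate', False)
```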

27
Q

What is MESI and what are the states it can present?

A

MESI Protocol: Write-Invalidate

Each cache block can be in one of four states:
- Modified: the block is dirty and cannot be shared; the cache has the only copy, and it is writable.
- Exclusive: the block is clean and the cache has the only copy;
- Shared: the block is clean and other copies of the block are in other caches;
- Invalid: the block contains no valid data

28
Q

What is the main difference between MSI and MESI?

A

A write to an Exclusive block does not require sending the invalidation signal on the bus, since no other caches hold a copy of the block.
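
A minimal sketch of this difference (assumed names, not from the source): a MESI write hit needs bus traffic only when the block is Shared.

```python
def mesi_write_hit(state):
    """Return (next state, bus action) for a write hit under MESI."""
    if state == "exclusive":
        return ("modified", None)            # silent upgrade: no other copies exist
    if state == "shared":
        return ("modified", "invalidate")    # other copies must be invalidated first
    if state == "modified":
        return ("modified", None)            # already the only, dirty copy
    raise ValueError("a write hit is impossible on an Invalid block")

print(mesi_write_hit("exclusive"))  # ('modified', None)
print(mesi_write_hit("shared"))     # ('modified', 'invalidate')
```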

29
Q

What is a directory-based protocol?

A

The sharing state of a block of physical memory is kept in just one location, called the directory.
- In Centralized Shared-Memory architectures (such as Symmetric Multiprocessors), there is a single directory associated with the main memory.
- In Distributed Shared-Memory architectures, the directory is distributed across the nodes (one directory for each memory module) to avoid bottlenecks.

To avoid broadcast, send point-to-point requests to processors => Message Passing Protocol

30
Q

What is the content of a directory in a directory-based protocol?

A

The directory maintains information regarding:
- The coherence state of each block
- Which processor(s) have a copy of the block (usually a bit vector: 1 if the processor has a copy)
- Which processor is the owner of the block (when the block is in the exclusive state: 1 if the processor is the owner)

31
Q

What are the three possible coherence states for each block in the directory?

A

Three possible coherence states for each block in the directory:
- Uncached: no processor has a copy of the cache block; the block is not valid in any cache;
- Shared: one or more processors have the cache block, and memory is up-to-date; the sharer set holds the processors' IDs;
- Modified: only one processor (the owner) has data that have been modified, so memory is out-of-date
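
A minimal Python sketch (not from the source; the class and field names are illustrative) of one directory entry combining the information from the previous two cards:

```python
from dataclasses import dataclass, field

@dataclass
class DirectoryEntry:
    # Coherence state of the block: "uncached", "shared", or "modified".
    state: str = "uncached"
    # Bit vector of sharers, kept here as a set of processor IDs.
    sharers: set = field(default_factory=set)
    # Owner of the block; meaningful only when state == "modified" (-1 = none).
    owner: int = -1

entry = DirectoryEntry(state="shared", sharers={0, 3})  # P0 and P3 hold clean copies
print(entry)
```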

32
Q

How does the directory-based protocol handle communication between nodes?

A

Message-oriented protocol: the requests generate messages sent between nodes (point-to-point requests) to maintain coherence, and all messages must receive explicit answers.

33
Q

What are the possible coherence states in the cache in the directory protocol?

A

Each block in the cache can be in three possible coherence states (as in the MSI snooping protocol):
- Modified (read/write): the block is dirty and this processor is the only owner;
- Shared (read only): the block is valid/up-to-date (clean) and it is shared with other processors;
- Invalid: the block contains no valid data

34
Q

How can the node classifications (Local, Home, and Remote) relate to each other?

A

The L (local, requesting) node can be the H (home) node and vice versa (if the L node coincides with the H node, we can use intra-node transactions instead of inter-node messages, based on the same protocol).

The R (remote) node can be the H node and vice versa.

The R node and the L node are different by definition.

35
Q

What are the possible message types in the directory-based protocol?

A

See picture 36

36
Q

What are the possible messages for an uncached block in a directory-based protocol?

A

Read Miss from local cache (e.g., N1): the requested data are sent by Data Value Reply from home memory to the local cache, and the requestor is made the only sharing node. The state of the block is set to S.

Write Miss from local cache (e.g., N1): the requested data are sent by Data Value Reply from home memory to the local cache, and the requestor becomes the owner node. The block is set to M to indicate that the only valid copy is cached. The Sharer bits indicate the identity of the owner of the block.

37
Q

What are the possible messages for a shared block in a directory-based protocol?

A

Read Miss from local cache: the requested data are sent by Data Value Reply from home memory to the local cache, and the requestor is added to the Sharer bits. The state of the block stays S.

Write Miss from local cache: the requested data are sent by Data Value Reply from home memory to the local cache. Invalidate messages are sent from home to the remote sharer(s), and the Sharer bits are set to the identity of the requestor (the new owner). The state of the block becomes M.

38
Q

What are the possible messages for a modified block in a directory-based protocol?

A

Read Miss from local cache: a Fetch message is sent to the owner node, causing the state of the block in the owner's cache to transition to S; the owner sends the data to the home directory (through Data Write Back), and the data written to home memory are then sent to the requesting cache by Data Value Reply. The identity of the requesting processor is added to the Sharers set, which still contains the identity of the processor that was the owner (since it still has a readable copy). The block state in the directory is set to S.

Data Write-Back from remote owner: the owner cache is replacing the block and therefore must write it back to home. This makes the memory copy up to date (the home directory becomes the owner), the block becomes U, and the Sharer set is emptied.

Write Miss from local cache: a Fetch/Invalidate message is sent to the old owner, causing it to invalidate its cache block and send the data to the home directory (Data Write Back); from there the data are sent to the requesting node (Data Value Reply), which becomes the new owner. The Sharer set is set to the identity of the new owner, and the state of the block remains M (but the owner has changed).
Write Miss local cache: A Fetch/Inv. msg is sent to the old owner causing to invalidate the cache block and the cache to send data to the home directory (Data Write Back), from which the data are sent to the requesting node (Data Value Reply), which becomes the new owner. Sharer is set to the identity of the new owner, and the state of the block remain M (but owner changed)