P4L3: Distributed Shared Memory Flashcards
What are four granularities at which we can share state?
- Cache line
- Variable
- Page
- Object
What are the tradeoffs associated with sharing at the granularity of the cache line?
- Sharing at the granularity of the cache line is too fine-grained.
- The resulting coherence traffic outweighs the benefits of sharing at this granularity.
What are the tradeoffs associated with sharing at the granularity of the variable?
PROS
This level of granularity makes it possible for programmers to specify sharing semantics on a per-variable basis
CONS
This level of sharing is still too fine-grained, and the network overhead will still be too high.
What are the tradeoffs associated with sharing at the granularity of the page?
PROS
- It’s a viable option because it does not generate the coherence traffic of cache line or variable-level sharing
- It also makes sense to the OS, since pages are the granularity the OS already manages, so it's readily generalizable.
CONS
- Like any larger granularity, false sharing is a potential issue. False sharing occurs when two processes concurrently access different portions of the same page: the coherence traffic generated in that case is unnecessary, because the processes never actually share data.
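The false-sharing cost can be sketched as follows. This is a toy model, not a real DSM implementation: the `Page` class, `PAGE_SIZE`, and the message counter are invented for illustration.

```python
# Hypothetical sketch of false sharing at page granularity.
PAGE_SIZE = 4096

class Page:
    """A shared page tracked at page granularity: any write to any
    offset invalidates every other node's cached copy of the page."""
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.cached_by = set()        # node IDs holding a copy
        self.coherence_messages = 0   # invalidations sent so far

    def read(self, node_id, offset):
        self.cached_by.add(node_id)
        return self.data[offset]

    def write(self, node_id, offset, value):
        self.data[offset] = value
        # Invalidate every OTHER cached copy, even if those nodes
        # only ever touch a different part of the page.
        for _other in self.cached_by - {node_id}:
            self.coherence_messages += 1
        self.cached_by = {node_id}

page = Page()
page.read(1, 0)       # node 1 uses only the first byte
page.read(2, 2048)    # node 2 uses only the middle of the page
page.write(1, 0, 7)   # node 1's write still invalidates node 2's copy
print(page.coherence_messages)  # traffic despite no true sharing
```

Even though nodes 1 and 2 never touch the same bytes, the write generates an invalidation, which is exactly the unnecessary coherence traffic described above.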
What are the tradeoffs associated with sharing at the granularity of the object?
PROS
- We avoid the coherence traffic of cache line or variable-level sharing
- The OS doesn’t need to be modified
CONS
- The OS does not understand objects, so this approach requires support from a language runtime.
- This makes object granularity a less generalizable solution than page granularity.
For a distributed state management system (think distributed shared memory) to maintain consistency, what abilities must it have?
- When a node requests data, it must receive a relatively recent copy of that data.
- It must broadcast (propagate) changes when state is modified.
Why do we differentiate between a global index structure to find the home nodes and local index structure about the portion of state they are responsible for?
- The global index structure helps nodes to always find the home node for an address/page, which can ensure that a node can immediately get the most recent value for an object
- The local index structures maintained by a home node are necessary to drive coherence mechanisms that are directed only at affected nodes
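The two index layers can be sketched like this. The names (`global_index`, `PageInfo`, `home_of`) are invented for this example; the point is that every node replicates the global map, while per-page metadata lives only on the home node.

```python
# Global index, replicated on every node: page number -> home node.
global_index = {0: "nodeA", 1: "nodeB", 2: "nodeA"}

class PageInfo:
    """Local per-page state kept only on the home node, used to drive
    coherence actions at just the affected nodes."""
    def __init__(self):
        self.copy_set = set()   # nodes caching this page
        self.version = 0        # bumped on every write

# Local index on nodeA, covering only the pages it is home for.
local_index_nodeA = {0: PageInfo(), 2: PageInfo()}

def home_of(page_no):
    """Any node can find a page's home with one global lookup."""
    return global_index[page_no]

def record_reader(local_index, page_no, node_id):
    """The home node remembers who cached the page, so later
    invalidations go only to those nodes."""
    local_index[page_no].copy_set.add(node_id)

record_reader(local_index_nodeA, 0, "nodeC")
print(home_of(0))                     # "nodeA"
print(local_index_nodeA[0].copy_set)  # {"nodeC"}
```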
Do you have some ideas about how you would go about implementing a distributed shared memory system?
The OS should be involved when we try to access shared memory, but not when we try to access local memory.
- We can use the memory management unit (MMU) for this. When we try to access a remote address locally, it will be an invalid memory reference. This generates a fault and traps to the OS.
- The OS will then detect that the memory address is remote and use the global map to look up the home node for the requested address.
- The OS will message that node via IPC and request the data at the address.
- When the data is received, the OS can cache it locally and return it to the process that requested it.
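The read-fault path above can be sketched as a small simulation. Everything here (`global_map`, `request_page`, the dictionaries standing in for memory and IPC) is an invented stand-in for MMU faults and OS messaging, not a real API.

```python
global_map = {0x1000: "home"}           # remote address -> home node
remote_memory = {"home": {0x1000: 42}}  # data living on the home node
local_cache = {}                        # pages cached after a fault

def request_page(home_node, addr):
    """Stands in for the IPC round trip to the home node."""
    return remote_memory[home_node][addr]

def access(addr):
    # Local hit: the OS is not involved at all.
    if addr in local_cache:
        return local_cache[addr]
    # Invalid local reference -> fault -> trap to the "OS" below.
    home = global_map[addr]             # global map lookup
    data = request_page(home, addr)     # IPC to the home node
    local_cache[addr] = data            # cache it locally
    return data                         # return to the faulting process

print(access(0x1000))   # first access faults and fetches remotely
print(access(0x1000))   # second access is a local cache hit
```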
We also need the OS involved when a process tries to write a piece of shared data, but not local data.
- To accomplish this we write-protect virtual addresses that correspond to shared state.
- Writing to these addresses will cause a fault, which traps to the OS.
- The OS will see that the access points to shared data
- If the requesting node is not the home node, the OS will send a message to the home node asking for that state in order to update it
- If the requesting node is the home node, the OS will update the data and broadcast coherence messages to nodes that also store that data.
- It can determine which nodes hold the changed data by maintaining per-page data structures that list the nodes that have accessed each page.
- This means that when a node requests a page or an address, it should send its node ID as part of the request.
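The write-fault path can be sketched the same way. The per-page `copy_set` below, built from the node IDs sent with each read request, is what lets the home node direct coherence messages only at affected nodes; all names here are hypothetical.

```python
MY_NODE = "nodeA"

pages = {
    7: {"home": "nodeA", "data": 0, "copy_set": {"nodeB", "nodeC"}},
}
sent_messages = []   # (destination, payload) pairs, for illustration

def write_shared(page_no, value, writer=MY_NODE):
    page = pages[page_no]
    # Shared pages are write-protected, so this function models the
    # handler the OS runs after the write faults and traps.
    if page["home"] != writer:
        # Not the home node: send the update to the home node.
        sent_messages.append((page["home"], ("update", page_no, value)))
        return
    # Home node: apply the write, then send coherence messages only
    # to the nodes recorded in the copy set.
    page["data"] = value
    for node in page["copy_set"]:
        sent_messages.append((node, ("invalidate", page_no)))
    page["copy_set"] = set()

write_shared(7, 99)
print(pages[7]["data"])       # 99
print(sorted(sent_messages))  # invalidations to nodeB and nodeC
```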
What’s a consistency model?
A consistency model is an agreement between state (memory for example) and upper software layers.
It guarantees that state changes will be made visible to upper-level applications if those applications follow certain behaviors.
What is a strict consistency model?
A strict consistency model guarantees:
- All updates are made available everywhere, immediately.
- Every node in the system will see all writes in the same order.
This strategy is not possible in practice. Even SMPs do not offer this guarantee on a single node, and network latency and message reordering/loss make it even harder in a distributed system.
What is a sequential consistency model?
This is next best to the strict consistency model.
- Updates from different processors can be arbitrarily interleaved (ordered) so long as the ordering would be possible on a single processor system.
- All processes see the same interleaving (ordering)!
EXTRAS
- Updates are not required to be immediately visible.
- Updates from the same process must maintain their ordering.
- Concurrent reads will see the same value
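The per-process ordering rule above can be checked mechanically. This toy checker and the write tuples are made up for illustration; it only tests the "possible on a single processor" condition for one process's writes.

```python
def respects_program_order(interleaving, program):
    """True if `program`'s writes appear in `interleaving` in the
    same order they were issued (program order)."""
    positions = [interleaving.index(w) for w in program]
    return positions == sorted(positions)

p1 = [("P1", "x", 1), ("P1", "y", 2)]   # P1 writes x, then y
p2 = [("P2", "z", 3)]                   # P2 writes z

ok  = [p1[0], p2[0], p1[1]]   # P1's writes stay in order: allowed
bad = [p1[1], p2[0], p1[0]]   # P1's writes reordered: not allowed

print(respects_program_order(ok, p1))    # True
print(respects_program_order(bad, p1))   # False
```

Sequential consistency additionally requires that every process observe the *same* allowed interleaving, which this single-observer check does not capture.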
What is a causal consistency model?
This is a little less strict than sequential consistency.
- Causal consistency detects causally related writes and ensures that they maintain their order.
- Loosens the sequential requirement that all observed orderings are the same.
EXTRA:
- Updates from the same node cannot be arbitrarily interleaved, just as in the sequential model.
- But NO guarantee about concurrent writes.
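A minimal illustration of the causal rule: the events and the `causes` relation below are hand-constructed for this sketch; a real system would have to infer causal relationships itself (e.g., from reads that precede writes).

```python
causes = {
    # write(y) on P2 is causally after write(x) on P1, because P2
    # read x before writing y.
    ("P2", "y"): {("P1", "x")},
    # write(z) on P3 is concurrent with both: no causal edge.
    ("P3", "z"): set(),
    ("P1", "x"): set(),
}

def causally_consistent(order):
    """True if every write appears after all writes that caused it."""
    pos = {w: i for i, w in enumerate(order)}
    return all(pos[c] < pos[w] for w in order for c in causes[w])

# Different nodes may place the concurrent write(z) differently...
node1_view = [("P1", "x"), ("P3", "z"), ("P2", "y")]
node2_view = [("P3", "z"), ("P1", "x"), ("P2", "y")]
# ...but no node may observe y before x.
bad_view   = [("P2", "y"), ("P1", "x"), ("P3", "z")]

print(causally_consistent(node1_view))  # True
print(causally_consistent(node2_view))  # True
print(causally_consistent(bad_view))    # False
```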
What is a weak consistency model?
Instead of inferring causal relationships on its own, a weak consistency model makes a new operation available to the upper software layers: synchronization points.
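A sketch of explicit synchronization points under weak consistency. The names (`local_updates`, `shared_store`, `sync`) are invented; the only idea being modeled is that updates become visible at a sync point, not before.

```python
shared_store = {}    # state visible to other nodes
local_updates = {}   # writes buffered since the last sync point

def write(key, value):
    # Between sync points, writes stay local; the system makes no
    # visibility guarantee about them.
    local_updates[key] = value

def sync():
    # The synchronization point: push buffered writes out, making
    # them visible to the rest of the system.
    shared_store.update(local_updates)
    local_updates.clear()

write("x", 1)
print(shared_store.get("x"))  # None: not visible before the sync
sync()
print(shared_store.get("x"))  # 1: visible after the sync point
```

The tradeoff is that the burden of calling the synchronization operation at the right times shifts to the programmer, in exchange for less coherence traffic between sync points.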
What are the four consistency models discussed in this course?
- strict
- sequential
- causal
- weak
What is Distributed Shared Memory?
Distributed shared memory is a service that manages memory across multiple nodes so that applications will have the illusion that they are running on a single shared-memory machine.
Why is Distributed Shared Memory important?
Distributed shared memory mechanisms are important because they permit scaling beyond the limitations of how much memory we can include in a single machine.
Single machines with large amounts of memory can cost hundreds of thousands of dollars. To scale up memory affordably, it's imperative to understand DSM concepts and semantics so that many cheap machines can be connected to give the illusion of a single machine with a large amount of memory.