P4L3: Distributed Shared Memory Flashcards
What are four granularities at which we can share state?
- Cache line
- Variable
- Page
- Object
What are the tradeoffs associated with sharing at the granularity of the cache line?
- Sharing at the granularity of the cache line is too fine-grained.
- The resulting coherence traffic outweighs the benefits of sharing at this granularity.
What are the tradeoffs associated with sharing at the granularity of the variable?
PROS
This level of granularity makes it possible for programmers to specify sharing semantics on a per-variable basis
CONS
This level of sharing is still too fine-grained, and the network overhead will still be too high.
What are the tradeoffs associated with sharing at the granularity of the page?
PROS
- It’s a viable option because it does not generate the coherence traffic of cache line or variable-level sharing
- It also makes sense to the OS, since pages are the granularity the OS already manages, so it's readily generalizable.
CONS
- Like any larger granularity, false sharing is a potential issue. False sharing occurs when two processes concurrently access different portions of the same page: the coherence traffic generated in that case is unnecessary, because the processes never actually share data.
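The false-sharing cost can be sketched as follows. This is a toy model, not a real DSM implementation: the `Page` class, `PAGE_SIZE`, and the message counter are invented for illustration.

```python
# Hypothetical sketch of false sharing at page granularity.
PAGE_SIZE = 4096

class Page:
    """A shared page tracked at page granularity: any write to any
    offset invalidates every other node's cached copy of the page."""
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.cached_by = set()        # node IDs holding a copy
        self.coherence_messages = 0   # invalidations sent so far

    def read(self, node_id, offset):
        self.cached_by.add(node_id)
        return self.data[offset]

    def write(self, node_id, offset, value):
        self.data[offset] = value
        # Invalidate every OTHER cached copy, even if those nodes
        # only ever touch a different part of the page.
        for _other in self.cached_by - {node_id}:
            self.coherence_messages += 1
        self.cached_by = {node_id}

page = Page()
page.read(1, 0)       # node 1 uses only the first byte
page.read(2, 2048)    # node 2 uses only the middle of the page
page.write(1, 0, 7)   # node 1's write still invalidates node 2's copy
print(page.coherence_messages)  # traffic despite no true sharing
```

Even though nodes 1 and 2 never touch the same bytes, the write generates an invalidation, which is exactly the unnecessary coherence traffic described above.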
What are the tradeoffs associated with sharing at the granularity of the object?
PROS
- We avoid the coherence traffic of cache line or variable-level sharing
- The OS doesn’t need to be modified
CONS
- The OS does not understand objects, so this approach requires support from a language runtime.
- This makes object granularity a less generalizable solution than page granularity.
For a distributed state management system (think distributed shared memory) to maintain consistency, what abilities must it have?
- When a node requests data, it must receive a relatively recent copy of that data.
- It must broadcast (propagate) changes when state is modified.
Why do we differentiate between a global index structure to find the home nodes and local index structure about the portion of state they are responsible for?
- The global index structure helps nodes to always find the home node for an address/page, which can ensure that a node can immediately get the most recent value for an object
- The local index structures maintained by a home node are necessary to drive coherence mechanisms that are directed only at affected nodes
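The two index layers can be sketched like this. The names (`global_index`, `PageInfo`, `home_of`) are invented for this example; the point is that every node replicates the global map, while per-page metadata lives only on the home node.

```python
# Global index, replicated on every node: page number -> home node.
global_index = {0: "nodeA", 1: "nodeB", 2: "nodeA"}

class PageInfo:
    """Local per-page state kept only on the home node, used to drive
    coherence actions at just the affected nodes."""
    def __init__(self):
        self.copy_set = set()   # nodes caching this page
        self.version = 0        # bumped on every write

# Local index on nodeA, covering only the pages it is home for.
local_index_nodeA = {0: PageInfo(), 2: PageInfo()}

def home_of(page_no):
    """Any node can find a page's home with one global lookup."""
    return global_index[page_no]

def record_reader(local_index, page_no, node_id):
    """The home node remembers who cached the page, so later
    invalidations go only to those nodes."""
    local_index[page_no].copy_set.add(node_id)

record_reader(local_index_nodeA, 0, "nodeC")
print(home_of(0))                     # "nodeA"
print(local_index_nodeA[0].copy_set)  # {"nodeC"}
```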
Do you have some ideas about how you would go about implementing a distributed shared memory system?
The OS should be involved when we try to access shared memory, but not when we try to access local memory.
- We can use the memory management unit (MMU) for this. When we try to access a remote address locally, it will be an invalid memory reference. This generates a fault and traps to the OS.
- The OS will then detect that the memory address is remote and use the global map to look up the home node for the requested address.
- The OS will message that node via IPC and request the data at the address.
- When the data is received, the OS can cache it locally and return it to the process that requested it.
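The read-fault path above can be sketched as a small simulation. Everything here (`global_map`, `request_page`, the dictionaries standing in for memory and IPC) is an invented stand-in for MMU faults and OS messaging, not a real API.

```python
global_map = {0x1000: "home"}           # remote address -> home node
remote_memory = {"home": {0x1000: 42}}  # data living on the home node
local_cache = {}                        # pages cached after a fault

def request_page(home_node, addr):
    """Stands in for the IPC round trip to the home node."""
    return remote_memory[home_node][addr]

def access(addr):
    # Local hit: the OS is not involved at all.
    if addr in local_cache:
        return local_cache[addr]
    # Invalid local reference -> fault -> trap to the "OS" below.
    home = global_map[addr]             # global map lookup
    data = request_page(home, addr)     # IPC to the home node
    local_cache[addr] = data            # cache it locally
    return data                         # return to the faulting process

print(access(0x1000))   # first access faults and fetches remotely
print(access(0x1000))   # second access is a local cache hit
```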
We also need the OS involved when a process tries to write a piece of shared data, but not local data.
- To accomplish this we write-protect virtual addresses that correspond to shared state.
- Writing to these addresses will cause a fault, which traps to the OS.
- The OS will see that the access points to shared data
- If the requesting node is not the home node, the OS will send a message to the home node asking for that state in order to update it
- If the requesting node is the home node, the OS will update the data and broadcast coherence messages to nodes that also store that data.
- It can determine which nodes hold the changed data by maintaining per-page data structures that list the nodes that have accessed each page.
- This means that when a node requests a page or an address, it should send its node ID as part of the request.
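The write-fault path can be sketched the same way. The per-page `copy_set` below, built from the node IDs sent with each read request, is what lets the home node direct coherence messages only at affected nodes; all names here are hypothetical.

```python
MY_NODE = "nodeA"

pages = {
    7: {"home": "nodeA", "data": 0, "copy_set": {"nodeB", "nodeC"}},
}
sent_messages = []   # (destination, payload) pairs, for illustration

def write_shared(page_no, value, writer=MY_NODE):
    page = pages[page_no]
    # Shared pages are write-protected, so this function models the
    # handler the OS runs after the write faults and traps.
    if page["home"] != writer:
        # Not the home node: send the update to the home node.
        sent_messages.append((page["home"], ("update", page_no, value)))
        return
    # Home node: apply the write, then send coherence messages only
    # to the nodes recorded in the copy set.
    page["data"] = value
    for node in page["copy_set"]:
        sent_messages.append((node, ("invalidate", page_no)))
    page["copy_set"] = set()

write_shared(7, 99)
print(pages[7]["data"])       # 99
print(sorted(sent_messages))  # invalidations to nodeB and nodeC
```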
What’s a consistency model?
A consistency model is an agreement between state (memory for example) and upper software layers.
It guarantees that state changes will be made visible to upper-level applications if those applications follow certain behaviors.
What is a strict consistency model?
A strict consistency model guarantees:
- All updates are made available everywhere, immediately.
- Every node in the system will see all writes in the same order.
This strategy is not possible in practice. Even SMPs do not offer this guarantee on a single node, and network latency and message reordering/loss make it even harder in a distributed system.
What is a sequential consistency model?
This is next best to the strict consistency model.
- Updates from different processors can be arbitrarily interleaved (ordered) so long as the ordering would be possible on a single processor system.
- All processes see the same interleaving (ordering)!
EXTRAS
- Updates are not required to be immediately visible.
- Updates from the same process must maintain their ordering.
- Concurrent reads will see the same value
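The per-process ordering rule above can be checked mechanically. This toy checker and the write tuples are made up for illustration; it only tests the "possible on a single processor" condition for one process's writes.

```python
def respects_program_order(interleaving, program):
    """True if `program`'s writes appear in `interleaving` in the
    same order they were issued (program order)."""
    positions = [interleaving.index(w) for w in program]
    return positions == sorted(positions)

p1 = [("P1", "x", 1), ("P1", "y", 2)]   # P1 writes x, then y
p2 = [("P2", "z", 3)]                   # P2 writes z

ok  = [p1[0], p2[0], p1[1]]   # P1's writes stay in order: allowed
bad = [p1[1], p2[0], p1[0]]   # P1's writes reordered: not allowed

print(respects_program_order(ok, p1))    # True
print(respects_program_order(bad, p1))   # False
```

Sequential consistency additionally requires that every process observe the *same* allowed interleaving, which this single-observer check does not capture.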
What is a causal consistency model?
This is a little less strict than sequential consistency.
- Causal consistency detects causally related writes and ensures that they maintain their order.
- Loosens the sequential requirement that all observed orderings are the same.
EXTRA:
- Updates from the same node cannot be arbitrarily interleaved, just as in the sequential model.
- But NO guarantee about concurrent writes.
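A minimal illustration of the causal rule: the events and the `causes` relation below are hand-constructed for this sketch; a real system would have to infer causal relationships itself (e.g., from reads that precede writes).

```python
causes = {
    # write(y) on P2 is causally after write(x) on P1, because P2
    # read x before writing y.
    ("P2", "y"): {("P1", "x")},
    # write(z) on P3 is concurrent with both: no causal edge.
    ("P3", "z"): set(),
    ("P1", "x"): set(),
}

def causally_consistent(order):
    """True if every write appears after all writes that caused it."""
    pos = {w: i for i, w in enumerate(order)}
    return all(pos[c] < pos[w] for w in order for c in causes[w])

# Different nodes may place the concurrent write(z) differently...
node1_view = [("P1", "x"), ("P3", "z"), ("P2", "y")]
node2_view = [("P3", "z"), ("P1", "x"), ("P2", "y")]
# ...but no node may observe y before x.
bad_view   = [("P2", "y"), ("P1", "x"), ("P3", "z")]

print(causally_consistent(node1_view))  # True
print(causally_consistent(node2_view))  # True
print(causally_consistent(bad_view))    # False
```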
What is a weak consistency model?
Instead of inferring causal relationships on its own, a weak consistency model makes a new operation available to the upper software layers: synchronization points.
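A sketch of explicit synchronization points under weak consistency. The names (`local_updates`, `shared_store`, `sync`) are invented; the only idea being modeled is that updates become visible at a sync point, not before.

```python
shared_store = {}    # state visible to other nodes
local_updates = {}   # writes buffered since the last sync point

def write(key, value):
    # Between sync points, writes stay local; the system makes no
    # visibility guarantee about them.
    local_updates[key] = value

def sync():
    # The synchronization point: push buffered writes out, making
    # them visible to the rest of the system.
    shared_store.update(local_updates)
    local_updates.clear()

write("x", 1)
print(shared_store.get("x"))  # None: not visible before the sync
sync()
print(shared_store.get("x"))  # 1: visible after the sync point
```

The tradeoff is that the burden of calling the synchronization operation at the right times shifts to the programmer, in exchange for less coherence traffic between sync points.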
What are the four consistency models discussed in this course?
- strict
- sequential
- causal
- weak
What is Distributed Shared Memory?
Distributed shared memory is a service that manages memory across multiple nodes so that applications will have the illusion that they are running on a single shared-memory machine.
Why is Distributed Shared Memory important?
Distributed shared memory mechanisms are important because they permit scaling beyond the limitations of how much memory we can include in a single machine.
Single machines with large amounts of memory can cost hundreds of thousands of dollars. To scale up memory affordably, it's imperative to understand DSM concepts and semantics so that many cheap machines can be connected to give the illusion of a single machine with a large amount of memory.