Final Review Flashcards
Describe the data structures needed on the server side and on the client side to support the operation of the Sprite system.
Client data structure
- Version number (compared with the server's version on `open()` => know if the file has been updated / written to in the meantime)
- Cached blocks (client-side cache)
- Timer for each dirty block (blocks that are dirty and have not been modified for 30 seconds get written back)
- List of dirty blocks (sequential sharing: the server fetches the dirty blocks from this client when another client opens the file)
- Cache enabled [yes/no] (for concurrent sharing/writes, including one writer and a simultaneous reader)
Server data structure
- Version number (incremented each time the file is opened for writing)
- Writers (where to fetch dirty blocks from on sequential sharing - the writer might have `close()`d the file but not written everything back yet)
- Readers
  - track concurrent access - e.g., one writer and a concurrent reader - the cache still needs to be disabled, not only for two writers
  - the server only notifies CURRENT readers to disable the file cache (not all clients) when a new writer to the file appears => subsequent readers need to `open()` and therefore contact the server first anyway, and then get told to disable their cache
- Cache enabled [yes/no]
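A minimal sketch of this per-file bookkeeping on both sides (Python; names like `ClientFileState` and `server_open` are made up for illustration, the real Sprite code is organized differently):

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch only: field and function names are assumptions,
# not taken from the actual Sprite implementation.

@dataclass
class ClientFileState:
    version: int = 0                   # compared with server version on open()
    cached_blocks: dict = field(default_factory=dict)  # block number -> data
    dirty_since: dict = field(default_factory=dict)    # block number -> last-modified time
    caching_enabled: bool = True       # server may turn this off on concurrent sharing

@dataclass
class ServerFileState:
    version: int = 0                   # incremented on every open-for-write
    last_writer: Optional[str] = None  # client to fetch dirty blocks from
    readers: set = field(default_factory=set)  # current readers, notified on a new writer
    caching_enabled: bool = True

def server_open(state: ServerFileState, client: str, for_write: bool) -> None:
    """Server-side open() bookkeeping (sketch)."""
    if for_write:
        state.version += 1             # lets clients detect stale caches on open()
        if state.readers - {client}:
            # New writer while others are reading: concurrent sharing.
            # Notify the CURRENT readers to disable caching; future readers
            # find out when they contact the server on their own open().
            state.caching_enabled = False
        state.last_writer = client
    else:
        state.readers.add(client)
```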
Name the main design points of the SPRITE distributed filesystem.
- Delayed write-back: reduce writes by only writing back blocks that have NOT been modified for the last 30 seconds (instead of write-through, where every write goes to the server, or session semantics, where every `close()` writes back to the server) - reduces write requests by roughly 30%.
- Observation: if a block is being worked on, it is likely that it will continue to be worked on in the near future.
- The 30-second threshold is related to the observation that 20%-30% of data is deleted within 30 seconds.
- Mechanism to allow sequential write-sharing: the server fetches the dirty blocks (not yet written back) from the last writer when another client `open()`s the file => each client needs to keep track of its dirty blocks.
- Mechanism to allow concurrent write-sharing: disable client caching, so all reads and writes go through the server. NOT performant (concurrent writes are rare and not a design focus!).
Gotcha from the paper: Sprite guarantees that a read returns the most recent data, regardless of when and where it was last written (for both sequential and concurrent sharing/writing).
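A sketch of the delayed write-back policy as a background loop (assuming the hypothetical `ClientFileState` fields from the sketch above; `write_block_to_server` and `SCAN_INTERVAL` are made up):

```python
import time

WRITEBACK_AGE = 30   # a dirty block must sit unmodified this long before write-back
SCAN_INTERVAL = 5    # how often the daemon rechecks (made-up value)

def writeback_daemon(client_state, write_block_to_server):
    """Flush blocks that have been dirty and untouched for WRITEBACK_AGE
    seconds (sketch; short-lived data is deleted before it is ever sent)."""
    while True:
        now = time.time()
        for block_no, last_modified in list(client_state.dirty_since.items()):
            if now - last_modified >= WRITEBACK_AGE:
                write_block_to_server(block_no, client_state.cached_blocks[block_no])
                del client_state.dirty_since[block_no]  # block is clean again
        time.sleep(SCAN_INTERVAL)
```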
For distributed state management systems (think distributed shared memory), what are the basic mechanisms needed to maintain consistency, e.g., why is it useful to use 'home nodes'?
There is no central server that controls access to the shared state and enforces consistency, unlike in some distributed filesystems (like in Project 4).
In theory, to maintain perfect consistency, each node could always access (read/write) the shared data directly on the remote node that holds it (e.g., at page granularity). This has poor performance. Hence the use of caches!
The use of caches requires consistency mechanisms!
There are two ways of providing caches on multiple nodes:
- Migration of the shared data (only works for single reader / single writer, SRSW)
- Replication of the data to multiple nodes (also works for multiple readers / multiple writers, MRMW - the only generally useful option)
Basic coherence mechanisms to maintain consistency across these replicated caches:
- Push-based (proactive) vs. pull-based (clients regularly pull/poll for updates).
- Push is similar to the `write-update` mechanism in SMP systems; problem: too much overhead.
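A toy contrast of the two options (illustrative only; `send_update`, `get_version`, and `fetch` are made-up interfaces, not from any real DSM):

```python
# Push: the writer eagerly sends the new value to every node caching a
# copy - like write-update on an SMP bus. Overhead: every write costs a
# message per sharer, even if the sharers never read the value again.
def write_push(page, new_value, sharers):
    page.data = new_value
    for node in sharers:
        node.send_update(page.page_id, new_value)  # made-up messaging call

# Pull: the writer only updates its own copy and bumps a version counter;
# readers lazily check the owner and refetch when their copy is stale.
def read_pull(page_id, local_cache, owner):
    cached = local_cache.get(page_id)
    if cached is None or cached.version < owner.get_version(page_id):
        cached = owner.fetch(page_id)              # refresh on demand
        local_cache[page_id] = cached
    return cached.data
```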
Idea: each node is responsible for the shared state / page frames that originate in its physical memory. This is the home node. Each page frame is identified by a tuple {home node ID, page frame number}. The home node never changes.
Gotcha: instead of one central server (Project 4, the Sprite filesystem), each node does the consistency management for the shared state / frames that originate in its physical memory.
Gotcha: each node is a home node as well as a client that needs to contact the home nodes of the shared data it caches. It needs to contact the other home nodes because they track this shared state (readers, writers, ...) and provide the consistency operations for these particular pages. All nodes are "peers" (vs. a shared filesystem with one or a few dedicated servers and many clients).
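A minimal sketch of home-node bookkeeping (Python; `PageId`, `DirectoryEntry`, and the method names are assumptions made for illustration):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class PageId:
    home_node: int     # never changes: the node whose memory holds the frame
    frame_number: int  # frame within the home node's physical memory

@dataclass
class DirectoryEntry:
    readers: set = field(default_factory=set)  # nodes caching a read copy
    writer: Optional[int] = None               # node holding the exclusive write copy

class Node:
    """Every node is simultaneously a home node (manager for its own
    frames) and a client for pages homed on other nodes - all peers."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        # Directory ONLY for pages this node is home for - no central server.
        self.directory = {}   # PageId -> DirectoryEntry

    def handle_write_request(self, page: PageId, requester: int):
        """As home node: grant write access, invalidate all other copies."""
        assert page.home_node == self.node_id
        entry = self.directory.setdefault(page, DirectoryEntry())
        for reader in entry.readers - {requester}:
            self.send_invalidate(reader, page)
        entry.readers.clear()
        entry.writer = requester

    def send_invalidate(self, node: int, page: PageId):
        # Stand-in for a real message; a real DSM would RPC to `node`.
        print(f"home {self.node_id}: invalidate {page} at node {node}")
```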
Distributed shared memory (pages): what are the different guarantees that change in the different models we mentioned - strict, sequential, causal, weak?
These models are the prerequisite for judging whether an observed sequence of updates is legal!
Strict consistency = updates are visible everywhere immediately, and in the correct order!
- Impossible to guarantee in a distributed system (unbounded message latency, no global clock).
Sequential consistency = no strong guarantee on the order of writes (memory updates from different processors might be arbitrarily interleaved). BUT: all readers read the SAME state (it cannot happen that a state update propagates only to part of the nodes, so that R1 and R2 read different data at the same time T).
- Restriction: updates from the SAME processor are always seen in the order they were issued.
Causal consistency: causally related writes are ordered! Non-causally related writes have no ordering guarantees. (Example: if P2 reads x written by P1 and then writes y, every node must see the write to x before the write to y.)
- Same restriction: updates from the SAME processor are always seen in the order they were issued.
Weak consistency: no ordering guarantees whatsoever. Explicit synchronisation points must be used.
Synchronisation point: an explicit instruction issued by a process to synchronise writes
- Propagates local changes so that they are visible to all other processes (ONLY IF THE OTHER PROCESS ALSO CALLED SYNC)
- Synchronises in updates from all other processes
- For a write from P1 to be visible in P2, both P1 and P2 must have called `sync`
NOTE: unlike strict, sequential, and causal, weak consistency does NOT force strict ordering for writes from the same processor!
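A toy model of this sync behavior (purely illustrative Python; a real DSM synchronises pages, not a dict, and `WeaklyConsistentStore` is a made-up name):

```python
class WeaklyConsistentStore:
    """Each process has a local view; writes stay local until BOTH the
    writer and the reader have called sync() (toy model)."""

    def __init__(self, num_procs):
        self.local = [dict() for _ in range(num_procs)]    # per-process view
        self.pending = [dict() for _ in range(num_procs)]  # unsynced local writes
        self.shared = {}   # state visible to processes that have synced

    def write(self, proc, key, value):
        self.local[proc][key] = value    # visible locally right away
        self.pending[proc][key] = value  # but not to anyone else yet

    def read(self, proc, key):
        return self.local[proc].get(key)

    def sync(self, proc):
        # Propagate this process's writes to the shared state...
        self.shared.update(self.pending[proc])
        self.pending[proc].clear()
        # ...and pull in everyone else's already-synced writes.
        self.local[proc].update(self.shared)

# P1's write only becomes visible to P2 after BOTH have called sync():
store = WeaklyConsistentStore(num_procs=2)
store.write(0, "x", 1)
assert store.read(1, "x") is None   # P2: write not visible yet
store.sync(0)                       # P1 pushes its updates
assert store.read(1, "x") is None   # still invisible: P2 has not synced
store.sync(1)                       # P2 pulls the synced updates
assert store.read(1, "x") == 1      # now visible
```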