P4L2. Distributed File Systems Flashcards
What is the upload/download model?
Examples?
Pros/Cons?
Client downloads a file from the server, performs updates on it locally, then uploads the whole file back to the server.
Examples: FTP, SVN
Pros
+ local reads/writes at client
Cons
- entire file is downloaded/uploaded even for small accesses
- server gives up control (it cannot see or coordinate what clients do with their local copies)
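A minimal sketch of the upload/download model, assuming a hypothetical in-memory `FileServer` whose `download()`/`upload()` always move whole files. The first part shows a 2-byte change costing two whole-file transfers; the last lines show the "server gives up control" con, where two clients edit private copies and the later upload silently overwrites the earlier one.

```python
class FileServer:
    """Hypothetical in-memory server: path -> whole-file contents."""

    def __init__(self):
        self.files = {}
        self.bytes_transferred = 0  # counts whole-file transfers in both directions

    def download(self, path):
        data = self.files.get(path, b"")
        self.bytes_transferred += len(data)
        return data

    def upload(self, path, data):
        self.bytes_transferred += len(data)
        self.files[path] = data


def append_line(server, path, line):
    local = server.download(path)   # 1. fetch the entire file
    local += line                   # 2. modify the local copy
    server.upload(path, local)      # 3. send the entire file back


server = FileServer()
server.files["/notes.txt"] = b"hello\n"
append_line(server, "/notes.txt", b"!\n")   # a 2-byte change...
print(server.bytes_transferred)             # 14 (6 bytes down + 8 bytes up)

# Lost update: both clients download, edit locally, then upload.
a = server.download("/notes.txt")
b = server.download("/notes.txt")
server.upload("/notes.txt", a + b"from A\n")
server.upload("/notes.txt", b + b"from B\n")  # overwrites A's upload
print(server.files["/notes.txt"])             # b'hello\n!\nfrom B\n'
```

The server only ever sees complete files arriving, so it has no way to notice that A's edit was lost.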
What is the true remote file access model?
Every access to the remote file goes to the server; nothing is done locally
Pros
+ file access is centralized, so consistency is easy to reason about; multiple clients can’t overwrite a file at the same time
Cons
- every file operation pays a network latency cost (even read-only accesses, since the client cannot cache the file)
- limits server scalability b/c everything has to go through the server
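A minimal sketch of true remote file access, assuming hypothetical `RemoteServer`/`RemoteClient` classes where one method call stands in for one network round trip. The client holds no data, so a second client sees a write immediately, but re-reading the same bytes still costs another round trip.

```python
class RemoteServer:
    """Hypothetical server that performs every file operation itself."""

    def __init__(self):
        self.files = {}
        self.round_trips = 0  # each client call costs one network round trip

    def read(self, path, offset, size):
        self.round_trips += 1
        return self.files.get(path, b"")[offset:offset + size]

    def write(self, path, offset, data):
        self.round_trips += 1
        buf = bytearray(self.files.get(path, b""))
        if offset > len(buf):
            buf.extend(b"\0" * (offset - len(buf)))  # fill any gap with zero bytes
        buf[offset:offset + len(data)] = data
        self.files[path] = bytes(buf)


class RemoteClient:
    """Keeps no file data locally; every call is forwarded to the server."""

    def __init__(self, server):
        self.server = server

    def read(self, path, offset, size):
        return self.server.read(path, offset, size)

    def write(self, path, offset, data):
        self.server.write(path, offset, data)


server = RemoteServer()
c1, c2 = RemoteClient(server), RemoteClient(server)
c1.write("/f", 0, b"abc")
print(c2.read("/f", 0, 3))  # b'abc': c2 sees c1's write at once
print(c2.read("/f", 0, 3))  # same bytes again, but there is no cache to reuse
print(server.round_trips)   # 3
```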
What is a stateless file server?
Pros/Cons?
Stateless means the server doesn’t keep any information about its clients (e.g., which clients access which files, how many clients there are, etc.). Every request has to be self-contained (include everything needed to do its work, like file name, offset, data).
Pros
+ no resources used on the server side to maintain state (CPU/MM)
+ resilient: on failure, just restart
Cons
- cannot support caching and consistency management (we need state to do this)
- every request self-contained => more bits transferred to describe request
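A minimal sketch of a stateless server, assuming a hypothetical `Request` record and `StatelessServer` class (not the wire format of any real protocol such as NFS). The server keeps only file contents (the "disk"), so a freshly constructed server object, standing in for a restart, can serve the very next request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical self-contained request: file name, offset, and data travel with it."""
    op: str                       # "read" or "write"
    path: str
    offset: int
    size: int = 0                 # bytes to read
    data: Optional[bytes] = None  # bytes to write


class StatelessServer:
    """No per-client state: no open-file table, no cursors, no record of who caches what."""

    def __init__(self, disk):
        self.disk = disk  # persistent file contents only

    def handle(self, req):
        content = self.disk.get(req.path, b"")
        if req.op == "read":
            return content[req.offset:req.offset + req.size]
        if req.op == "write":
            padded = content.ljust(req.offset, b"\0")  # zero-fill a gap before offset
            end = req.offset + len(req.data)
            self.disk[req.path] = padded[:req.offset] + req.data + padded[end:]
            return len(req.data)
        raise ValueError(f"unknown op: {req.op!r}")


disk = {}
StatelessServer(disk).handle(Request("write", "/a", 0, data=b"hello"))
# "Crash and restart": a brand-new server object needs nothing but the disk.
restarted = StatelessServer(disk)
print(restarted.handle(Request("read", "/a", 1, size=3)))  # b'ell'
```

The cost shows up in `Request`: every call repeats the path and offset that a stateful server could have remembered.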
What is a stateful file server?
Pros/Cons?
A server that keeps the state needed to track what is cached/accessed (e.g., which clients have cached portions of a file, who has written to a file, etc.)
Pros
+ can support locking, caching, incremental operations
Cons
- need checkpointing and recovery mechanisms to handle failure
- overheads to maintain state and consistency => depends on caching mechanism and consistency protocol
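A minimal sketch of a stateful server, assuming a hypothetical `StatefulServer` that records which clients cache each file and which client holds its write lock. That state is what makes locking and targeted invalidation possible, and `checkpoint()` (also hypothetical) marks what would have to be saved to survive a crash.

```python
from collections import defaultdict


class StatefulServer:
    """Hypothetical server that remembers who caches and who writes each file."""

    def __init__(self):
        self.files = {}
        self.cached_by = defaultdict(set)  # path -> clients holding a cached copy
        self.writer = {}                   # path -> client holding the write lock

    def open(self, client, path, mode):
        if mode == "w":
            holder = self.writer.get(path)
            if holder is not None and holder != client:
                raise PermissionError(f"{path} is locked by {holder}")
            self.writer[path] = client     # locking needs state
        self.cached_by[path].add(client)
        return self.files.get(path, b"")   # client may now cache this copy

    def write(self, client, path, data):
        if self.writer.get(path) != client:
            raise PermissionError(f"{client} does not hold the lock on {path}")
        self.files[path] = data
        # Because of the state, the server knows exactly whose caches are now stale.
        return sorted(self.cached_by[path] - {client})

    def close(self, client, path):
        self.cached_by[path].discard(client)
        if self.writer.get(path) == client:
            del self.writer[path]

    def checkpoint(self):
        # This state is lost on a crash unless it is saved and later recovered.
        return {"cached_by": {p: set(c) for p, c in self.cached_by.items()},
                "writer": dict(self.writer)}


server = StatefulServer()
server.open("A", "/f", "r")
server.open("B", "/f", "w")
print(server.write("B", "/f", b"new"))  # ['A']: A's cached copy must be invalidated
```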
What is caching state in a DFS?
Clients can locally maintain a portion of state (e.g., file blocks).
Clients can locally perform operations on cached state (e.g., open/read/write)
A coherence mechanism is required to keep the cached portions of files consistent with the server representation.
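One possible coherence mechanism, sketched under assumptions: the server bumps a per-file version on every write, and the client compares versions (a small message via a hypothetical `version_of()` call) before reusing its cached copy. The class names are illustrative, and only reads are cached here; real DFSs may instead rely on server-driven invalidation or leases.

```python
class VersionedServer:
    """Hypothetical server that bumps a per-file version on every write."""

    def __init__(self):
        self.data = {}
        self.version = {}

    def write(self, path, data):
        self.data[path] = data
        self.version[path] = self.version.get(path, 0) + 1

    def version_of(self, path):
        return self.version.get(path, 0)   # small message, no file data

    def fetch(self, path):
        return self.data.get(path, b""), self.version_of(path)


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> (data, version)
        self.fetches = 0    # full transfers of file data

    def read(self, path):
        entry = self.cache.get(path)
        # Coherence check: reuse the local copy only if it matches the server's version.
        if entry is None or entry[1] != self.server.version_of(path):
            entry = self.server.fetch(path)
            self.cache[path] = entry
            self.fetches += 1
        return entry[0]


server = VersionedServer()
server.write("/f", b"v1")
client = CachingClient(server)
client.read("/f")
client.read("/f")           # served from the local cache
server.write("/f", b"v2")   # e.g., another client updates the file
print(client.read("/f"), client.fetches)  # b'v2' 2
```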
What is “UNIX semantics”?
Every write is visible immediately to all clients
What is “session semantics”?
- write-back on close(), update on open()
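- easy to reason about, but may be insufficient (e.g., for clients that share a file concurrently, since updates stay invisible for the whole session)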
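A minimal sketch contrasting UNIX semantics with session semantics, assuming hypothetical `UnixFile` and `SessionFile` wrappers over a shared in-memory `Server`; constructing a `SessionFile` plays the role of `open()`.

```python
class Server:
    def __init__(self):
        self.files = {}


class UnixFile:
    """UNIX semantics: each write updates the shared copy immediately."""

    def __init__(self, server, path):
        self.server, self.path = server, path

    def write(self, data):
        self.server.files[self.path] = self.read() + data

    def read(self):
        return self.server.files.get(self.path, b"")


class SessionFile:
    """Session semantics: update on open(), write back on close()."""

    def __init__(self, server, path):           # open(): take a private copy
        self.server, self.path = server, path
        self.buf = server.files.get(path, b"")

    def write(self, data):
        self.buf += data                          # private until close()

    def read(self):
        return self.buf

    def close(self):
        self.server.files[self.path] = self.buf   # write back


srv = Server()
UnixFile(srv, "/u").write(b"x")
print(UnixFile(srv, "/u").read())      # b'x': visible at once

writer = SessionFile(srv, "/s")
writer.write(b"y")
reader = SessionFile(srv, "/s")
print(reader.read())                   # b'': not written back yet
writer.close()
print(reader.read())                   # b'': reader's session started earlier
print(SessionFile(srv, "/s").read())   # b'y': a new open() sees it
```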
What is “periodic updates”?
- client writes back periodically => clients have a “lease” on how long they can use the cached data (not necessarily exclusive)
- server invalidates periodically => provides bounds on inconsistency; conflicts are easier to correct b/c they are fewer & smaller
- augment with flush()/sync() API
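A minimal sketch of periodic updates, under stated assumptions: time is passed explicitly as `now` (seconds) to keep it deterministic, a lease is modeled as an expiry timestamp on each cached copy, and periodic server invalidation is approximated by the client refetching once its lease expires. `LeaseServer`, `LeaseClient`, and their methods are hypothetical; `sync()` plays the role of the flush()/sync() API.

```python
class LeaseServer:
    """Hypothetical server that hands out time-bounded (non-exclusive) leases."""

    def __init__(self, lease_seconds):
        self.files = {}
        self.lease_seconds = lease_seconds

    def fetch(self, path, now):
        return self.files.get(path, b""), now + self.lease_seconds  # (data, expiry)

    def write_back(self, path, data):
        self.files[path] = data


class LeaseClient:
    def __init__(self, server, flush_interval):
        self.server = server
        self.flush_interval = flush_interval
        self.cache = {}      # path -> (data, lease_expiry)
        self.dirty = set()   # paths written locally but not yet sent to the server
        self.last_flush = 0

    def read(self, path, now):
        self._maybe_flush(now)
        entry = self.cache.get(path)
        if entry is None or now >= entry[1]:    # no copy, or lease expired
            if path in self.dirty:
                self.sync()                     # push local changes before refetching
            entry = self.server.fetch(path, now)
            self.cache[path] = entry
        return entry[0]

    def write(self, path, data, now):
        _, expiry = self.cache.get(path, (None, now))
        self.cache[path] = (data, expiry)
        self.dirty.add(path)
        self._maybe_flush(now)

    def sync(self):
        # Explicit flush()/sync(): write back all dirty files now.
        for path in sorted(self.dirty):
            self.server.write_back(path, self.cache[path][0])
        self.dirty.clear()

    def _maybe_flush(self, now):
        # Periodic write-back bounds how long local changes stay unseen.
        if now - self.last_flush >= self.flush_interval:
            self.sync()
            self.last_flush = now


srv = LeaseServer(lease_seconds=10)
srv.write_back("/f", b"v1")
c = LeaseClient(srv, flush_interval=5)
print(c.read("/f", now=0))   # b'v1'
srv.write_back("/f", b"v2")  # another client's update reaches the server
print(c.read("/f", now=3))   # b'v1': stale, but only until the lease expires
print(c.read("/f", now=10))  # b'v2': lease expired, copy refreshed
```

The lease length and flush interval are the knobs that bound inconsistency: staleness on reads is at most `lease_seconds`, and unseen local writes are pushed out within about `flush_interval` of the next client operation.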
What is “immutable files”?
- files are never modified; instead, new files are created
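A minimal sketch of immutable files, assuming a hypothetical `ImmutableStore` that gives each "change" a new version number instead of rewriting bytes in place. Because a stored `(name, version)` never changes, any cached copy of it can never go stale.

```python
class ImmutableStore:
    """Hypothetical store where stored bytes are never modified in place."""

    def __init__(self):
        self.blobs = {}    # (name, version) -> bytes; written exactly once
        self.latest = {}   # name -> newest version number

    def create(self, name, data):
        version = self.latest.get(name, 0) + 1
        self.blobs[(name, version)] = bytes(data)   # a "change" is a new file
        self.latest[name] = version
        return version

    def read(self, name, version=None):
        if version is None:
            version = self.latest[name]
        return self.blobs[(name, version)]


store = ImmutableStore()
v1 = store.create("report", b"draft")
v2 = store.create("report", b"final")
print(store.read("report", v1))  # b'draft': old version is unchanged
print(store.read("report"))      # b'final'
```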
What is replication?
Pros/Cons?
Each machine holds all files
Pros
+ load balancing
+ availability
+ fault tolerance
Cons
- writes become more complex (synchronously write to all, or write to one then propagate to others)
- replicas must be reconciled (e.g., voting)
- scalability (machines only get so large)
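A minimal sketch of reconciling replicas by quorum voting, one common scheme among several, assuming `n` full replicas, a write quorum `w`, and a read quorum `r` with `r + w > n` so every read overlaps the latest write. `QuorumFS` and the `up` list (indices of reachable replicas, simulating failures) are hypothetical.

```python
class QuorumFS:
    """Hypothetical sketch of quorum voting over n full replicas."""

    def __init__(self, n, w, r):
        if r + w <= n:
            raise ValueError("need r + w > n so read and write quorums overlap")
        self.replicas = [dict() for _ in range(n)]  # each: path -> (version, data)
        self.w, self.r = w, r

    def _newest(self, path, up):
        # Voting: among the reachable replicas, the highest version wins.
        return max((self.replicas[i].get(path, (0, b"")) for i in up),
                   key=lambda vd: vd[0])

    def write(self, path, data, up):
        # The writer must also reach a read quorum to learn the current version.
        if len(up) < max(self.w, self.r):
            raise RuntimeError("quorum not available")
        version = self._newest(path, up)[0] + 1
        for i in up:
            self.replicas[i][path] = (version, data)
        return version

    def read(self, path, up):
        if len(up) < self.r:
            raise RuntimeError("read quorum not available")
        return self._newest(path, up)[1]


fs = QuorumFS(n=3, w=2, r=2)
fs.write("/f", b"v1", up=[0, 1])
fs.write("/f", b"v2", up=[1, 2])   # replica 0 is unreachable and misses v2
print(fs.read("/f", up=[0, 2]))    # b'v2': highest version among the voters
```

The alternative named above, synchronously writing to every replica, is the special case `w = n`, which lets a read use any single replica (`r = 1`) but blocks writes whenever one machine is down.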
What is partitioning?
Pros/Cons?
Each machine has a subset of files
Pros
+ availability vs single server design
+ scalability w/ file system size
+ single file writes are simple
Cons
- on failure, lose portion of data
- load balancing harder; if not balanced, then hot spots possible
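A minimal sketch of partitioning, assuming a hypothetical `PartitionedFS` that places each file on one server by hashing its path. It uses `zlib.crc32` because Python's built-in `hash()` of a string varies between runs. A single-file write touches exactly one server, one server failing makes only its files unreachable, and nothing in the hash balances load, so an uneven spread or a popular file becomes a hot spot.

```python
import zlib


class PartitionedFS:
    """Hypothetical file system where each path lives on exactly one server."""

    def __init__(self, num_servers):
        self.servers = [dict() for _ in range(num_servers)]  # each: path -> data
        self.down = set()                                    # indices of failed servers

    def owner(self, path):
        # Stable hash, so every client maps a path to the same server.
        return zlib.crc32(path.encode()) % len(self.servers)

    def _server_for(self, path):
        s = self.owner(path)
        if s in self.down:
            raise ConnectionError(f"server {s} holding {path} is down")
        return self.servers[s]

    def write(self, path, data):
        self._server_for(path)[path] = data   # no other machine to coordinate with

    def read(self, path):
        return self._server_for(path)[path]


fs = PartitionedFS(num_servers=4)
for i in range(8):
    fs.write(f"/file{i}", b"data")
print([len(s) for s in fs.servers])   # files per server; an uneven spread means hot spots
fs.down.add(fs.owner("/file0"))       # one server fails: only the files it owned are lost
```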