Chapter 9: Distributed File Systems Flashcards
1
Q
CAP Theorem
A
- Brewer's conjecture [1]
- You can have at most two of these properties for any shared-data system
- Proof of the conjecture [2]
- Impossible to reliably provide atomic, consistent data when there are partitions in the network
- Consistency
- Any read operation that begins after a write operation completes must return that value or that of a later write operation
- Availability
- Every request received must result in a response
- Partition tolerance
- Even when network failures occur, every request must terminate
- E.g., if Datacenter A cannot connect to Datacenter B, the leader election algorithm should not elect a new leader in each datacenter
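A minimal sketch of the consistency rule above, in Python (illustrative only, not from the slides): it flags a history as inconsistent if a read that begins after a write has completed returns neither that value nor the value of a later write. Overlapping (concurrent) writes are ignored for brevity.

```python
# Operations are tuples (start, end, kind, value); times are arbitrary units.
def violates_consistency(history):
    writes = [op for op in history if op[2] == "write"]
    reads = [op for op in history if op[2] == "read"]
    for r_start, _, _, r_value in reads:
        finished = [w for w in writes if w[1] < r_start]       # writes completed before the read began
        if not finished:
            continue
        latest = max(finished, key=lambda w: w[1])             # most recently completed write
        allowed = {w[3] for w in writes if w[1] >= latest[1]}  # its value or any later write's value
        if r_value not in allowed:
            return True                                        # stale read -> not consistent
    return False

# The read at t=5 returns the old value 1 although the write of 2 finished at t=4.
history = [(0, 1, "write", 1), (3, 4, "write", 2), (5, 6, "read", 1)]
print(violates_consistency(history))  # True
```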
2
Q
POSIX guarantees in distributed systems?
A
- Real-world distributed systems
- Higher latencies over network than local access
- Unreliable network
- Limited bandwidth
- Topology changes
- Heterogeneous environments
- CAP theorem
- It is proven that a system cannot be Consistent, Available, and Partition tolerant at the same time
- => Exposing the POSIX API with all its required atomic guarantees (which would need CAP) over the network is not possible; hence we call it a leaky abstraction
- => Cloud storage API
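As a hedged illustration of the "cloud storage API" alternative (names like ObjectStore/put_object are made up for this sketch, not a real SDK): whole-object put/get with relaxed guarantees is much easier to offer over an unreliable network than full POSIX semantics with byte-range writes and locking.

```python
class ObjectStore:
    """In-memory stand-in for a remote object store (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key: str, data: bytes) -> None:
        # The whole object is replaced in one step; there is no partial
        # in-place update as with POSIX write()/lseek().
        self._objects[key] = data

    def get_object(self, key: str) -> bytes:
        # A real, eventually consistent store may return a slightly stale version.
        return self._objects[key]

    def delete_object(self, key: str) -> None:
        self._objects.pop(key, None)

store = ObjectStore()
store.put_object("reports/2024.csv", b"a,b\n1,2\n")
print(store.get_object("reports/2024.csv"))
```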
3
Q
Network File System (NFS)
A
- Sharing of data between computers
- Protocol designed for local LANs
- NFS creates a remote access layer for file systems
- Remote machines can then handle inodes
- NFS is built on multiple protocols
- NFS (File creation, reading, writing, searching, authentication, stats)
- Mountd (Mounting of exported filesystems)
- NSM (Network Status Monitor, monitors client/server status)
- NLM (Network Lock Manager, provides locking capabilities)
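A rough sketch of the remote access layer idea (the class and procedure names are illustrative, not the real NFS RPC programs): the client calls stub procedures that are shipped to the server, and the server resolves them against its local inodes.

```python
class NfsClientStub:
    """Client-side stubs that forward file operations over an RPC-like channel."""

    def __init__(self, transport):
        self.transport = transport                  # e.g. an RPC channel to the NFS server

    def lookup(self, dir_handle, name):
        # Resolve a name inside a directory to a file handle held by the server.
        return self.transport.call("LOOKUP", dir_handle, name)

    def read(self, file_handle, offset, count):
        # The server reads from its local inode and ships the bytes back.
        return self.transport.call("READ", file_handle, offset, count)

class FakeTransport:
    """In-process stand-in for the network, just to make the sketch runnable."""

    def __init__(self, files):
        self.files = files                          # handle -> file contents

    def call(self, proc, *args):
        if proc == "LOOKUP":
            dir_handle, name = args
            return f"{dir_handle}/{name}"
        if proc == "READ":
            handle, offset, count = args
            return self.files.get(handle, b"")[offset:offset + count]

stub = NfsClientStub(FakeTransport({"/export/data.txt": b"hello nfs"}))
handle = stub.lookup("/export", "data.txt")
print(stub.read(handle, 0, 5))                      # b'hello'
```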
4
Q
Caching in distributed filesystems
A
5
Q
Definitions - Caching
A
- Cooperative caching
- A request goes through multiple cache hierarchies; cooperative caching is used to form a single unified cache out of multiple servers
- Cache coherence
- A cache is coherent if, when a process writes to any location, a subsequent read by any process sees the modification
- Prefetching
- Caching data blocks which will probably be accessed in the near future
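A minimal prefetching sketch (illustrative, not from the slides): on every block read, the next few blocks are fetched into the cache as well, anticipating a sequential access pattern.

```python
class PrefetchingCache:
    def __init__(self, fetch_block, readahead=2):
        self.fetch_block = fetch_block     # block_id -> data, e.g. a disk or network read
        self.readahead = readahead
        self.cache = {}

    def read(self, block_id):
        if block_id not in self.cache:
            self.cache[block_id] = self.fetch_block(block_id)
        # Prefetch the blocks that will probably be accessed in the near future.
        for nxt in range(block_id + 1, block_id + 1 + self.readahead):
            self.cache.setdefault(nxt, self.fetch_block(nxt))
        return self.cache[block_id]

cache = PrefetchingCache(lambda b: f"block-{b}", readahead=2)
cache.read(0)                 # fetches block 0 and prefetches blocks 1 and 2
print(sorted(cache.cache))    # [0, 1, 2]
```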
6
Q
Lease Manager (LMGR)
A
- File systems use locking mechanisms to control access to the disk
- Leases are locks with an expiration period set up in advance
- Needed in case of network outage
- Causes additional overhead (lease renewal)
- Implementation
- Each OSD has one LMGR which acquires and renews the major lease
- All leases for objects are managed by the OSD LMGR
- OSD knows the network address of the current holder of the major lease
- The LMGR grants exclusive leases for objects residing in the OSD
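A sketch of the lease idea in Python (hypothetical helper classes, not the actual LMGR implementation): a lease is an exclusive lock that expires after a fixed period unless the holder renews it, so a crashed or partitioned holder cannot block an object forever.

```python
import time

class Lease:
    def __init__(self, holder, duration=30.0):
        self.holder = holder
        self.duration = duration
        self.expires_at = time.monotonic() + duration

    def is_valid(self):
        return time.monotonic() < self.expires_at

    def renew(self):
        # Renewal is the additional overhead the card mentions.
        self.expires_at = time.monotonic() + self.duration

class LeaseManager:
    def __init__(self):
        self.leases = {}                      # object_id -> Lease

    def acquire(self, object_id, holder, duration=30.0):
        current = self.leases.get(object_id)
        if current and current.is_valid() and current.holder != holder:
            return None                       # someone else still holds a valid exclusive lease
        lease = Lease(holder, duration)
        self.leases[object_id] = lease
        return lease

mgr = LeaseManager()
assert mgr.acquire("obj-1", "client-A") is not None
assert mgr.acquire("obj-1", "client-B") is None    # rejected until A's lease expires
```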
7
Q
File Manager (FMGR)
A
- Each opened file is managed by a single FMGR – open() creates an instance of a file manager
- FMGR keeps track of each accomplished open() and read() request
- When an open() request arrives at the FMGR, it checks whether the file has already been opened by another client; otherwise, a proper exclusive lease is acquired from the LMGR
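A hypothetical sketch of the FMGR bookkeeping described above: one file manager per opened file that records which clients have opened it, and acquires an exclusive lease (via a callable standing in for the LMGR) only on the first open.

```python
class FileManager:
    def __init__(self, path, acquire_lease):
        self.path = path
        self.acquire_lease = acquire_lease   # callable path -> bool, e.g. backed by the LMGR
        self.open_clients = set()            # accomplished open() requests

    def open(self, client_id):
        if not self.open_clients and not self.acquire_lease(self.path):
            raise RuntimeError(f"{self.path} is exclusively leased elsewhere")
        self.open_clients.add(client_id)

fmgr = FileManager("/data/report.txt", acquire_lease=lambda path: True)
fmgr.open("client-A")
fmgr.open("client-B")    # file already open, no new lease is needed
```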
8
Q
Transaction Server (TSVR)
A
- Responsibility
- Directory operations are implemented as distributed transactions
- Example
- Creating a new file means creating a new entry in the parent directory and creating the file itself
- Potential failures
- Creating entry in parent directory
- Creating file
- Initiating host can fail
- Works on a per-operation basis
- Acquires leases and performs the action
- Holds acquired leases for as long as possible
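An illustrative sketch (not the actual TSVR) of a directory operation run as a per-operation transaction: acquire the needed leases, apply both steps, and undo the partial work if a step fails. The real design keeps the acquired leases around longer instead of releasing them immediately.

```python
def create_file(namespace, parent_dir, name, acquire_lease, release_lease):
    path = f"{parent_dir}/{name}"
    if not (acquire_lease(parent_dir) and acquire_lease(path)):
        raise RuntimeError("could not acquire the leases for this transaction")
    try:
        namespace[parent_dir].append(name)      # step 1: entry in the parent directory
        namespace[path] = b""                   # step 2: the file itself
    except Exception:
        # Roll back the partially applied operation before giving up.
        if name in namespace.get(parent_dir, []):
            namespace[parent_dir].remove(name)
        namespace.pop(path, None)
        raise
    finally:
        release_lease(parent_dir)               # simplified: released right away
        release_lease(path)

ns = {"/home": []}
create_file(ns, "/home", "notes.txt", acquire_lease=lambda p: True, release_lease=lambda p: None)
print(ns)   # {'/home': ['notes.txt'], '/home/notes.txt': b''}
```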
9
Q
GFS (The Google File System)
A
- Fault tolerance while running on inexpensive commodity hardware
- Introduces an API which is designed to be implemented scalably
- Mutation
- The GFS paper uses the word "mutation" for any modification done to a file. This can be an in-place update, an append, or a file creation; hence we also use it in the slides
- Leases
- Ownership for a specified length of time
- Leases can be renewed or extended by the owner
- When the owner fails, the lease expires
- The owner can end the lease early
10
Q
GFS Design assumptions
A
- Inexpensive commodity components that often fail
- Typical file size 100MB or larger, no need to optimize for small files
- Read workload
- Large streaming reads
- Small random reads
- Write workload
- Append to files
- Modification supported but not a design goal
- Hundreds of concurrently appending clients
- Bandwidth is more important than low latency
11
Q
Architecture GFS
A
- Files
- Divided into fixed-size chunks
- Identified by an immutable and unique id (chunk handle)
- Single master
- Maintains file system metadata
- Namespace, access control information, mapping from files to chunks, location of chunks
- Garbage collection (deferred deletion of files)
- Sends heartbeat messages to chunk servers
- Multiple chunk servers
- Chunks are stored on disks as files
- Each chunk is replicated to multiple chunk servers (depending on the replication factor of the region)
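A minimal sketch of the metadata split described above (class and field names are made up): the master only maps file names to chunk handles and chunk handles to replica locations; the data itself stays on the chunk servers, and the client turns a byte offset into a chunk index.

```python
CHUNK_SIZE = 64 * 1024 * 1024          # GFS uses fixed 64 MB chunks

class Master:
    def __init__(self):
        self.file_to_chunks = {}        # path -> [chunk_handle, ...]
        self.chunk_locations = {}       # chunk_handle -> [chunkserver address, ...]

    def lookup(self, path, offset):
        # The client converts the byte offset into a chunk index and asks the
        # master only for the handle and replica locations of that chunk.
        index = offset // CHUNK_SIZE
        handle = self.file_to_chunks[path][index]
        return handle, self.chunk_locations[handle]

master = Master()
master.file_to_chunks["/logs/web.log"] = ["chunk-0001", "chunk-0002"]
master.chunk_locations["chunk-0001"] = ["cs-a:7000", "cs-b:7000", "cs-c:7000"]
print(master.lookup("/logs/web.log", 10))
# ('chunk-0001', ['cs-a:7000', 'cs-b:7000', 'cs-c:7000'])
```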
12
Q
Guarantees provided by GFS
A
- File namespace mutations are atomic
- e.g., file creation, …
- Uniquely handled by master
- The master's operation log defines a global total order of the operations
- File manipulations
- The state of a file region after a data mutation depends on the type of the action
- Definitions
- Consistent – all clients see the same data, regardless of which replica they read from
- Defined – consistent, and additionally the clients see what the mutation writes in its entirety
- Inconsistent – different clients see different data
13
Q
How is fault-tolerance achieved? GFS
A
- Master
- Operation log, replication to a shadow master
- Chunk server
- All chunks are versioned
- Version number updated when a lease is granted
- Chunks with old versions are not served and are deleted
- Chunks
- Re-replication triggered by the master maintains the replication factor
- Rebalancing
- Data integrity checks
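A sketch of the chunk version idea (illustrative helper, not GFS code): the master bumps the version number whenever it grants a new lease, so a replica that missed a mutation round reports an older version, is treated as stale, and is neither served nor kept.

```python
class ChunkVersions:
    def __init__(self):
        self.current = {}                  # chunk_handle -> latest version number

    def grant_lease(self, handle):
        # The version number is updated when a lease is granted.
        self.current[handle] = self.current.get(handle, 0) + 1
        return self.current[handle]

    def is_stale(self, handle, replica_version):
        return replica_version < self.current.get(handle, 0)

versions = ChunkVersions()
versions.grant_lease("chunk-0001")         # version 1
versions.grant_lease("chunk-0001")         # version 2; one replica missed this round
print(versions.is_stale("chunk-0001", 1))  # True -> not served, gets deleted
```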
14
Q
How is high-availability achieved? GFS
A
- Fast recovery of the master
- Checkpointing and operation log
- Heartbeat messages
- Include piggybacked information
- Can trigger re-replication
- Share current load
- Can trigger garbage collection
- => Chunkservers can fail at any time
- Diagnostic tools
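A hypothetical sketch of a heartbeat with piggybacked information: the chunkserver reports the chunk versions it holds and its current load, and the master reacts by scheduling garbage collection for stale replicas or re-replication for under-replicated chunks. The message fields and decision rule are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Heartbeat:
    server: str
    chunks: dict = field(default_factory=dict)   # chunk_handle -> version held by this server
    load: float = 0.0                            # e.g. fraction of capacity in use

def handle_heartbeat(hb, latest_versions, replication_factor, replica_count):
    actions = []
    for handle, version in hb.chunks.items():
        if version < latest_versions.get(handle, 0):
            actions.append(("garbage_collect", handle, hb.server))   # stale replica
        elif replica_count.get(handle, 0) < replication_factor:
            actions.append(("re_replicate", handle))                 # too few replicas
    return actions

hb = Heartbeat("cs-a:7000", chunks={"chunk-0001": 1, "chunk-0002": 3}, load=0.4)
print(handle_heartbeat(hb, {"chunk-0001": 2, "chunk-0002": 3}, 3, {"chunk-0002": 2}))
# [('garbage_collect', 'chunk-0001', 'cs-a:7000'), ('re_replicate', 'chunk-0002')]
```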
15
Q
Summary on GFS
A
- Highly concurrent reads and appends
- Highly scalable
- On cheap commodity hardware
- Built for MapReduce-style workloads
- Reads
- Appends
- Developers have to understand the limitations and may have to use other tools to work around them