Distributed Filesystems Flashcards
What does a filesystem keep track of?
Files: where the data is actually stored
Directories: groups of files
Metadata: information about a file, such as its format and permissions
What is the primary job of a filesystem?
To make sure data is always accessible and intact.
What is the storage model of the Google Filesystem?
- A single file can contain many objects
- Files are divided into fixed-size chunks, each with a unique ID (the chunk handle)
> Chunks are large, so disk seek time is small compared to transfer time
> A file can be larger than a node’s disk space
> Fixed size makes allocation computations easy
- Files are replicated across chunkservers
- The master maintains all filesystem metadata
- Chunkservers store chunks on local disks as Linux files
- Neither the client nor the chunkserver caches file data
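The "easy allocation computations" point can be illustrated with a minimal sketch. It assumes a 64 MB chunk size (the value commonly cited for GFS); the function name `chunk_ranges` is hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size: 64 MB


def chunk_ranges(offset, length, chunk_size=CHUNK_SIZE):
    """Split a byte range of a file into (chunk index, start in chunk, size) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size   # which chunk the byte falls in
        start = offset % chunk_size    # position inside that chunk
        size = min(chunk_size - start, end - offset)
        pieces.append((index, start, size))
        offset += size
    return pieces


# A 10-byte read starting at offset 0, using a toy chunk size of 4 bytes
print(chunk_ranges(0, 10, chunk_size=4))  # [(0, 0, 4), (1, 0, 4), (2, 0, 2)]
```

Because every chunk has the same size, locating a byte takes one integer division and one remainder, with no per-file lookup table.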
How do reads work on the Google Filesystem?
- The client sends a read request (file name and chunk index) to the GFS master
- The master replies with the chunk handle and the replica locations
- The client caches this metadata
- Client sends a data request to one of the replicas
- The corresponding chunk server returns requested data
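The read steps above can be sketched as a toy in-memory simulation. All class and method names here (`Master`, `ChunkServer`, `Client`, `lookup`) are hypothetical, the tiny chunk size is for illustration only, and the client always picks the first replica rather than the closest one.

```python
class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, start, size):
        return self.chunks[handle][start:start + size]


class Master:
    def __init__(self):
        self.files = {}      # (file name, chunk index) -> chunk handle
        self.locations = {}  # chunk handle -> list of ChunkServer replicas
        self.lookups = 0     # counts how often clients contact the master

    def lookup(self, name, index):
        self.lookups += 1
        handle = self.files[(name, index)]
        return handle, self.locations[handle]


class Client:
    def __init__(self, master, chunk_size):
        self.master = master
        self.chunk_size = chunk_size
        self.cache = {}  # (file name, chunk index) -> (handle, replicas)

    def read(self, name, offset, size):
        # Simplification: the requested range must lie within one chunk
        index = offset // self.chunk_size
        key = (name, index)
        if key not in self.cache:
            self.cache[key] = self.master.lookup(name, index)
        handle, replicas = self.cache[key]
        # Data comes from a chunkserver, never from the master
        return replicas[0].read(handle, offset % self.chunk_size, size)


server = ChunkServer()
server.chunks["h1"] = b"hello world"
master = Master()
master.files[("/logs/a", 0)] = "h1"
master.locations["h1"] = [server]

client = Client(master, chunk_size=64)
print(client.read("/logs/a", 6, 5))  # b'world'
print(client.read("/logs/a", 0, 5))  # b'hello', served from cached metadata
print(master.lookups)                # 1
```

The master is contacted once per chunk; later reads of the same chunk go straight to a chunkserver, which keeps the master off the data path.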
How do writes work on the Google Filesystem?
- The client asks the master which chunkserver is the primary replica for the chunk; if there is none, the master designates one
- The master sends the client the locations of the primary replica and the other (secondary) replicas
- The client pushes the write data to every replica, where the chunkserver holds it in a buffer
- Once all replicas have received the data, the client tells the primary replica to begin the write
- The primary replica picks an order for the write and applies the data to the appropriate chunk
- The primary forwards the write request to the secondary replicas, which apply it in the same order and report back to the primary
- The primary sends the confirmation to the client
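The write steps above separate pushing the data from committing it. The following is a minimal sketch of that two-phase flow, assuming the primary assigns a serial number to each write and forwards the request to the secondaries; the names `Replica`, `Primary`, `push`, `apply`, and `client_write` are hypothetical, and failures are not modeled.

```python
class Replica:
    def __init__(self):
        self.buffer = {}          # data id -> bytes pushed by the client, not yet applied
        self.chunk = bytearray()  # contents of the single chunk this replica stores
        self.applied = []         # serial numbers, in the order they were applied

    def push(self, data_id, data):
        self.buffer[data_id] = data
        return True  # acknowledgement to the client

    def apply(self, serial, data_id, offset):
        data = self.buffer.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            # Pad with zero bytes so the write fits
            self.chunk.extend(bytes(end - len(self.chunk)))
        self.chunk[offset:end] = data
        self.applied.append(serial)
        return True


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        # The primary decides the order, applies locally, then forwards
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        acks = [s.apply(serial, data_id, offset) for s in self.secondaries]
        return all(acks)  # confirmation sent back to the client


def client_write(primary, data_id, data, offset):
    replicas = [primary] + primary.secondaries
    # Phase 1: push the data into every replica's buffer
    if not all(r.push(data_id, data) for r in replicas):
        return False
    # Phase 2: ask the primary to commit the buffered data
    return primary.write(data_id, offset)


secondaries = [Replica(), Replica()]
primary = Primary(secondaries)
client_write(primary, "w1", b"abc", 0)
client_write(primary, "w2", b"XY", 2)
print(bytes(primary.chunk))  # b'abXY'
```

Since every replica applies writes in the order chosen by the primary, all copies of the chunk end up with the same bytes.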
Does the master keep a persistent record of chunk locations? If not, what does it do?
No. It queries the chunkservers at startup and stays up to date through periodic polling.
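A minimal sketch of how a master could rebuild its chunk location map from what the chunkservers report, rather than from a persistent record. The function name `build_locations` and the report format are assumptions made for illustration.

```python
from collections import defaultdict


def build_locations(reports):
    """Build chunk handle -> set of servers from per-server chunk lists.

    reports: dict mapping a chunkserver name to the chunk handles it holds.
    """
    locations = defaultdict(set)
    for server, handles in reports.items():
        for handle in handles:
            locations[handle].add(server)
    return dict(locations)


# At startup, or after a polling round, each chunkserver reports its chunks
reports = {
    "cs1": ["h1", "h2"],
    "cs2": ["h2", "h3"],
    "cs3": ["h1", "h3"],
}
locations = build_locations(reports)
print(sorted(locations["h2"]))  # ['cs1', 'cs2']
```

Rerunning the same function on a later polling round drops servers that have stopped reporting, so the map never has to be kept in sync with a stored copy.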
What happens when a node fails in GFS?
- If it is the master, a new master has to be started somewhere else
- If it is a chunkserver, it just restarts
How does the master know that a chunk it intends to use is up to date?
The master maintains a version number for each chunk to distinguish up-to-date replicas from stale ones. Before an operation on a chunk, the master makes sure the version number is advanced.
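A minimal sketch of stale-replica detection with version numbers. It assumes the master bumps the version before an operation and notifies only the replicas that are reachable at that moment; the class `MasterVersions` and its methods are hypothetical, and each replica is modeled as a plain dict of chunk handle to version.

```python
class MasterVersions:
    def __init__(self):
        self.version = {}  # chunk handle -> latest version number

    def advance(self, handle, live_replicas):
        """Advance the version and record it on the replicas that are reachable."""
        new = self.version.get(handle, 0) + 1
        self.version[handle] = new
        for replica in live_replicas:
            replica[handle] = new
        return new

    def stale_replicas(self, handle, reports):
        """reports: server name -> version that server holds for this chunk."""
        current = self.version[handle]
        return sorted(name for name, v in reports.items() if v < current)


master = MasterVersions()
a, b = {}, {}                    # two replicas of chunk "h1"
master.advance("h1", [a, b])     # both replicas move to version 1
master.advance("h1", [a])        # b is down and misses version 2
reports = {"a": a["h1"], "b": b["h1"]}
print(master.stale_replicas("h1", reports))  # ['b']
```

A replica that was down during an operation comes back with an older version number, so the master can exclude it without comparing chunk contents.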
How consistent is GFS?
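It has a relaxed consistency model: namespace changes are atomic, but concurrent writes can leave a file region holding mixed fragments from several writers, and a failed write can leave replicas that differ from one another.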