Distributed Filesystems Flashcards

1
Q

What does a filesystem keep track of?

A

Files: where the data is actually stored
Directories: Groups of files
Metadata: Information on the format and permissions related to a file

2
Q

What is the primary job of a filesystem?

A

To make sure data is always accessible and intact.

3
Q

What is the storage model of the Google Filesystem?

A
  • A single file can contain many objects
  • Files are divided into fixed-size chunks (64 MB), each with a unique ID (the chunk handle)
    > Large chunks keep disk seek time small compared to transfer time
    > A file can be larger than a node’s disk space
    > Fixed size makes allocation computations easy
  • Chunks are replicated across chunkservers
  • The master maintains all filesystem metadata
  • Chunkservers store chunks on local disks as Linux files
  • Neither the client nor the chunkserver caches file data
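
Because chunks have a fixed size, finding the chunk that holds a given byte is plain integer arithmetic. A minimal sketch of that computation; the function names are made up for illustration and are not part of GFS:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used by GFS


def chunk_index(offset):
    """Index of the chunk that holds the given byte offset."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset, length):
    """All chunk indices touched by an access of `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return list(range(first, last + 1))
```

An access that starts on the last byte of chunk 0 and is two bytes long touches chunks 0 and 1.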
4
Q

How do reads work on the Google Filesystem?

A
  • Client sends a read request (file name and chunk index) to the GFS master
  • Master replies with the chunk handle and replica locations
  • Client caches this metadata
  • Client sends a data request to one of the replicas
  • The corresponding chunkserver returns the requested data
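
The read steps above can be sketched as a toy in-memory model. All class and method names here are hypothetical, the client always picks the first replica, and reads that span two chunks are not handled:

```python
CHUNK_SIZE = 64 * 1024 * 1024


class Master:
    def __init__(self):
        # (file name, chunk index) -> (chunk handle, list of chunkserver names)
        self.chunk_table = {}
        self.lookups = 0  # counts how often clients had to ask the master

    def lookup(self, filename, index):
        self.lookups += 1
        return self.chunk_table[(filename, index)]


class Chunkserver:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers  # name -> Chunkserver
        self.cache = {}  # cached metadata only, never file data

    def read(self, filename, offset, length):
        key = (filename, offset // CHUNK_SIZE)
        if key not in self.cache:
            self.cache[key] = self.master.lookup(*key)
        handle, locations = self.cache[key]
        replica = self.chunkservers[locations[0]]
        return replica.read(handle, offset % CHUNK_SIZE, length)
```

A second read of the same chunk is served from the cached metadata, so the master is contacted only once.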
5
Q

How do writes work on the Google Filesystem?

A
  • The client asks the master which chunkserver holds the lease for the chunk (the primary replica); if none does, the master grants one
  • The master replies with the identity of the primary and the locations of the other (secondary) replicas
  • The client pushes the write data to all replicas, which hold it in a buffer
  • Once all replicas have received the data, the client tells the primary replica to begin the write
  • The primary replica applies the write to its chunk and forwards the write request to the secondaries
  • Each secondary replica applies the write and reports back to the primary
  • The primary sends the confirmation to the client
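
The split between pushing data and committing it can be sketched as follows. This is a simplified model with hypothetical names: the master lookup is skipped, every write appends to the end of the chunk, and failures are ignored:

```python
class Replica:
    def __init__(self):
        self.buffer = {}          # data id -> bytes pushed but not yet applied
        self.chunk = bytearray()  # the chunk contents on this replica

    def push(self, data_id, data):
        # The data is only buffered here; nothing is written yet.
        self.buffer[data_id] = data

    def apply(self, data_id):
        # Move the buffered data into the chunk and acknowledge.
        self.chunk += self.buffer.pop(data_id)
        return True


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries

    def write(self, data_id):
        # Apply locally, forward the request, and collect acknowledgements.
        self.apply(data_id)
        acks = [s.apply(data_id) for s in self.secondaries]
        return all(acks)


def client_write(primary, data_id, data):
    # Push the data to every replica, then ask the primary to commit.
    for replica in [primary, *primary.secondaries]:
        replica.push(data_id, data)
    return primary.write(data_id)
```

The client only gets a confirmation after the primary has heard back from every secondary.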
6
Q

Does the master keep a persistent record of chunk locations? If not, what does it do?

A

No. It queries the chunkservers at startup and is kept up to date by periodic polling.
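
A minimal sketch of how the location table could be rebuilt from what the chunkservers report. The function name and the report format are assumptions made for illustration:

```python
from collections import defaultdict


def build_chunk_locations(reports):
    """Invert chunkserver reports into a chunk handle -> servers map.

    `reports` maps a chunkserver name to the chunk handles it holds.
    """
    locations = defaultdict(set)
    for server, handles in reports.items():
        for handle in handles:
            locations[handle].add(server)
    return dict(locations)
```

Running the same function on each polling round would replace the table with the current state, which is why nothing has to be persisted.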

7
Q

What happens when a node fails in GFS?

A
  • If it is a master, it has to start up somewhere else
  • If it is a chunkserver, it just restarts
8
Q

How does the master know that a chunk it intends to use is up to date?

A

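The master maintains a version number for each chunk to distinguish up-to-date replicas from stale ones. Before a mutation on a chunk (when it grants a new lease), the master advances the version number on itself and on every reachable replica.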
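
A minimal sketch of stale-replica detection with version numbers. The dictionaries and function names are hypothetical; a replica that was down when a lease was granted keeps its old version and is flagged as stale afterwards:

```python
def grant_lease(master_versions, replica_versions, handle, live_replicas):
    """Advance the version of `handle` on the master and on reachable replicas."""
    master_versions[handle] = master_versions.get(handle, 0) + 1
    for name in live_replicas:
        replica_versions[name][handle] = master_versions[handle]


def stale_replicas(master_versions, replica_versions, handle):
    """Names of replicas whose version of `handle` is behind the master's."""
    current = master_versions[handle]
    return sorted(
        name
        for name, versions in replica_versions.items()
        if versions.get(handle, 0) < current
    )
```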

9
Q

How consistent is GFS?

A

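It has a relaxed consistency model: file namespace mutations (such as file creation) are atomic, but concurrent writes to the same region can leave it consistent across replicas yet undefined.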
