Module 6 - Distributed File Systems Flashcards

1
Q

Distributed file systems allow ________ to access _______ systems on ________ servers

A

applications
file
remote

2
Q

There are two ways for a client to access a file in a DFS (distributed file system). What are their names?

A
  1. Remote access model
  2. Upload/download model

3
Q

What is the remote access model for a DFS (distributed file system)?

A
  • The file always lives on the server
  • Every time a client wants to read or write, it must issue an RPC (a request) to the server
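A minimal Python sketch of the remote access model, assuming a hypothetical in-process `FileServer` class that stands in for the real server. Each method call represents one RPC, and the client keeps no copy of the data.

```python
class FileServer:
    """Hypothetical stand-in for a DFS server; each call models one RPC."""

    def __init__(self):
        self.files = {}     # path -> bytearray; the only copy of each file
        self.requests = 0   # number of simulated RPCs received

    def read(self, path, offset, count):
        self.requests += 1
        return bytes(self.files[path][offset:offset + count])

    def write(self, path, offset, data):
        self.requests += 1
        buf = self.files.setdefault(path, bytearray())
        if len(buf) < offset:                 # pad with zero bytes if writing past the end
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data


class RemoteAccessClient:
    """Keeps no local copy: every read and write goes to the server."""

    def __init__(self, server):
        self.server = server

    def read(self, path, offset, count):
        return self.server.read(path, offset, count)

    def write(self, path, offset, data):
        self.server.write(path, offset, data)


server = FileServer()
client = RemoteAccessClient(server)
client.write("/notes.txt", 0, b"hello")
client.write("/notes.txt", 5, b" world")
print(client.read("/notes.txt", 0, 11))  # b'hello world'
print(server.requests)                   # 3 requests for 3 operations
```

The cost of this model is visible in the counter: the number of requests grows with the number of operations, not with the number of files.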

4
Q

What is the upload/download model for a DFS (distributed file system)?

A
  • The file is managed by the server, which transfers a copy of the file to the client
  • Once the file is received, the client locally performs reads and writes on it
  • When the client is done accessing the file, the new version of the file is then transferred back to the server
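A minimal sketch of the upload/download model, assuming hypothetical `Server` and `UploadDownloadClient` classes. Reads and writes touch only the client's local copy; the server sees exactly two whole-file transfers, one on open and one on close.

```python
class Server:
    """Hypothetical server that only ships whole files back and forth."""

    def __init__(self):
        self.files = {}      # path -> bytes
        self.transfers = 0   # whole-file transfers (download or upload)

    def download(self, path):
        self.transfers += 1
        return self.files.get(path, b"")

    def upload(self, path, data):
        self.transfers += 1
        self.files[path] = bytes(data)


class UploadDownloadClient:
    def __init__(self, server):
        self.server = server
        self.local = {}   # path -> bytearray, the client's working copy

    def open(self, path):
        self.local[path] = bytearray(self.server.download(path))

    def read(self, path):
        return bytes(self.local[path])   # served locally, no network traffic

    def append(self, path, data):
        self.local[path] += data         # modified locally, server not yet updated

    def close(self, path):
        self.server.upload(path, self.local.pop(path))   # new version goes back


server = Server()
server.files["/log.txt"] = b"a"
client = UploadDownloadClient(server)
client.open("/log.txt")
client.append("/log.txt", b"b")
client.append("/log.txt", b"c")
print(server.files["/log.txt"])   # b'a' (server still has the old version)
client.close("/log.txt")
print(server.files["/log.txt"])   # b'abc'
print(server.transfers)           # 2
```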
5
Q

What does NFS stand for? When was it created, and by which organization? Does ecelinux use NFS?

A

Network File System

Created by Sun Microsystems in 1984

Yes, ecelinux uses NFSv4

6
Q

In NFS, does the server have an RPC stub? or the client? or both?

A

both the client and the server have their own RPC stubs

7
Q

What is the step-by-step process when a client accesses a chunk of a file on a remote server?

A
  1. The client makes a system call to the kernel, specifying the path of the file it is trying to access
  2. The call is turned into a request that passes through the client's VFS (virtual file system) layer and then the NFS client
  3. The NFS client uses its RPC client stub to call the RPC server stub, which triggers execution of the NFS server program
  4. The NFS server program calls the server's VFS layer, which fetches the requested data from the server's local file system
  5. The fetched data is returned to the client through the RPC stubs and then propagated back up through the client's VFS
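The layering above can be sketched as a chain of plain Python functions, one per layer, with a `trace` list recording the order in which layers are reached. Everything here is hypothetical: the function names, the `/mnt/docs` to `/export/docs` mapping, and the use of JSON as the "wire format" between the stubs are assumptions for illustration, and the network hop is just a direct function call.

```python
import json

trace = []   # records which layer handled the request, in order

# Hypothetical server-side local file system
SERVER_DISK = {"/export/docs/a.txt": b"chunk-0chunk-1"}


def server_vfs_read(path, offset, count):
    trace.append("server VFS")
    return SERVER_DISK[path][offset:offset + count]


def nfs_server(request):
    trace.append("NFS server")
    return server_vfs_read(request["path"], request["offset"], request["count"])


def rpc_server_stub(message):
    trace.append("RPC server stub")
    return nfs_server(json.loads(message))          # unmarshal, then dispatch


def rpc_client_stub(path, offset, count):
    trace.append("RPC client stub")
    message = json.dumps({"path": path, "offset": offset, "count": count})
    return rpc_server_stub(message)                 # stands in for the network hop


def nfs_client(path, offset, count):
    trace.append("NFS client")
    return rpc_client_stub(path, offset, count)


def client_vfs_read(path, offset, count):
    trace.append("client VFS")
    # Assumed mount: the client's /mnt/docs is the server's /export/docs
    remote_path = path.replace("/mnt/docs", "/export/docs", 1)
    return nfs_client(remote_path, offset, count)


def read_syscall(path, offset, count):
    trace.append("system call")
    return client_vfs_read(path, offset, count)


data = read_syscall("/mnt/docs/a.txt", 7, 7)
print(data)    # b'chunk-1'
print(trace)
```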
8
Q

NFS supports client-side caching. What is the motivation behind this?

What are the caches used for in NFS?

A
  • Caching reduces communication between the client and the server
  • The cache holds file data, including UPDATES (modifications) the client makes to a file
9
Q

Whenever caches are used in NFS, the cache holds modifications that have been made to a specific file.

When are file modifications propagated to the server after sitting in the cache?

What issue arises if this took place in a distributed NFS with replication?

A
  • File modifications are flushed back to the server whenever the client closes the file
  • In a distributed NFS, this could lead to inconsistencies in files across replicas
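A minimal sketch of flush-on-close caching, assuming hypothetical `Server` and `CachingClient` classes. As a simplification, this client refetches the file on every open; a real NFS client revalidates its cache instead. It shows that client `b` keeps seeing the old version until client `a` closes the file.

```python
class Server:
    def __init__(self):
        self.files = {}   # path -> committed contents


class CachingClient:
    """Hypothetical client that buffers updates and flushes them on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> cached contents
        self.dirty = set()   # paths with updates not yet sent to the server

    def open(self, path):
        # Simplification: always refetch on open so earlier flushes are seen
        self.cache[path] = self.server.files.get(path, "")

    def read(self, path):
        return self.cache[path]

    def write(self, path, text):
        self.cache[path] = text   # held in the cache only
        self.dirty.add(path)

    def close(self, path):
        if path in self.dirty:
            self.server.files[path] = self.cache[path]   # flush on close
            self.dirty.discard(path)
        del self.cache[path]


server = Server()
server.files["/f"] = "v1"
a, b = CachingClient(server), CachingClient(server)
a.open("/f")
a.write("/f", "v2")
b.open("/f")
print(b.read("/f"))   # v1: a's update is still only in a's cache
a.close("/f")         # flush reaches the server now
b.close("/f")
b.open("/f")
print(b.read("/f"))   # v2
```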
10
Q

Describe the “delegation of authority” mechanism used in NFS for upload/download of files

What is the purpose of this delegation? What does it mean in the context of two clients trying to access the file?

A
  1. Client asks server for the file
  2. Server delegates authority of the file to the client
  3. Server recalls delegation
  4. Client returns the (possibly updated) file to the server

Delegation in step 2 ensures that only one client can modify a specific file at a time (since it holds the authority from the server). Other clients cannot access the file until the server recalls the delegation
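The four steps above can be sketched as follows, assuming hypothetical `DelegatingServer` and `Client` classes. This follows the card's simplified model, where a single client holds authority over a file; when a second client asks for it, the server recalls the delegation and the first client hands back its copy.

```python
class DelegatingServer:
    """Hypothetical server that lets one client hold authority over a file."""

    def __init__(self):
        self.files = {}    # path -> contents last returned to the server
        self.holder = {}   # path -> client currently holding the delegation

    def request(self, client, path):
        current = self.holder.get(path)
        if current is not None and current is not client:
            # Step 3: recall the delegation; step 4: holder returns its copy
            self.files[path] = current.give_back(path)
        self.holder[path] = client           # step 2: delegate authority
        return self.files.get(path, "")


class Client:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.local = {}

    def open(self, path):                    # step 1: ask the server for the file
        self.local[path] = self.server.request(self, path)

    def write(self, path, text):
        self.local[path] = text              # no server traffic while delegated

    def give_back(self, path):
        return self.local.pop(path)          # called when the server recalls


server = DelegatingServer()
server.files["/f"] = "v1"
c1, c2 = Client("c1", server), Client("c2", server)
c1.open("/f")
c1.write("/f", "v2")
c2.open("/f")                    # triggers a recall from c1
print(c2.local["/f"])            # v2
print(server.holder["/f"].name)  # c2
```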

11
Q

NFS uses RPCs internally. One optimization added to the NFS protocol (in NFSv4) was compound procedures.

What does a file read between a client and a server in NFS look like with and without compound procedures?

How is the latency decreased with compound procedures?

A

In NFS, whenever the client asks the server to read a file, it must first perform a LOOKUP and then a READ on the file

Without compound procedures:

  1. The client makes a LOOKUP network request and gets a response
  2. The client makes a READ network request and gets another response

With compound procedures:

  1. The client sends the LOOKUP and READ calls in the same network request

Therefore, with compound procedures there is only one network round trip instead of two, which reduces latency
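A small sketch that counts round trips for both cases. The hypothetical `Server.call` method takes a list of operations per network request. The READ-reuses-LOOKUP's-handle behaviour is loosely modelled on the idea of a "current file handle" inside one compound request, but the message format, the `fh:` handles, and the 10 ms round-trip time are all made up for illustration.

```python
RTT_MS = 10  # hypothetical network round-trip time

FILES = {"/docs/a.txt": b"hello"}   # hypothetical server contents


class Server:
    def __init__(self):
        self.round_trips = 0

    def call(self, ops):
        """One network request carrying a list of (operation, argument) pairs."""
        self.round_trips += 1
        current_fh = None   # file handle produced by LOOKUP, reused by READ
        results = []
        for op, arg in ops:
            if op == "LOOKUP":
                current_fh = "fh:" + arg
                results.append(current_fh)
            elif op == "READ":
                fh = arg if arg is not None else current_fh
                results.append(FILES[fh[len("fh:"):]])
            else:
                raise ValueError("unknown operation " + op)
        return results


# Without compound procedures: two separate requests
plain = Server()
fh = plain.call([("LOOKUP", "/docs/a.txt")])[0]
data = plain.call([("READ", fh)])[0]

# With a compound procedure: both operations in one request
compound = Server()
_, data2 = compound.call([("LOOKUP", "/docs/a.txt"), ("READ", None)])

print(data, plain.round_trips * RTT_MS)      # b'hello' 20
print(data2, compound.round_trips * RTT_MS)  # b'hello' 10
```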

12
Q

An NFS server usually exports only part of its local file system to remote clients.

What does a client typically do to its local file structure to integrate this part from the server?

A

The client imports this segment, and adds this portion of the server’s file system to its local file system

The remote file segment is mounted onto the client under a certain path
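On Linux this is roughly what a command like `mount -t nfs fileserver:/export/projects /home/alice/remote` sets up. The sketch below models only the path translation that follows from such a mount. The mount table, server name, and paths are hypothetical.

```python
# Hypothetical mount table: local mount point -> (server, exported directory)
MOUNTS = {
    "/home/alice/remote": ("fileserver", "/export/projects"),
}


def resolve(path):
    """Map a client-side path to (server, path on that server), or ('local', path)."""
    for mount_point, (server, export) in MOUNTS.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            return server, export + path[len(mount_point):]
    return "local", path


print(resolve("/home/alice/remote/os/notes.txt"))  # ('fileserver', '/export/projects/os/notes.txt')
print(resolve("/home/alice/todo.txt"))             # ('local', '/home/alice/todo.txt')
```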

13
Q

Suppose a client imports a directory which contains a subdirectory which was imported from another remote host.

How does this client access the nested directory in terms of imports?

A

If a client imports a directory from server A which contains another imported directory from server B, then the client will import the nested directory DIRECTLY from server B
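A sketch of the client's resulting mount table, assuming hypothetical servers `serverA` and `serverB` and made-up paths. The nested directory gets its own entry pointing straight at server B, and path resolution picks the longest matching mount point, so requests under it never pass through server A.

```python
MOUNTS = {
    "/mnt/a": ("serverA", "/export/a"),
    # Server A's /export/a/shared is itself imported from server B, so the
    # client mounts it directly from B instead of going through A.
    "/mnt/a/shared": ("serverB", "/export/shared"),
}


def resolve(path):
    """Longest-prefix match of a client path against the mount table."""
    best = None
    for mount_point in MOUNTS:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)
    server, export = MOUNTS[best]
    return (server, export + path[len(best):])


print(resolve("/mnt/a/readme.txt"))        # ('serverA', '/export/a/readme.txt')
print(resolve("/mnt/a/shared/data.csv"))   # ('serverB', '/export/shared/data.csv')
```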

14
Q

A large scale DFS may distribute files across multiple servers in order to manage very large files.

What are the two ways of doing this?

A
  1. Keep all chunks of each file on a single server (a file's chunks are not partitioned across servers)
  2. Split the chunks of each file across numerous servers (like sharding in databases)
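The two placement strategies can be sketched as functions from (file name, chunk index) to a server. The server names are hypothetical, and `zlib.crc32` is used only because it gives a stable hash across runs (Python's built-in `hash` of a string is randomized per process). A real DFS may use a different placement scheme.

```python
import zlib

SERVERS = ["s0", "s1", "s2", "s3"]  # hypothetical server names


def whole_file_placement(filename, chunk_index):
    # Option 1: every chunk of a file lives on the same server
    return SERVERS[zlib.crc32(filename.encode()) % len(SERVERS)]


def striped_placement(filename, chunk_index):
    # Option 2: consecutive chunks go to different servers (round-robin)
    start = zlib.crc32(filename.encode()) % len(SERVERS)
    return SERVERS[(start + chunk_index) % len(SERVERS)]


chunks = range(4)
print({whole_file_placement("big.log", i) for i in chunks})   # a single server
print({striped_placement("big.log", i) for i in chunks})      # all four servers
```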
15
Q

In a large scale DFS which distributes files across numerous servers by storing files in chunks (just like sharding in databases), how can this result in improved throughput?

A

When the server is the bottleneck of the system, partitioned files let the load be balanced across numerous servers, thereby improving throughput
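A back-of-the-envelope sketch of why partitioning helps when servers are the bottleneck. It assumes a hypothetical per-server capacity of 100 chunk requests per second, 4 servers, and a file of 8 chunks that is read once in full. Because servers work in parallel, the busiest server bounds how long the read takes.

```python
from collections import Counter

NUM_SERVERS = 4
CHUNKS_PER_FILE = 8
CHUNKS_PER_SEC_PER_SERVER = 100   # hypothetical capacity of one server


def unpartitioned(chunk_index):
    return 0                          # the whole file lives on server 0


def partitioned(chunk_index):
    return chunk_index % NUM_SERVERS  # chunks striped across all servers


def busiest_server_load(placement):
    """Chunk requests landing on the most loaded server when every chunk is read once."""
    load = Counter(placement(i) for i in range(CHUNKS_PER_FILE))
    return max(load.values())


for placement in (unpartitioned, partitioned):
    peak = busiest_server_load(placement)
    # Servers work in parallel, so the busiest one bounds the total time
    print(placement.__name__, peak, peak / CHUNKS_PER_SEC_PER_SERVER, "s")
```

With these made-up numbers the busiest server handles 8 requests without partitioning and 2 with it, a 4x shorter read.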

16
Q

In the Google File System (GFS), describe the following:

  1. The master node
  2. The chunk servers
  3. The underlying file system existing in each chunk server
A
  1. The master node stores metadata about the files (size, path, access rights) and their chunks, and serves it to the client
  2. The chunk servers store chunks (each of which could be a replica) of the files, with no metadata
  3. The underlying file system on each chunk server is an ordinary Linux file system
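A sketch of the split between metadata and data, assuming hypothetical `FileMeta` and `Master` dataclasses, made-up chunk handles, and server names. The master maps file names to chunk handles and chunk handles to chunk servers. The chunk servers (plain dicts here, standing in for files on a local Linux file system) map chunk handles to raw bytes and know nothing about file names.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    size: int
    mode: str                                        # access rights
    chunk_handles: list = field(default_factory=list)


@dataclass
class Master:
    """Holds metadata only; never stores file contents."""
    files: dict = field(default_factory=dict)            # path -> FileMeta
    chunk_locations: dict = field(default_factory=dict)  # chunk handle -> [chunk server]

    def lookup(self, path, chunk_index):
        handle = self.files[path].chunk_handles[chunk_index]
        return handle, self.chunk_locations[handle]


# Chunk servers: chunk handle -> raw bytes, with no notion of file names
chunk_servers = {
    "cs1": {101: b"GET /", 102: b"index"},
    "cs2": {101: b"GET /"},
    "cs3": {102: b"index"},
}

master = Master()
master.files["/logs/web.log"] = FileMeta(size=10, mode="rw-r--r--", chunk_handles=[101, 102])
master.chunk_locations = {101: ["cs1", "cs2"], 102: ["cs1", "cs3"]}

handle, locations = master.lookup("/logs/web.log", 1)
print(handle, locations)                    # 102 ['cs1', 'cs3']
print(chunk_servers[locations[0]][handle])  # b'index'
```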
17
Q

Why does GFS (Google file system) distribute the files across numerous chunk servers?

What other famous file system is an open-source implementation of GFS?

A

Distributing the files (with replicated chunks) across numerous chunk servers provides fault tolerance in software

HDFS (Hadoop distributed file system) is an open-source implementation of GFS

18
Q

What is a Google file system (GFS) made up of?

A
  • Master node
  • GFS client
  • Collection of chunk servers
19
Q

In GFS (Google File System), the master node's metadata about chunks is ______ in main memory, and ______ are logged to local storage

A

cached

updates

20
Q

How does the master node in GFS (Google file system) keep the meta-data consistent with the state of the chunk servers?

A

The master periodically polls the chunk servers to keep the meta-data consistent
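A minimal sketch of this polling, assuming each chunk server can report which chunk handles it holds. The dict layout, server names, and `alive` flag are hypothetical. (The GFS paper describes the master learning chunk locations by asking chunk servers at startup and then tracking them through periodic HeartBeat messages.)

```python
def poll_chunk_servers(chunk_servers):
    """Rebuild chunk handle -> locations from what each reachable server reports."""
    locations = {}
    for addr, server in chunk_servers.items():
        if not server["alive"]:
            continue  # unreachable server: its replicas are no longer listed
        for handle in server["chunks"]:
            locations.setdefault(handle, []).append(addr)
    return locations


chunk_servers = {
    "cs1": {"alive": True, "chunks": [101, 102]},
    "cs2": {"alive": True, "chunks": [101]},
    "cs3": {"alive": False, "chunks": [102]},
}
print(poll_chunk_servers(chunk_servers))  # {101: ['cs1', 'cs2'], 102: ['cs1']}
```

Because the chunk servers are treated as the source of truth, the master's view is rebuilt from their reports instead of being trusted blindly after a failure.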

21
Q

In GFS (Google File System), what are the steps for a client to read data from a file?

A
  1. Client sends the file name and chunk index to the master
  2. The master responds with the chunk handle and the addresses of the chunk servers holding that chunk
  3. The client then pulls data directly from a chunk server, bypassing the master
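The three read steps can be sketched as follows. The metadata table, chunk contents, and server names are hypothetical, and the chunk size is shrunk to 4 bytes so the example stays small (GFS used 64 MB chunks). For simplicity, this sketch assumes a read never crosses a chunk boundary.

```python
CHUNK_SIZE = 4  # tiny for the demo; GFS used 64 MB chunks

# Master metadata: (file name, chunk index) -> (chunk handle, replica addresses)
MASTER = {
    ("/data/words.txt", 0): (7, ["cs1", "cs2"]),
    ("/data/words.txt", 1): (8, ["cs2", "cs3"]),
}
CHUNK_SERVERS = {
    "cs1": {7: b"abcd"},
    "cs2": {7: b"abcd", 8: b"efgh"},
    "cs3": {8: b"efgh"},
}


def gfs_read(path, offset, count):
    # Step 1: the client turns the byte offset into a chunk index
    chunk_index = offset // CHUNK_SIZE
    # Step 2: the master returns the chunk handle and replica locations (metadata only)
    handle, replicas = MASTER[(path, chunk_index)]
    # Step 3: bytes come directly from a chunk server, bypassing the master
    data = CHUNK_SERVERS[replicas[0]][handle]
    start = offset % CHUNK_SIZE
    return data[start:start + count]


print(gfs_read("/data/words.txt", 5, 2))  # b'fg'
```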
22
Q

What is the step-by-step mechanism in which GFS (google file system) updates data in a given file?

A
  1. A client contacts the nearest chunk server holding the data and pushes its updates to that server
  2. That server pushes the update to the next closest server holding the data, and so on, in a pipelined fashion until all replicas have received the data
  3. The client then asks the primary chunk server to apply the update; the primary assigns a sequence number to the operation and passes it on to the secondary chunk servers (bypassing the master), so every replica applies updates in the same order
  4. The primary replica informs the client that the update is complete
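A sketch separating the two phases: data flow (pushed along a chain of replicas) and control flow (the primary picks the order). The `ChunkReplica` and `Primary` classes, the data ids, and the assumed chain order `s1 -> p -> s2` (by hypothetical network distance) are all illustrative.

```python
class ChunkReplica:
    def __init__(self, name):
        self.name = name
        self.buffer = {}     # data_id -> bytes, pushed but not yet applied
        self.chunk = b""
        self.applied = []    # serial numbers applied, in order

    def push(self, data_id, data, chain):
        # Steps 1-2: store the data and forward it to the next replica in the chain
        self.buffer[data_id] = data
        if chain:
            chain[0].push(data_id, data, chain[1:])

    def apply(self, serial, data_id):
        self.chunk += self.buffer.pop(data_id)
        self.applied.append(serial)


class Primary(ChunkReplica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # Step 3: the primary picks the order, applies locally, tells secondaries
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        for s in self.secondaries:
            s.apply(serial, data_id)
        return "ok"  # Step 4: reply to the client


s1, s2 = ChunkReplica("s1"), ChunkReplica("s2")
p = Primary("p", [s1, s2])
chain = [s1, p, s2]                      # assumed order: nearest replica first
chain[0].push("d1", b"log1;", chain[1:])
p.write("d1")
chain[0].push("d2", b"log2;", chain[1:])
p.write("d2")
print([r.chunk for r in (p, s1, s2)])    # every replica holds b'log1;log2;'
```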
23
Q

In a centralized sharing setting, what are the semantics of file sharing? (two points)

Under what condition can these same semantics be achieved in a DFS?

A

Centralized file sharing semantics:

  • Operations are strictly ordered in time
  • Application can ALWAYS read its own writes

These semantics can be achieved in a DFS as long as there is only one file server and files are not cached

24
Q

When a cached file is modified in a DFS, it is ________ but ________ to propagate the changes _______ to the file server. Instead, the changes are propagated after the file is closed

A

possible
impractical
immediately

25
Q

Consider the session semantics approach to file sharing.

  1. When a client makes a modification to a file in a DFS (without closing it), what is the visibility of this modification?
  2. When do the changes get propagated to the other clients viewing the files?
  3. Which party determines the final version of the file?
A
  1. The modifications are only visible to the process that modified that file
  2. The modifications are only made visible to other clients when the file is closed
  3. The final version of the file is determined by the last client that closes that file
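A minimal sketch of session semantics, assuming hypothetical `SessionFS` and `Session` classes. Each open session works on its own snapshot. Changes are published only on close, and when two writers close, the later close overwrites the earlier one.

```python
class SessionFS:
    """Hypothetical file store with session semantics."""

    def __init__(self):
        self.committed = {}   # path -> version visible to newly opened sessions

    def open(self, path):
        return Session(self, path, self.committed.get(path, ""))


class Session:
    def __init__(self, fs, path, snapshot):
        self.fs, self.path, self.data = fs, path, snapshot
        self.modified = False

    def write(self, text):
        self.data = text          # visible only inside this session
        self.modified = True

    def read(self):
        return self.data

    def close(self):
        if self.modified:         # publish on close; later closes overwrite earlier ones
            self.fs.committed[self.path] = self.data


fs = SessionFS()
fs.committed["/f"] = "v0"
a, b = fs.open("/f"), fs.open("/f")
a.write("from a")
b.write("from b")
print(fs.open("/f").read())   # v0: neither change is visible yet
b.close()
a.close()                     # a closes last, so its version wins
print(fs.open("/f").read())   # from a
```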
26
Q

The semantics of file sharing in a DFS can be defined in numerous ways

  1. What does NFS use?
  2. What about HDFS?
A
  1. NFS uses session semantics
  2. HDFS uses immutable files but supports an append operation so that logs can be written
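A sketch of the "immutable except for append" model. The `AppendOnlyFile` class is hypothetical and does not reproduce HDFS's actual client API; it only shows that existing bytes can never be rewritten while new records can be added at the end, which is enough for logs.

```python
class AppendOnlyFile:
    """Hypothetical HDFS-style file: existing bytes never change, new records are appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(bytes(record))

    def read(self):
        return b"".join(self._records)

    def overwrite(self, offset, data):
        raise PermissionError("existing data is immutable; use append()")


log = AppendOnlyFile()
log.append(b"start\n")
log.append(b"job done\n")
print(log.read())   # b'start\njob done\n'
try:
    log.overwrite(0, b"x")
except PermissionError as err:
    print("rejected:", err)
```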

27
Q

What does UNIX file sharing semantics describe?

A

Every operation on a file is instantly visible to all processes

28
Q

What does session semantics describe?

A

No changes are visible to other processes until the file is closed

29
Q

What does immutable file sharing semantics describe?

A

No updates are possible, which makes sharing and replication very simple

30
Q

What does transaction file sharing semantics describe?

A

All changes occur atomically