Module 6 - Distributed File Systems Flashcards
Distributed file systems allow ________ to access _______ systems on ________ servers
applications
file
remote
There are two ways for a client to access a file in a DFS (distributed file system). What are their names?
- Remote access model
- Upload/download model
What is the remote access model for a DFS (distributed file system)?
- The file always lives on the server
- Every time the client wants to read or write the file, it must send a request (typically an RPC) to the server
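A minimal Python sketch of the remote access model. It assumes a hypothetical in-memory `FileServer` class that stands in for the remote server, with a plain method call standing in for each RPC. The client keeps no copy of the file, so every read and write becomes one request to the server.

```python
class FileServer:
    """Hypothetical server holding the only copy of each file."""

    def __init__(self):
        self.files = {}      # path -> bytearray
        self.requests = 0    # number of requests (RPCs) received

    def read(self, path, offset, size):
        self.requests += 1
        return bytes(self.files[path][offset:offset + size])

    def write(self, path, offset, data):
        self.requests += 1
        buf = self.files.setdefault(path, bytearray())
        # Grow the file with zero bytes if the write goes past its current end.
        if len(buf) < offset + len(data):
            buf.extend(b"\0" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data


class RemoteAccessClient:
    """Keeps no local copy: each operation is forwarded to the server."""

    def __init__(self, server):
        self.server = server

    def read(self, path, offset, size):
        return self.server.read(path, offset, size)

    def write(self, path, offset, data):
        self.server.write(path, offset, data)


server = FileServer()
client = RemoteAccessClient(server)
client.write("/notes.txt", 0, b"hello")
client.write("/notes.txt", 5, b" world")
data = client.read("/notes.txt", 0, 11)   # b"hello world", after 3 requests
```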
What is the upload/download model for a DFS (distributed file system)?
- The file is managed by the server, which transfers a copy of the file to the client
- Once the copy is received, the client performs reads and writes on it locally
- When the client is done accessing the file, the updated version of the file is transferred back to the server
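For contrast, a sketch of the upload/download model under the same simplifying assumptions: `TransferServer` and `UploadDownloadClient` are hypothetical classes, and method calls stand in for network transfers. Reads and writes touch only the local copy, and the server sees the change only after `close()` uploads it.

```python
class TransferServer:
    """Hypothetical server that only ships whole files."""

    def __init__(self):
        self.files = {}      # path -> bytes
        self.transfers = 0   # whole-file transfers in either direction

    def download(self, path):
        self.transfers += 1
        return self.files.get(path, b"")

    def upload(self, path, data):
        self.transfers += 1
        self.files[path] = data


class UploadDownloadClient:
    def __init__(self, server):
        self.server = server
        self.local = {}      # path -> local working copy

    def open(self, path):
        self.local[path] = bytearray(self.server.download(path))   # copy in

    def append(self, path, data):
        self.local[path].extend(data)                              # local only

    def read(self, path):
        return bytes(self.local[path])                             # local only

    def close(self, path):
        self.server.upload(path, bytes(self.local.pop(path)))      # copy back


server = TransferServer()
server.files["/a.txt"] = b"v1"
client = UploadDownloadClient(server)
client.open("/a.txt")
for _ in range(3):
    client.append("/a.txt", b"+")
seen_locally = client.read("/a.txt")       # b"v1+++"
seen_on_server = server.files["/a.txt"]    # still b"v1": nothing uploaded yet
client.close("/a.txt")                     # now the server holds b"v1+++"
```

Only two transfers happen no matter how many local reads and writes occur between `open` and `close`.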
What does NFS stand for? When was it created, and by which organization? Does ecelinux use NFS?
Network File System
Created by Sun Microsystems
in 1984
Yes, ecelinux uses NFSv4
In NFS, does the server have an RPC stub? or the client? or both?
Both the client and the server have their own RPC stubs
What is the step-by-step process when a client accesses a chunk of a file on the remote server?
- The client makes a system call to the kernel, specifying the path of the file it is trying to access
- The call is turned into a request that passes through the client's VFS (virtual file system) layer to the NFS client
- The NFS client uses its RPC client stub to call the RPC server stub, which triggers execution in the NFS server program
- The NFS server program calls the server's VFS layer, which fetches the requested data from the server's local file system
- The fetched data is returned through the RPC stubs to the NFS client and then propagated back up through the client's VFS to the caller
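The layering above can be traced with a toy call chain. Each function is a hypothetical stand-in for one layer (no real kernel, VFS, or RPC library is involved), marshalling and networking are omitted, and the client passes the server-side path directly instead of translating it through a mount point.

```python
# Hypothetical contents of the server's local file system.
SERVER_DISK = {"/export/data.bin": b"0123456789abcdef"}


def server_vfs_read(path, offset, size, trace):
    trace.append("server VFS -> local file system")
    return SERVER_DISK[path][offset:offset + size]


def nfs_server(request, trace):
    trace.append("NFS server")
    return server_vfs_read(request["path"], request["offset"], request["size"], trace)


def rpc_server_stub(message, trace):
    trace.append("RPC server stub")
    return nfs_server(message, trace)        # unmarshal and dispatch


def rpc_client_stub(request, trace):
    trace.append("RPC client stub")
    # A real stub would marshal the request and send it over the network here.
    return rpc_server_stub(dict(request), trace)


def nfs_client(path, offset, size, trace):
    trace.append("NFS client")
    return rpc_client_stub({"path": path, "offset": offset, "size": size}, trace)


def client_vfs_read(path, offset, size, trace):
    trace.append("client VFS")
    return nfs_client(path, offset, size, trace)


def read_syscall(path, offset, size):
    """Entry point: returns the data plus the list of layers it passed through."""
    trace = ["system call"]
    data = client_vfs_read(path, offset, size, trace)
    return data, trace


data, trace = read_syscall("/export/data.bin", 4, 4)   # data == b"4567"
```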
NFS supports client-side caching. What is the motivation behind this?
What are the caches used for in NFS?
- Caching reduces communication between the client and the server
- The cache is used to hold UPDATES to a file: modifications the client makes are kept in its local cache instead of being sent to the server one by one
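A sketch of why caching updates cuts client-server communication. `CountingServer` and `CachingClient` are hypothetical classes that count messages: the client fetches the file once, keeps every write in its cache, and sends the accumulated updates back in a single message.

```python
class CountingServer:
    def __init__(self, content=b""):
        self.content = content
        self.messages = 0          # client-server messages exchanged

    def fetch(self):
        self.messages += 1
        return self.content

    def store(self, data):
        self.messages += 1
        self.content = data


class CachingClient:
    """Hypothetical client cache that holds updates until flushed."""

    def __init__(self, server):
        self.server = server
        self.cached = None         # local copy, including pending updates
        self.dirty = False

    def write(self, data):
        if self.cached is None:
            self.cached = bytearray(self.server.fetch())   # first access only
        self.cached.extend(data)   # the update stays in the cache
        self.dirty = True

    def flush(self):
        if self.dirty:
            self.server.store(bytes(self.cached))          # one message for all updates
            self.dirty = False


server = CountingServer()
client = CachingClient(server)
for _ in range(10):
    client.write(b"x")             # without a cache, each write would be a message
client.flush()
# server.messages == 2 (one fetch, one store) instead of 10
```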
When are file modifications propagated to the server after sitting in the cache?
What issue arises if this happened in a distributed NFS with replication?
- File modifications are flushed back to the server when the client closes the file
- In a distributed NFS with replication, updates held in a client's cache are invisible to the replicas until the file is closed and the changes propagate, which can leave copies of the file inconsistent across replicas
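The consistency problem can be shown with two replicas and a client that only writes back on close. This is a sketch under assumptions: `Replica` and `ClosingClient` are hypothetical, and no replication protocol runs between the replicas, so `r2` stands for any replica the update has not reached yet.

```python
class Replica:
    def __init__(self, name, content):
        self.name = name
        self.content = content


class ClosingClient:
    """Caches updates locally and writes them to one replica only on close()."""

    def __init__(self, replica):
        self.replica = replica
        self.buffer = None

    def open(self):
        self.buffer = self.replica.content

    def write(self, data):
        self.buffer = data                    # held in the client cache

    def close(self):
        self.replica.content = self.buffer   # flushed on close
        self.buffer = None


r1 = Replica("r1", "v1")
r2 = Replica("r2", "v1")                      # second replica of the same file
client = ClosingClient(r1)
client.open()
client.write("v2")
before_close = (r1.content, r2.content)       # ("v1", "v1"): no replica sees v2 yet
client.close()
after_close = (r1.content, r2.content)        # ("v2", "v1"): replicas now disagree
```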
Describe the “delegation of authority” mechanism used in NFS for upload/download of files
What is the purpose of this delegation? What does it mean in the context of two clients trying to access the file?
1. Client asks the server for the file
2. Server delegates authority over the file to the client
3. Server recalls the delegation (for example, when another client wants to access the file)
4. Client returns the (possibly updated) file to the server
Delegation in step 2 ensures that only one client can modify a specific file at a time, since that client holds the authority granted by the server. Other clients cannot access the file until the server recalls the delegation.
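A simplified model of the four steps, assuming hypothetical `DelegatingServer` and `DelegateClient` classes. The recall is modelled as a synchronous call made when a second client asks for the file; a real server has to wait for the holder to return the delegation over the network before answering the second client.

```python
class DelegatingServer:
    """Hypothetical server that grants at most one delegation per file."""

    def __init__(self):
        self.files = {"/f": "v1"}   # authoritative copies
        self.holder = {}            # path -> client currently holding the delegation

    def request_file(self, client, path):
        current = self.holder.get(path)
        if current is not None and current is not client:
            # Step 3: recall the delegation; step 4: the holder returns its copy.
            self.files[path] = current.return_file(path)
            del self.holder[path]
        # Steps 1-2: the requester gets a copy and authority over the file.
        self.holder[path] = client
        return self.files[path]


class DelegateClient:
    def __init__(self, name):
        self.name = name
        self.copies = {}            # path -> local copy held under delegation

    def open(self, server, path):
        if path not in self.copies:  # already delegated: keep working locally
            self.copies[path] = server.request_file(self, path)

    def write(self, path, data):
        self.copies[path] = data     # local update, no server traffic

    def return_file(self, path):
        return self.copies.pop(path)  # send the (possibly updated) file back


server = DelegatingServer()
a, b = DelegateClient("A"), DelegateClient("B")
a.open(server, "/f")
a.write("/f", "v2 from A")
b.open(server, "/f")   # the server first recalls A's delegation
# B now sees A's update, and A no longer holds the file
```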
NFS uses RPCs internally. An optimization added to NFS (in NFSv4) was compound procedures.
What does a file read between a client and a server in NFS look like with and without compound procedures?
How is the latency decreased with compound procedures?
In NFS, whenever the client asks the server to read a file, it first has to perform a LOOKUP and then a READ on the file
This mechanism without compound procedures:
- client sends a LOOKUP network request and waits for its response
- client sends a READ network request and waits for a second response
With compound procedures:
- client sends the LOOKUP and the READ in the same network request and gets a single response
Therefore, with compound procedures there is only one network round trip instead of two, which reduces latency
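The latency argument is simple arithmetic: each network request costs one round trip, while the server's work per operation is the same either way. A sketch with made-up numbers (the 20 ms round-trip time and 1 ms of server time per operation are assumptions, not measurements):

```python
RTT_MS = 20.0      # hypothetical network round-trip time
PER_OP_MS = 1.0    # hypothetical server processing time per operation


def latency_ms(requests):
    """Each request is one round trip; every operation inside it costs server time."""
    round_trips = len(requests)
    num_ops = sum(len(ops) for ops in requests)
    return round_trips * RTT_MS + num_ops * PER_OP_MS


separate = [["LOOKUP"], ["READ"]]   # without compound procedures: two requests
compound = [["LOOKUP", "READ"]]     # with compound procedures: one request

without_compound = latency_ms(separate)   # 2 * 20 + 2 * 1 = 42.0 ms
with_compound = latency_ms(compound)      # 1 * 20 + 2 * 1 = 22.0 ms
```

The saving grows by one round trip for every extra operation bundled into the same request.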
An NFS server usually exports only part of its local file system to remote clients.
What does a client typically do to its local file structure to integrate this part from the server?
The client imports this segment, and adds this portion of the server’s file system to its local file system
The remote file segment is mounted onto the client under a certain path
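Mounting can be pictured as a table from local mount points to (server, exported directory) pairs. A minimal sketch of path resolution against such a table, where `MOUNTS`, the server name, and the paths are all hypothetical:

```python
# Hypothetical mount table: local mount point -> (server, exported directory)
MOUNTS = {
    "/home/alice/remote": ("serverA", "/export/projects"),
}


def resolve(local_path, mounts=MOUNTS):
    """Return (server, remote_path) if local_path is under a mount point,
    otherwise ("local", local_path)."""
    # Try longer mount points first so a more specific mount wins.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            server, export = mounts[mount_point]
            return server, export + local_path[len(mount_point):]
    return "local", local_path


remote_file = resolve("/home/alice/remote/report.txt")
# ("serverA", "/export/projects/report.txt")
local_file = resolve("/home/alice/notes.txt")
# ("local", "/home/alice/notes.txt")
```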
Suppose a client imports a directory which contains a subdirectory which was imported from another remote host.
How does this client access the nested directory in terms of imports?
If a client imports a directory from server A which contains another imported directory from server B, then the client will import the nested directory DIRECTLY from server B
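A sketch of the rule above, assuming a hypothetical record of what server A itself has imported (`SERVER_A_MOUNTS`). When the client imports A's export, any directory that A imported from elsewhere is mounted by the client directly from its original server rather than through A.

```python
# Hypothetical: directories that server A itself imported, and where from.
SERVER_A_MOUNTS = {"/export/shared/tools": ("serverB", "/export/tools")}


def client_mounts(client_dir, server, export_dir, server_mounts):
    """Mounts a client needs when importing export_dir from server.
    Directories the server itself imported are mounted from their origin."""
    plan = {client_dir: (server, export_dir)}
    for imported_path, origin in server_mounts.items():
        if imported_path.startswith(export_dir + "/"):
            nested = client_dir + imported_path[len(export_dir):]
            plan[nested] = origin            # go DIRECTLY to the original server
    return plan


plan = client_mounts("/mnt/shared", "serverA", "/export/shared", SERVER_A_MOUNTS)
# {"/mnt/shared": ("serverA", "/export/shared"),
#  "/mnt/shared/tools": ("serverB", "/export/tools")}
```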
A large-scale DFS may distribute files across multiple servers in order to manage very large files.
What are the two ways of doing this?
- Store all chunks of a given file on the same server (a file's chunks are not partitioned across servers)
- Split the chunks of a file across many servers (similar to sharding in databases)
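The two placement strategies, sketched with round-robin assignment. Round-robin is an assumption chosen for simplicity (real systems may use hashing or a metadata server to decide placement), and the file and server names are hypothetical.

```python
def whole_file_placement(files, servers):
    """Strategy 1: all chunks of a file go to one server (files assigned round-robin)."""
    placement = {}
    for i, (name, num_chunks) in enumerate(files):
        server = servers[i % len(servers)]
        for chunk in range(num_chunks):
            placement[(name, chunk)] = server
    return placement


def striped_placement(files, servers):
    """Strategy 2: the chunks of each file are spread round-robin across servers."""
    placement = {}
    for name, num_chunks in files:
        for chunk in range(num_chunks):
            placement[(name, chunk)] = servers[chunk % len(servers)]
    return placement


def servers_holding(placement, name):
    """Set of servers that store at least one chunk of the named file."""
    return {server for (file_name, _), server in placement.items() if file_name == name}


files = [("big.log", 4), ("small.txt", 1)]   # (file name, number of chunks)
servers = ["s0", "s1", "s2", "s3"]
whole = whole_file_placement(files, servers)
striped = striped_placement(files, servers)
# servers_holding(whole, "big.log")   -> {"s0"}
# servers_holding(striped, "big.log") -> {"s0", "s1", "s2", "s3"}
```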
In a large-scale DFS that distributes files across many servers by storing them in chunks (similar to sharding in databases), how can this improve throughput?
When the server is the bottleneck of the system, partitioning files lets the load be balanced across many servers, thereby improving throughput
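A back-of-the-envelope sketch of the throughput gain. It assumes each server can handle a fixed, hypothetical 100 chunk requests per second and a workload that reads every chunk of one 4-chunk file. The busiest server caps the whole workload, so spreading the chunks removes the single-server bottleneck.

```python
from collections import Counter


def max_throughput(placement, requests, per_server_capacity):
    """Highest request rate (requests/s) for this request mix, given that no server
    can exceed per_server_capacity requests/s. The busiest server is the bottleneck."""
    load = Counter(placement[chunk] for chunk in requests)   # requests per server
    busiest = max(load.values())
    return per_server_capacity * len(requests) / busiest


requests = [("big.log", c) for c in range(4)]            # read every chunk once
whole = {("big.log", c): "s0" for c in range(4)}        # all chunks on one server
striped = {("big.log", c): f"s{c}" for c in range(4)}   # one chunk per server

tput_whole = max_throughput(whole, requests, per_server_capacity=100)      # 100.0
tput_striped = max_throughput(striped, requests, per_server_capacity=100)  # 400.0
```

The gain only appears when the servers, not the network or the clients, are the limiting resource, which matches the condition stated on the card.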