File System Flashcards

1
Q

What is the File System for?

A

File System provides:

  • Large and cheap storage space
  • Non-volatility
  • Sharing information between processes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a File System?

A

It is a collection of files + directory structures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are files?

A

A container for storing information.

File abstracts away the different kinds of information into a single struct - devices, pipes, text files, data files.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Where are files stored?

A

Generally, they are stored on disk.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what are the different kinds of representations a file has?

A

Three kinds of files:

  • byte sequence
  • record sequence
  • tree of records
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What kind of types a file has?

A

User files:

  • ASCII
  • Binary

System files

  • Directories
  • Special files
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How many kinds of access-to-file we have?

A

Sequential access:

  • Read all bytes/records from the beginning
  • cannot jump around

Random access

  • bytes/records read in any order
  • All files of modren OSs are random access
  • read/write functions can
    • recieve an offset parameter
    • seek(offset, start_from)
      • start_from = {0,1,2}
        • 0 - beginning
        • 1 - current
        • 2 - end
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the file attributes?

A

General info - user ID, Group ID, dates, times

Location & size - pointer to a device and location on it

Flags that store information for the system

Another special flags - Protection, password..

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the file’s operations?

A

Create ; Delete

Open ; Close

Write ; Read ; Seek

Get attributes

Set attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the directory’s operations?

A

Create entry ; Delete entry (?)

Search for a file

Create/Delete a directory file

List a directory

Rename a file

Link a file to a directory

Traverse a file system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the dot and dotdot directory entries?

What’s the relation between a process and a path?

A

. - the current directory.

.. - the parent directory

Each process has its own working directory, which is shared by its threads.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Using which tool files(inodes) are shared?

A

Links

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are soft links and hard links?

A

Soft

  • Containing a path name. access is slower
  • If source is removed the link becomes broken

Hard

  • information about a shared file is duplicated in sharing directories.
  • fast, points to file
  • Link count must be maintained.
  • if source is removed link still accessible
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the difference between shared and exclusive locks?

A

Exclusive - protects updates to file resources and can be owned by only one transaction at a time.

Acquiring an exclusive lock demands waiting if another process is currently holding an exclusive lock or a shared lock of the same resource.

shared - can be owned by several processes at a time.

Acquiring a shared lock demands waiting if another process is currently holding an exclusive lock.

A new request for a shared lock must wait if there’s a request for an exclusive lock on a resource that already has a shared lock.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How can i block or unblock a file?

A

Using the command flock(file descriptor, operation)

  • operation:
    • LOCK_SH - place a shared lock.
    • LOCK_EX - place an exclusive lock.
    • LOCK_UN - remove an existing lock held by this process.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What happens to the lock when a file is closed or a process terminates?

A

File lock is removed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Can we lock parts of the file?

A

Yes.

Any part of the file may be blocked.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are the concerns of the user/kernel regarding files?

A

User:

  • File names
  • operations allowed
  • location(directory structures)

System:

  • Storage of files and directories
  • Disk space management
  • implementation efficiency and reliability.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Disk allocation:

Advantages/Disadvantages of:

  • Contiguous allocation?
  • Linked list of disk blocks
  • Linked list using in-memory Files Allocation Table
A

Contiguous:

  • Simple; access is fast
  • external fragmentation
    • How much size to allocate at creation time?

Linked list of disk blocks:

  • No fragmentation and easy allocation
  • slow random access
    • n disk accesses to get to the n’th block
      • because the pointer to the next block is held within the block.
  • weird block size

Linked list using in-memory Files Allocation Table

  • none of the above disadvantages.
  • Uses a table to store the pointers of all blocks in the linked list that represent files - last file block has a special EOF symbol.
    • we don’t need allocations to be continguous - segmentation free
    • Can access the nth block of some file using the table and not using the table blocks.
  • A very large table in memory
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is an inode and who holds it?

A

It is a place, on disk, where data about the file is stored.

Each file struct holds inode. (might be two files pointing to the same inode..)

There is a struct called dinode which holds the:

  • File Attributes
    • Time last accessed
    • Time last modified
    • Size
    • File type, protection bits
  • Nlinks
    • Number of directory entries pointing to this i-node.
  • Address of disk block 0
  • Address of disk block 1
  • ..
  • Address of block of pointer
    • Containing additional disk addresses
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is the classic Unix disk structure?

A
  • Boot Sector
  • Super Block
    • Number of inodes
    • Number of Blocks
    • Number of free blocks
    • Pointer to free blocks list
    • Pointer to free i-node list
  • i-nodes
  • Data blocks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Unix inodes - how bytes counting work?

A

I didn’t get it. You try.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Tell me in which block byte 355,000 can be found?

A
  • First 10 blocks - 10K:*
  • 0 - 10240
  • Indircet block - 10K + 256K = 266K =*
  • 10240 - 272384
  • double indirect block -*
  • refers to a list of 256K indirect blocks - each of size 256KB.
  • 355,000 = 272384 + X*
  • X = 82616. That is, it is placed within the 0th indirect block, in the 80th block. (starting at byte 81920), at byte number 696*
24
Q

What is the file descriptor table for? who holds it?

A
  • It is for quick access of a proccess to its files.
  • Each process has a file descriptors table
  • One entry per each open file.
25
Q
  • Where should we keep the file offset?*
  • what if two processes want to read from the same file but starting from a different offset?*
  • What if i want a couple of threads to write to the same file one after the other?*
A
  • we keep the offset within the file struct .*
  • Two files might refer to the same inode but have different offset, thus no problem for them to read different parts of the data.*
  • For the mission of writing/reading together Unix provided the following solution:*
  • Parent’s file descriptors-table-entries and child’s file descriptors-table-entries point to the same file struct, thus holding the same offset and flags.*
  • Unrelated processes point to different files thus have different offsets.*
26
Q

Can a user access the i-node of a file?

A

No.

Each process has its own file descriptors table that points to the entries in the kernel’s open files description table.

The kernel’s open files description table entries point to the i-node of the file.

27
Q

What does the open call do?

A

It adds an entry to both the kernel’s open file descriptions and the process’ file description table.

28
Q

How is a directory implemented?

A

A simple directory(like in MS-DOS):

  • fixed size entries
  • disk addresses & attributes in diretory entry

Directory entries simply point to i-nodes.

29
Q

There are two ways of handling long file names:

  • In-line
  • In a heap

what does in-line means?

A

In-line means it is included in the file. That is, there is no pointer referring to some outsource which holds the content, it is “physically” here.

30
Q

How Unix directories are implemented?

A

Implemented using struct dirent which holds :

  • name
  • inode array
31
Q

What are the Pros/Cons with having large blocks?

A

Pros:

  • Typically a multiple number of sectors
    • Because its size requires division
  • In sequential access, less blocks to read/write - less seek/search.

Cons:

  • High internal fragmentatios
  • In Random access larger transfer time, larger memory buffers.
32
Q

What are the Pros/Cons with having small blocks?

A

Pros:

  • Smaller internal fragmentatios
  • Faster random access

Cons:

  • Slower sequential access(more seeks)
33
Q

What are the disk access time parameters?

A
  • A - Average seek-time
    • Avg time for head to get above a cylinder.
  • B - Average time to get to track block.
  • C - Rotation time
    • Time for disk to complete full rotation
  • D - Transfer time
    • (block size/track size)*rotation time

Average time to access block is given by:

A + B + D.

34
Q

How can we keep track of free blocks?

A

As a linked list of blocks

  • Adresses of the first n free blocks are stored at the super block.
  • First n-1 blocks are free to be assigned.
  • Last block contains address of a block which contains n more addresses to free blocks..
  • Addresses of many free blocks are retrieved with one disk access.*
  • Unix maintains a single block of free-block addresses in memory.*
  • Whenever the last free block is reached, the next block of free-blocks is read and used.*

As a bit map

35
Q

Didn’t get it.. you try

A
36
Q

File system consistency

  • we count the number of references in free lists
  • we count the number of references in inodes

what results are troubling?

How is consistency mainted?

A
  • Both counts 0 - block is missing
    • Add to free list
  • More than once in free list
    • delete all references but one
  • More than once in files - Trouble
    • two or more different files write/read from the same place. which to delete? problem.
  • In both file and free list
    • Delete from free list.

Hard links may solve the problem

  • count - number of references to each i-node which is found by descending down the file system tree.
  • Compare count with the link-count field in the i-node struct.
  • If different, correct link-count field.
37
Q

What are the hazards if:

  • Link-count > actual links number
  • Link-count < actual links number
A

Link-count > actual links number

  • This file will not be removed.
  • Even after removing all actual links - there’s a fake one which tell the kernel to avoid removal.
    • A waste of memory.

Link-count < actual links number

  • The file will be removed even though there’s a file pointing to it. It is a much greater problem
38
Q

How does Unix enhance performance?

A

Using filename caching

39
Q

What is the Buffer Cache?

Who maintains that?

A

It is a pool of internal data buffers - the buffer cache.

The kernel maintains that.

Important notes

  • Data written to disk is cached, for later use.
    • Algorithms instruct the buffer to delay-write.
  • If data is not found in cache, it is read from disk to cache.
    • Algoritms instruct the buffer to pre-cache.
40
Q
  • What are the buffer’s states?*
  • How buffers are implemented?*
A

Buffers are categorized into:

  • busy- currently being used.
  • Clean - available for use, sync with block content on disk.
  • Free - empty and haven’t been used yet.
  • Dirty - needs to be moved to write list.
  • Each has a header, that includes the pair .*
  • The Buffer Cache is implemented using LRU list:*
  • Buffers are on a doubly-linked list in LRU order.
  • Each hash-queue entry points to a linked list of buffers that have the same hash value.
  • A block may be in only one hash-queue.
  • A free block is on the free-list(maintained by the kernel, remember?) in addition to being on a hash-queue.
  • When looking for a particular block, the hash-queue for it is searched. When in need of a new block , it is removed from the free list.
41
Q

What happens if the block is:

  1. Found in its hash queue and is free
  2. Found in its hash queue and is busy
  3. Not found in the hash queue and there are free buffers
  4. Not found in the hash queue and in searching the free list for a free buffer, one or more “delayed-write” buffers are found.
  5. Not found in the hash queue and free list is empty.
A
  1. Found in its hash queue and is free​
    • ​buffer is marked busy
    • buffer is removed from free list
  2. Found in its has queue and is busy
    • ​process sleeps until buffer is freed, then recheak.
  3. Not found in the hash queue and there are free buffers
    • A free buffer is allocated from the pool
  4. Not found in the hash queue and in searching the free list for a free buffer, one or more “delayed-write” buffers are found.
    • write delayed-write buffer(s) to disk
    • move them to the most recently used side of the list and find a free buffer
  5. Not found in the hash queue and free list is empty.
    • Block requesting process, when scheduled, go through hash-queue again.
42
Q

Process B looks up for block b to find that its buffer is locked.

He waits until the buffer of block B is available. When is returned, it needs to search again in the hash table for block b.

Why?

A

Because while waiting some other process C might have gotten the buffer, might have loaded it with another block c.

43
Q

Why not pure LRU?

A
  • Some blocks should be written as quickly as possible.
    • Insert critical blocks at the head of the queue ,to be replaced soon and written to disk.
    • Have a system daemon that calls sync every 30 seconds, to help in updating blocks.
  • Some blocks are likely to be used again. (partly filled blocks being written)
    • Insert to the end to stay longer in the cache.
44
Q

What is special about the NTFS?

A

NTFS:

  • A journaling file system - a file system that keeps track of changed not yet commited to the file system’s main part by recording the intentions of such changes.
  • Each file is represented by a record in a special file called the master file table(MFT)
45
Q

MFT - Master File Table

  • what does it hold?
  • does its records are of the same size for process?
  • Can it save file’s data?
  • Is there any limit of file size?
  • Where does the MFT address lie?
A

MFT - Master File Table

  • Each file/directory has one or more 1K records in the MFT.
    • A record contains file attributes and a list of block numbers.
  • Larger files need more then one MFT record for the list of blocks - records are extended by pointing to other records.
    • Disk blocks are described by sequence of records, each of which is a series of runs.
      • ​A run is a continguous sequence of blocks and represented by a pair: (offset, length)
  • Case the file is very small - data can be kept directly.
  • The boot sector contains the MFT address
  • NTFS tries to allocate blocks continguosuly
    • That is why run is a contiguous sequence of blocks.
  • No upper bound on file size
46
Q

Is it possible that a short file uses more MFT records than a longer file?

A

Yes.

Assume files A,B where B is a longer.

A - most of its blocks are not sequential.

  • A few runs structs.
  • Might have a few MFT records.

B - all of his blocks are sequential.

  • Only one run struct is needed.
  • Only one MFT record is needed.
47
Q

How is NTFS(new technology file system) implemented with directories?

A

With small directories:

  • Standard info
  • A directory entry contains the MFT index for the file, length of the file’s name, file’s name, and various fields and flags.

With large directories:

  • Organized as B+ trees
  • Supports transparent file compression
  • Compresses in groups of 16 blocks.
    • Can select to compress whole volume, specific directories or files.
48
Q

Is compression good or bad for performance?

Disk throughput - increased/decreased?

  • CPU works harder/lighter?
  • As function of compression unit size - RAM access slowed down or increased?
A

Disk throughput is increased

  • because size is smaller.

CPU works harder

  • Because it needs to compress/decompress

Access to RAM slowed down

49
Q

DFS - Distributed File Systems

  • What is it?
  • what is mount?
  • remote file access can be “stateful” or “stateless”. explain.
  • What is it used for?
A

DFS - Distributed File Systems

  • A collection of interconnected machines that do not share memory or a clock.
  • We can use file naming scheme. ({hostname:path})
    • This method is not transparent.
    • we want the kernel to hide details
    • This can be solved using mount
  • mounting is a process by which the OS makes files and directories on a storage device available for users to access via the computer’s file system.
  • A server exports part of its file system
    • We don’t want clients to be exposed to the server’s data..
    • stateless protocol - server remembers nothing about previous requests from a client
    • stateful protocol - more efficient(information kept in server’s kernel)
    • can be different machines and different OSs
    • Any machine can be both client and server
    • Clients access directories by mounting
    • remote mounts are invisible
      • ​guess it means A request B that request C..
    • ​Servers specify their exported directories upon boot, listed in /etc/exports
    • File sharing: accessing a file in a directory mounted by other (sharing) clients.
50
Q

NFS(v3) protocols

  • How mounting requests work?
  • What file handles contain?
  • What is the file operations protocol?
A

NFS(v3) protocols

  • send: {host-name;target-directory path}, get: file handle
  • File handle contains:
    • ​File system type
    • Disk ID
    • i-node number
    • Protection information
  • ​File operations - directory and file access:
    • ​lookup provides a file handle
    • reads and writes have all the needed information by using file handles - offsets are absolute.
    • No open or close calls.
    • Server does not keep open files tables.
    • Files cannot be locked
51
Q

Remote Procedure Calls (RPC)

  • How does it activate code on a remote machine?
    • using send/receive protocol
  • Proccess on one machine calls a procedure on another machine. What must be maintained?
      • synchronous(!) - blocking send
  • Problems?
  • What is the general scheme?
A

Remote Procedure Calls (RPC)

  • How does it activate code on a remote machine?
    • using send/receive protocol
  • Proccess on one machine calls a procedure on another machine. What must be maintained?
    • synchronous(!) - blocking send
  • Problems?
    • Different address spaces
      • Parameters and results have to be passed - by they refer to addresses the client/server don’t have
    • Machines can crash…
  • general scheme
    • A client stub and a server stub
      • A stub is a piece of code that converts parameters passed between client and sever
    • Server stub blocked on recieve.
    • Client stub blocks for returning message
52
Q

What is NFS?

A

NFS is a distributed file system protocol allowing a user on a client computer to access files over a computer network much like local storage is accessed.

53
Q

How NFS is implemented?

A

NFS implemenation:

  • Client & Server have:
    • Virtual File System layer.
      • Keeps v-nodes for open-files.
        • v-node = {i-node | r-node}
    • NFS module
      • Used for remote requests
      • Client creates r-node in its internal tables and stores the file handle.
54
Q

What are the steps in NFS mounting?

What are the steps in opening a remote file?

A

NFS mounting

  • mount program invoked manually by admin
  • input: {remote target | local dir pathname}
    • remote target = {remote server name | dir path name}
    • local dir is where the file will stay in the client’s VFS
  • server sends file handle representing directory
  • kernel construct a v-node for remote directory
  • Client(NFS) creates r-node for remote directory

Opening a remote file

  • Client kernel parses a path name to reach desired directory
  • Find a pointer to r-node in v-node
  • Asks NFS client to open the file
  • NFS clients look at the last name of the path name and gets back a file-handle.
  • NFS client creates an r-node entry for the open file, stores the file-handle in it and the VFS creates a v-node, pointing to the r-node
  • Calling process is given a file descriptor in return, pointing to the v-node.
  • Any subsequent calls this file descriptor will be traced by the VFS to the r-node and the suitable read/write operations will be performed.
55
Q

What is an obsolete value?

What is read-ahead?

A

obsolete value:

A value which is not in use anymore.

read-ahead

Reading data in advance so that it is immedietly available when requested.

56
Q

NFS: what are the performance issues?

A

NFS: performance issues

  • Data sent in 8KB chunks
    • not necesarilly efficient
  • Read-ahead might not be coherent.
  • Client caching is important for efficiecy
  • If the policy is not write-through NFS exposed to problems with coherency and data loss
    • Cached block discarded
      • Data block after 3 seconds
      • Directory block after 30 seconds
    • Every 30 seconds all dirty cached blocks are written.
    • Check with the server whenever a cached file is opened
      • if not in used discard from cache.