Filesystems Flashcards
Compare the linked allocation policy with an indexed allocation policy.
Describe advantages and disadvantages of each, and give an example of a
situation where you would use each.
No idea when I’d use a linked allocation. It has no fragmentation, which is cool, but it suffers from no direct access as blocks are a linked list up to that block. Slow access times, lots of head movement on spinning disks and seeking on solid state. Clustering can solve some of these problems.
FAT (an indexed allocation policy) creates an index table at the beginning of the disk. This disk stores indexes of blocks. The root block, maybe ‘/’ in a unix system, is in a known location, so when you look up that block number you can build the list of files by looking at the ‘next block’ column in the table (which points to an index of the same table). Suffers from internal fragmentation, as index doesn’t fill entire allocated block when file is created. Sometimes index gets full, so we have to create a linked list of index blocks. Can also adapt easily to a hybrid scheme, like Unix uses.
Overall, Linked only makes sense on very small filesystems where the overhead of a FAT would be an issue and seek times are not important.
Explain how journaling can help to ensure data integrity in a file system.
Describe a scenario when journaling works to ensure data integrity, and a
scenario where data will still likely be lost.
Journaling writes changes to the filesystem to a transaction log on disk before committing them. When flushed, the fs writes all the changes in order to the actual locations on disk. Consistency - either all operations are applied or none are.
If there is a filesystem crash, we have two possibilities:
Transactions that haven’t been started can be applied.
Transactions that have been partially applied need to be undone and their data will be lost.
Discuss the acyclic graph structure common in many file systems. In your
answer:
• Define the term ‘hard link’.
• Explain how the acyclic property of the graph can be guaranteed.
• Describe when and how storage space for data and meta-data is
reclaimed.
• Explain the purpose and implementation of symbolic links.
An acyclic graph enforces no inode can directly reference it’s parent directory, or any directory, with a hard link.
A hard link is a dirent that points to an underlying inode.
Acyclic properties are guaranteed by …
Storage and metadata space is reclaimed (in an acyclic structure) when the number of hard links to the underlying inode drop to 0.
Symbolic links exist to keep the convenience of hard links but without the actual hassle. They are often just text files containing the path of the file / directory they point to.
As an addendum, let’s explain why cyclic structures are bad. They require loads of extra garbage collection in order to notice when links have been removed ie a file has been deleted. eg. Floyd’s algorithm
Define the role of a file system.
To manage the storage and retrieval of files from a disk, manage their metadata and abstract from the complexity of dealing with storing bytes yourself.
Explain the role of reference counting with respect to the implementation of an
i-node based file system. In your answer, describe the limitation that reference
counting imposes on the implementation.
Reference counting is used to determine when an inode is no longer being pointed to by anything. This means its link count has dropped to 0. The directory structure must be acyclic for this, otherwise a variety of expensive garbage collection techniques are required in order to reference count.
Explain why a file system’s data structures can become corrupt (inconsistent).
Discuss the form that such corruption can take and describe a mechanism to
address corruption.
If data is written to the cached data structures defining the file system before they are actually applied to the disk, then attempts to retrieve that data may fail, as the metadata and storage are inconsistent.
Journaling is a method of preventing this. A transaction log is stored on the disk, and pending transactions are written there. In the event of some kind of interruption, like a crash or power failure, the transaction log can be applied back to the filesystem - with the exception of partially executed transactions for which the data may be lost.
For each of the following RAID structures, describe how data is arranged and
explain the effect on capacity, performance and reliability:
RAID0
RAID1
RAID5
RAID0 is basic block level striping. File is written and read at double the speed as it is written to at least two disks at once in at least two halves. Loss of one disk means loss of all of your data. (or about half of every file)
RAID1 is basic mirroring. File is written at same speed, so no performance boost - but the loss of one drive means there is another copy of your file on the other n drives in the array.
RAID 5 is block interleaved parity. Parity blocks are stored across multiple disks to improve performance of multiple reads.
For completeness:
RAID 2 is a (4,3) hamming code style memory layout ECC structure with partity stored across disks
RAID 3 is a bit interleaved ECC correcting structure with parity stored on one disk
RAID 4 is a block interleaved parity structure with parity stored on one disk
RAID 5 is a block interleaved parity structure with parity blocks stored across all disks
RAID 6 (dual redundancy parity) is like 5 but with two parity calculations per block. Can handle 2 disc failures.
Describe the structure and purpose of inodes in UNIX file systems.
Inodes, or File Control Blocks, contain file information along with pointers to blocks. In UNIX, they contain permissions, access, modified and created times along with inode mode.
They represent a file or directory in a filesystem.
A hard disk has the following queue of disk cylinders to be accessed:
[140, 55, 29, 5, 98, 49, 156, 59]
Assume a disk cylinder range of 0-199, with the head starting at 50.
Give the order of access under both a Shortest Seek Time First (SSTF) schedule
and under a LOOK schedule (assume the disk head is currently moving up).
Which method provides the least disk head movement?
SSTF: [49, 55, 59, 29, 5, 98, 140, 156]
LOOK: [55, 59, 98, 140, 156, 5, 29, 49]
Probably LOOK? Need to do more working out to prove.
Explain how deduplication allows a file system to save storage space when
storing similar files. Discuss the operations necessary on file creation, file
modification and file deletion.
Deduplication removes copies of the same chunk of data, replacing one with pointers. Can be done either at the block or file level.
Creation requires a hash of the file to be done so that we can compare against all exisiting hashes for matches. If we find one, replace the new one’s data with a pointer to the old ones data.
Modification requires a rehashing of the file, same as delation. If the hash matches another file already exisiting, do creation process. Otherwise transfer ownership of the pre-modification data to another deduplicated version.
Deletion just make sure that a file isn’t being pointed to by deduplication versions. If it is,, then give another file ownership of the data and update pointers to them.
Describe the structure and role of inodes in Unix file systems.
Inodes represent files and directories in UNIX. They have access, created … permissions, mode, size etc They also contain pointers to blocks in a hybrid indirection scheme
Calculate the total amount of disk space taken up by a 53kB file in an inode-
based file system. Include storage of data and meta-data in your
calculation. Assume that data blocks are 4kB and be careful to describe all
of your assumptions.
FCB requires 4KB of space on disk. To the next 4KB 53 -> 56KB so 56KB of allocated disk space will be used.
This is all assuming a 4K aligned disk, and that the filesystem doesn’t have special allowance for multiple small files to be placed in a single block.
Explain the pros and cons of a linked allocation scheme for data-blocks and
describe the motivation for the following two optimisation techniques:
a FAT
clustering
Linked allocation has zero external fragmentation due to the natural fragmentation of the scheme. File expansion is very simple. File access is hard, as all direct accesses must be done sequentially.
FAT allows direct access as blocks can be calculated based on their index number
Clustering can allow linked allocations to keep head movement down by storing linked blocks close together on disk. Requires defragmenting.
Present an argument for the number of disk reads necessary to execute a
‘stat’ system call on the file ‘/usr/local/jdk1.8.0/jre/java’. Assume the
following:
The directory /usr/local/ has recently been accessed
An inode-based file system where directories are implemented as files
if /usr/local has been accessed recently, it is probably cached in memory with a pointer to it’s FCB on disk. That FCB (once read from disk) will contain -> jdk1.8.0 -> jre -> java. So we’re at 1 + 3 * 2 disk reads to get the java FCB.
Discuss the acyclic graph structure common in many file systems. In your
answer:
Define the term ‘hard link’
Explain how the acyclic property of the graph can be guaranteed
Describe when and how storage space for data and meta-data is
reclaimed
Explain the purpose and implementation of symbolic links
Hard links are references directly to other inodes in the file system.
Acyclic property can be guaranteed by ensuring no hard links are made to directories.
When all hard links to an inode are deleted, the storage space is reclaimed.
Symbolic links give the convenience of a link to another part of the filesystem without the headache of hard links. They are usually just a text file with the link as a path name.