I/O, Secondary Storage, and File Systems Flashcards by Yd Yd

indexed allocation

Each file has its own index block, which contains a fixed number of pointers to data blocks

What is the refcount field needed for? When does the file system increment or decrement this field?

The file system needs this field to know when to delete the i-node in the presence of hard links. It increments the field when a new hard link to the i-node is created and decrements it when a hard link is deleted. Once the count reaches zero, no name refers to the i-node any more and it can be deleted.
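The reference count can be observed from user space. The following is a minimal sketch, assuming a Unix-like system whose temporary directory supports hard links; the file names `original` and `alias` are made up for illustration.

```python
import os
import tempfile


def link_counts():
    """Return the i-node link count after create, link, and unlink."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")  # hypothetical file name
        alias = os.path.join(d, "alias")        # hypothetical second name
        with open(original, "w") as f:
            f.write("data")
        counts.append(os.stat(original).st_nlink)  # one directory entry so far
        os.link(original, alias)                   # new hard link: count goes up
        counts.append(os.stat(original).st_nlink)
        os.unlink(original)                        # one name removed: count goes down
        counts.append(os.stat(alias).st_nlink)     # data is still reachable via alias
    return counts
```

The data stays reachable through `alias` after `original` is unlinked, because the i-node is only released when the last name is gone.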

Modern solid-state drives can access data with high bandwidth and low latency.
Explain how the file system cache works and why such a cache is still desirable for fast drives.

The file system cache buffers all accesses to secondary storage in DRAM. When writing to files, the modifications are written back to storage asynchronously. This is still important for fast SSDs: SSDs have a limited number of write cycles and wear out over time. The file system cache can buffer many small writes to a single block and write the full block to the SSD only once applications have stopped writing to that block. This reduces the wear on the SSD.

Describe one disadvantage of the file system cache.

The write-behind policy might lead to data losses and/or inconsistent state of the FS in case of a system crash.

Modern file systems such as btrfs can transparently replicate files across multiple disks. Btrfs can detect read errors by calculating a checksum over the whole file.
Which RAID level does this approach correspond to?

RAID 1

Some modern hard disks use Shingled Magnetic Recording (SMR) to improve data density. A write to an SMR track destroys data on neighboring tracks. Consequently, multiple tracks might need a rewrite even when only a few blocks are updated. An application writes data to a file located on an SMR drive. Which existing operating system mechanism can improve performance independently of the file system in use? Describe a situation that shows improvement.

The file system cache can improve performance if the application issues multiple writes close to each other that will end up on neighboring tracks on the SMR drive.

List the three places in the file system where, according to the lecture, file types can be encoded.

File system structures (e.g., field in i-node)
Name (e.g., extension)
Content (e.g., magic number)

Explain the difference between mandatory and advisory file locks.

When a file is locked and a second process requests a lock, the behavior differs:
Mandatory locks cause the second process’s access to be denied. Advisory file locks only tell the second process that the file is locked, and the process can decide for itself whether to continue the access to the file.
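The advisory behavior can be sketched with `flock`. This is a minimal illustration, assuming a Unix-like system (the `fcntl` module is not available on Windows); the function name `advisory_lock_demo` is made up here. The two `open` calls stand in for two processes, since `flock` locks belong to the open file description.

```python
import fcntl
import os
import tempfile


def advisory_lock_demo():
    """Return (second_lock_granted, file_size_after_unlocked_write)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    holder = open(path, "r+")  # stands in for the first process
    other = open(path, "r+")   # stands in for the second process
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)  # first party takes the lock
        try:
            # Non-blocking request: fails instead of waiting for the holder.
            fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
            second_lock_granted = True
        except OSError:
            second_lock_granted = False
        # Advisory: nothing stops a write that simply ignores the lock.
        other.write("x")
        other.flush()
        size = os.path.getsize(path)
    finally:
        holder.close()
        other.close()
        os.unlink(path)
    return second_lock_granted, size
```

The lock request is refused, yet the write goes through. With a mandatory lock, the write itself would be denied.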

Sectors are usually 512 bytes large, whereas many file systems use blocks of significantly larger size (e.g., 4 KiB in FAT32). Give an advantage and a disadvantage of such larger blocks of fixed size compared to smaller blocks.

+ Fewer fragments result in fewer disk seeks.
+ Fewer blocks result in smaller tables required to manage free space.
+ Larger fragments result in more efficient disk operations.
- Larger blocks result in more internal fragmentation.
- Larger blocks can result in larger transfers than necessary for small read/write operations.

How can the specified file system be changed to increase the maximum file size, if the total size of the i-node must not be changed?

Increase the block size.
Increase the degree of indirection (e.g., by replacing one of the entries with an entry for triple-indirect block addressing).
Reduce the size of block addresses to increase the number of references per indirect block.
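The effect of each option can be checked with a small calculation. The card does not reproduce the file system it refers to, so every number below is a hypothetical assumption: 4 KiB blocks, 4-byte block addresses, and an i-node with 12 direct entries, one single-indirect and one double-indirect entry.

```python
def max_file_size(block_size, addr_size, direct, indirect_levels):
    """Maximum file size in bytes for an i-node layout.

    indirect_levels lists the depth of each indirect entry,
    e.g. [1, 2] for one single- and one double-indirect entry.
    """
    refs = block_size // addr_size  # references per indirect block
    blocks = direct + sum(refs ** level for level in indirect_levels)
    return blocks * block_size


# Hypothetical baseline layout (not taken from the original exercise).
baseline = max_file_size(4096, 4, 12, [1, 2])

# Option 1: larger blocks.
bigger_blocks = max_file_size(8192, 4, 12, [1, 2])
# Option 2: trade one direct entry for a triple-indirect entry (same i-node size).
more_indirection = max_file_size(4096, 4, 11, [1, 2, 3])
# Option 3: shorter block addresses, so more references fit into an indirect block.
shorter_addresses = max_file_size(4096, 2, 12, [1, 2])
```

Shorter addresses also shrink the number of blocks that can be addressed at all, which the sketch does not model.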

Describe a situation in which a write operation to a file requires the OS to first read existing contents of the same file from the block device.

If data is appended or replaced in the middle of a disk block, the existing contents of the block have to be read from the disk: Because disks are block devices, the whole updated block (combination of old and new content) has to be written back to the disk.
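The read-modify-write cycle can be sketched with a simulated block device. The class `BlockDevice`, the helper `write_bytes`, and the 512-byte block size are assumptions made for this illustration, and the helper only handles writes that stay within one block.

```python
BLOCK_SIZE = 512  # assumed block size for the sketch


class BlockDevice:
    """In-memory stand-in for a device that only transfers whole blocks."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return bytearray(self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n, block):
        assert len(block) == BLOCK_SIZE
        self.writes += 1
        self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = block


def write_bytes(dev, offset, payload):
    """Write payload at a byte offset; must not cross a block boundary."""
    n, start = divmod(offset, BLOCK_SIZE)
    if start + len(payload) > BLOCK_SIZE:
        raise ValueError("sketch only handles writes within one block")
    if len(payload) == BLOCK_SIZE:
        block = bytearray(payload)      # whole block replaced: no read needed
    else:
        block = dev.read_block(n)       # partial update: fetch old contents
        block[start:start + len(payload)] = payload
    dev.write_block(n, block)           # write back old and new content combined
```

A full-block write skips the read, while a three-byte update in the middle of a block costs one read and one write.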

Why do some file read operations trigger no access to the underlying block device?

Most operating systems maintain a file system cache (buffer cache, page cache) which caches the content of the underlying block device. If operations hit the cache, they usually do not cause any block device access.

When writing a single block in a RAID 3, do all disks in the array need to be accessed?

Yes, due to byte interleaving: Each block is spread over all disks in the RAID, so all disks hold a part of the requested block.

The relative path ../../asdf/./jkl is accessed from within the directory /a/b/c/. Give the absolute path, without any unnecessary elements.

/a/asdf/jkl
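The result can be checked with a purely lexical normalization. This sketch uses Python's `posixpath` module; the helper name `resolve` is made up here, and the approach ignores symbolic links, which a real path lookup would follow.

```python
import posixpath


def resolve(cwd, relative):
    """Join a relative path onto cwd and drop '.', '..' and duplicate slashes."""
    return posixpath.normpath(posixpath.join(cwd, relative))


absolute = resolve("/a/b/c/", "../../asdf/./jkl")  # "/a/asdf/jkl"
```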

Explain the difference between shared file locks and exclusive file locks.

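Exclusive file locks can be held by only one process at a time (typically a writer); no other lock on the file is granted while one is held.
Shared file locks can be held by multiple processes at the same time (typically readers), but not while an exclusive lock is held.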

On conventional hard disks, what impact on performance can placing very commonly required data (e.g., inode tables) on the center cylinders of the disk have (as opposed to placing the data on inner or outer cylinders)?

The average seek time is reduced: The seek time depends on the distance traveled by the head. This distance is short because the center cylinders are on average closer to all other cylinders of the disk than, e.g., the inner or outer cylinders.
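A rough calculation illustrates the claim. The sketch assumes a hypothetical disk with 1000 cylinders and uniformly distributed accesses, and it uses the cylinder distance as a stand-in for seek time; real seek times are not linear in distance.

```python
def average_seek_distance(home, cylinders):
    """Mean distance between the cylinder `home` and every cylinder on the disk."""
    return sum(abs(home - c) for c in range(cylinders)) / cylinders


CYLINDERS = 1000  # hypothetical disk size
from_center = average_seek_distance(CYLINDERS // 2, CYLINDERS)
from_edge = average_seek_distance(0, CYLINDERS)
```

Under these assumptions, the average distance from the center cylinder is roughly half of the average distance from an edge cylinder.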

What is spooling?

Spooling means that the system holds back output for a device while that device is busy executing another request. Spooling is necessary for devices which can only serve one request at a time (e.g., a printer).

Which problem can be caused by DMA during page replacement, and how can this situation be prevented?

DMA operates on physical addresses, so the DMA controller will happily write into frames even after the corresponding page table entries have been invalidated.
This situation can be prevented by pinning all DMA targets into physical memory.

A program appends data to a file that has multiple hard links. Why is it advantageous to store attributes like the file size not in the directory entry, but instead in the inode?

If the file size was stored in the directory entry, then all directory entries of all hard links would have to be visited and modified, which is more expensive than changing the single inode of the file.

For each of the three allocation strategies contiguous allocation, chained allocation, and indexed allocation, describe a scenario for which the strategy is particularly well suited.

Contiguous Allocation: Suited if data is only written once (e.g., when creating read-only media such as DVDs). Strategy is prone to fragmentation and cannot cope well with changing file sizes. Files should therefore be static (i.e., read-only).

Chained Allocation: Suitable if data is only linearly accessed (e.g., for video or audio files). Random access is very slow because the reader must walk the chain to find a certain offset in the file.

Indexed Allocation: Suitable whenever good random access performance is required. The index blocks allow very fast mapping of file offsets to blocks on disk.

You own a folder containing secret data. You should have read and write access to the files inside this folder, while members of your user group should have only read access to those files. Everyone else should have no access.
On a UNIX file system, which access rights do you have to set for files and directories inside this directory?

Files: 640 or rw-r-----
Folders: 750 or rwxr-x---
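The mapping between the octal and the symbolic notation can be reproduced with Python's `stat` module. The helper name `symbolic` is made up for this sketch; the leading character of the result is the file type (`-` for a regular file, `d` for a directory).

```python
import stat


def symbolic(mode, is_dir=False):
    """Render octal permission bits in ls-style notation."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


file_rights = symbolic(0o640)                # "-rw-r-----"
dir_rights = symbolic(0o750, is_dir=True)    # "drwxr-x---"
```

Directories need the execute bit for owner and group so that entries inside them can be accessed at all.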

How is it possible to give a single user outside your user group access to the directory without violating the requirements described above and without changing the user’s group?

ACLs can be used to grant access only to the single user.

Is it possible to create a soft link to F inside the directory A?

Yes. Soft links only point to a path name. Therefore, it does not matter where the target is stored (or if it even exists).
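That a soft link merely stores a path name can be sketched as follows, assuming a Unix-like system where unprivileged users may create symbolic links. The names `link` and `missing-target` are made up for the illustration.

```python
import os
import tempfile


def dangling_symlink_demo():
    """Create a soft link to a path that does not exist."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "link")
        target = os.path.join(d, "missing-target")  # never created
        os.symlink(target, link)                    # succeeds anyway
        # islink inspects the link itself; exists follows it to the target.
        return os.path.islink(link), os.path.exists(link)
```

Creating the link succeeds even though the target is missing; only following the link fails.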

How does a RAID 1 consisting of two identical hard disks change the bandwidth and latency of read and write accesses compared to using a single disk?

Write: Bandwidth is unchanged since all data must be written to both disks. Latency is also unchanged since the same seeks take place on both disks.
Read: Bandwidth is doubled: Since all data exists on both disks, the two can read different blocks in parallel. Latency is unchanged since one of the disks must still perform a seek.

Name three advantages of hard disks.

* cost efficient / cheaper than SSDs
* durable: number of reads and writes is nearly infinite
* reliable: SSDs have more uncorrectable data errors
* large: up to 20 TB per device, more than SSDs can provide
* simpler: no need for complex controller logic for FTL

Name two advantages of using extents.

* improves contiguity
* reduces index size
* reduces overhead from unneeded pointers

What is the purpose of an fsck program?

A file system checker verifies invariants of the file system metadata. It can fix certain types of metadata corruption and can thus prevent propagation of these issues while the file system is in use.

Name a block allocation policy and rate its suitability for random file access.

Contiguous allocation: Random access is very efficient. Obtain start block from FAT, check offset against file length, calculate target block from start block and offset, read target block.

Chained allocation: Random access is very inefficient. Obtain start block from FAT, repeatedly read data block from disk and follow pointer to the next block until the target block is reached.

Linked List Allocation/FAT: Random access is slightly more efficient than with chained allocation because the block chain is stored in RAM (otherwise, similar description of steps).

Indexed Allocation: Random access is slightly less efficient than with contiguous allocation. Read inode for file, follow pointers to the correct indirect block for the requested offset, read target block.
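The difference between the first two policies can be sketched by counting disk reads. The functions `contiguous_lookup` and `chained_lookup` and the dictionary that models the on-disk next pointers are assumptions made for this illustration.

```python
def contiguous_lookup(start, length, index):
    """Return (block number, disk reads) for the index-th block of a file."""
    if index >= length:
        raise IndexError("offset beyond end of file")
    return start + index, 1  # target is computed, then read once


def chained_lookup(next_block, start, index):
    """Follow next pointers stored inside the data blocks themselves."""
    block, reads = start, 0
    for _ in range(index):
        reads += 1                 # must read a block to learn its successor
        block = next_block[block]
    return block, reads + 1        # plus the read of the target block


# Hypothetical chain: block 5 -> 9 -> 2 -> 7 (end of file).
chain = {5: 9, 9: 2, 2: 7, 7: None}
```

With chained allocation the number of reads grows linearly with the offset, whereas contiguous allocation always needs a single read.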

Which operating system component allows mounting multiple file systems into a shared directory tree?

The Virtual File System (VFS).

Which system call flushes all dirty blocks in the file system cache to the disk?

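One of sync(), fsync(), msync(). Note that only sync() covers all dirty blocks in the cache; fsync() flushes the dirty blocks of a single file and msync() those of a memory-mapped region.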
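The scope of these calls can be sketched through Python's wrappers `os.fsync` and `os.sync`. The helper name `durable_write` is made up here, and `os.sync` is only available on Unix-like systems.

```python
import os
import tempfile


def durable_write(payload):
    """Write payload to a temporary file and force it to stable storage."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)  # lands in the file system cache first
        os.fsync(fd)           # flush the dirty blocks of this one file
        os.sync()              # flush the dirty blocks of all files (Unix only)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

Applications that need durability for one file usually prefer `fsync`, since a system-wide `sync` also waits for unrelated dirty data.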

Describe an advantage of log-structured file systems (or copy-on-write file systems) over journaling file systems.

* Journaling file systems need to write all data twice: once to the journal and a second time to the actual data blocks. A copy-on-write file system can provide the same crash consistency guarantees without the second data write.
* Copy-on-write file systems can provide complex features such as snapshots or block-level data deduplication with little overhead compared to journaling file systems.
* Log-structured file systems write almost everything sequentially, which is an advantage on certain storage media (e.g., hard disks, in particular SMR drives).

How is it possible to give a single user outside your user group access to a directory without changing the user’s group?

ACLs can be used to grant access to only a single user.

When a symbolic link is deleted in a Unix file system, the inode of the linked file does not need to be modified. This is not true for hard links. Why is this the case?

A hard link is a reference to an inode. Symbolic links, on the other hand, are special files that refer to another file by its path, which may not exist. If no hard link points to an inode any more, the inode needs to be deleted. Thus, the file system needs to keep track of the number of hard links with a reference count in the inode, and deleting a hard link must update that count.

An array of four hard disks can be configured for different RAID levels. How many disks can fail in each configuration before data is permanently lost? Explain your answer.

RAID 0: 0. Striping, no redundancy.
RAID 10: 1 or 2. Can handle two failing drives if they are in separate RAID 1 groups.
RAID 4: 1. Data is stored on three disks, parity information is stored on a separate disk, thus reconstruction is only possible if one disk fails.
RAID 5: 1. Same as RAID 4, but parity information is distributed across all disks.

The inode link count is 1, it should be 2. The free-bitmap entry for block 1234 is 0 (free), it should be 1 (allocated)

A second directory entry has been found for an inode with link count 1. An inode points to a block that is not marked as allocated.

Give two examples for file types that are commonly encoded in the inode.

regular file, directory, symbolic link, character/block device

How does a Unix system detect a shell script in an executable file?

A shell script starts with a shebang (#!).
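The check amounts to a magic-number test on the first two bytes of the file. The following is a minimal sketch of that test, not the kernel's actual code; the function names are made up for the illustration.

```python
def has_shebang(header):
    """True if the given leading bytes of a file start with '#!'."""
    return header[:2] == b"#!"


def is_script(path):
    """Read only the first two bytes of the file and test them."""
    with open(path, "rb") as f:
        return has_shebang(f.read(2))
```

The rest of the first line names the interpreter that is started with the script as its argument.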

What disadvantage does the RAID-4 system have?

A RAID 4 system does not allow parallel write accesses because the parity disk forms a bottleneck.

What issue does RAID 4 have? How does RAID 5 solve this issue?

The parity disk is accessed for every write on any disk and thus tends to fail quickly. With RAID 5, the parity is distributed across all disks.

Describe RAID 4. How is the data distributed among the disks? What type of redundancy is used, and where is it stored?

RAID 4 has multiple data disks and uses block-level striping of the data. A separate disk holds the parity of the data disks.
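How parity allows one lost block to be rebuilt can be sketched with XOR, the usual parity function. The helper names and the tiny two-byte blocks are assumptions made for this illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR over equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block of a stripe."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe across three hypothetical data disks plus its parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)
rebuilt = reconstruct([stripe[0], stripe[2]], parity)  # disk 1 has failed
```

XOR-ing the parity with all surviving blocks cancels them out and leaves the missing block, which is why exactly one failed disk per stripe can be tolerated.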

Why can RAID 4 cause premature failure of one of the hard disks?

The parity disk is accessed on every write access. It is therefore accessed more often than the data disks, leading to increased wear.

Why is the amount of flash memory in a SSD usually larger than the capacity available to the OS?

The additional memory is used as spare blocks. Modified block contents are written to those spare blocks.