I/O, Secondary Storage, and File Systems Flashcards
indexed allocation
Each file has its own index block, which contains a fixed number of pointers to data blocks
What is the refcount field needed for? When does the file system increment or decrement this field?
The file system needs this field to know when it may delete the i-node in the presence of hard links. The file system increments the field when a new hard link to the i-node is created and decrements it when a hard link is deleted. The i-node and its data blocks can be freed once the count reaches zero.
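The reference count can be observed from user space. The following is a minimal sketch, assuming a UNIX-like system and a file system that supports hard links; the file names "original" and "alias" are made up for illustration.

```python
import os
import tempfile

def link_counts():
    """Return the st_nlink values seen while adding and removing a hard link."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")   # hypothetical file name
        alias = os.path.join(d, "alias")         # hypothetical second name
        with open(original, "w") as f:
            f.write("data")
        counts.append(os.stat(original).st_nlink)  # one directory entry so far
        os.link(original, alias)                   # new hard link: count goes up
        counts.append(os.stat(original).st_nlink)
        os.unlink(original)                        # one name removed: count goes down
        counts.append(os.stat(alias).st_nlink)     # i-node still reachable via alias
    return counts
```

The i-node survives the unlink of the first name because the alias still references it.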
Modern solid-state drives can access data with high bandwidth and low latency.
Explain how the file system cache works and why such a cache is still desirable for fast drives.
The file system cache buffers all accesses to secondary storage in DRAM. When writing to files, the modifications are written back to storage asynchronously. This is still important for fast SSDs: SSDs have a limited number of write cycles and wear out over time. The file system cache can buffer many small writes to a single block and write the full block to the SSD only once applications have stopped writing to that block. This reduces the wear on the SSD.
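The write-coalescing effect can be illustrated with a toy model. This is only a sketch: the class WriteBackCache and its counters are hypothetical, and a real cache also handles reads, eviction, and timed write-back.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in DRAM and reach the device on flush."""

    def __init__(self):
        self.dirty = {}          # block number -> latest contents held in DRAM
        self.device = {}         # stands in for the SSD
        self.device_writes = 0   # number of block writes that reached the device

    def write(self, block_no, data):
        # Overwrite the cached copy only; nothing is sent to the device yet.
        self.dirty[block_no] = data

    def flush(self):
        # Write each dirty block once, no matter how often it was modified.
        for block_no, data in self.dirty.items():
            self.device[block_no] = data
            self.device_writes += 1
        self.dirty.clear()

cache = WriteBackCache()
for i in range(100):
    cache.write(7, f"version {i}")   # 100 small updates to the same block
cache.flush()
```

In this model the hundred updates cause a single device write. The unflushed window is also exactly where a crash would lose data.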
Describe one disadvantage of the file system cache.
The write-behind policy might lead to data loss and/or an inconsistent state of the file system in case of a system crash, because modifications that exist only in the cache have not yet reached the disk.
Modern file systems such as btrfs can transparently replicate files across multiple disks. Btrfs can detect read errors by calculating a checksum over the whole file.
Which RAID level does this approach correspond to?
RAID 1
Some modern hard disks use Shingled Magnetic Recording (SMR) to improve data density. A write to an SMR track destroys data on neighboring tracks. Consequently, multiple tracks might need a rewrite even when only a few blocks are updated. An application writes data to a file located on an SMR drive. Which existing operating system mechanism can improve performance independently of the file system in use? Describe a situation that shows improvement.
The file system cache can improve performance if the application issues multiple writes close to each other that will end up on neighboring tracks on the SMR drive.
List the three places in the file system, where, according to the lecture, it is possible to encode file types.
- File system structures (e.g., field in i-node)
- Name (e.g., extension)
- Content (e.g., magic number)
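Two of these encodings (name and content) can be sketched in a few lines. The helper names and the small MAGIC table are illustrative assumptions; the three signatures listed are the well-known ones for PNG, PDF, and ELF files.

```python
import os

# A few well-known magic numbers; real tools use a much larger database.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\x7fELF": "elf",
}

def type_from_name(name):
    """Guess the type from the file name extension, or None if there is none."""
    return os.path.splitext(name)[1].lstrip(".").lower() or None

def type_from_content(data):
    """Guess the type from the leading bytes (magic number), or None."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return None
```

The third place, the file system structures, is what a call such as os.stat reports in its mode field (regular file, directory, symbolic link, and so on).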
Explain the difference between mandatory and advisory file locks.
When a file is locked and a second process requests a lock, the behavior differs:
Mandatory locks cause the second process’s access to be denied. Advisory file locks only tell the second process that the file is locked, and the process can decide for itself whether to continue the access to the file.
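The advisory case can be tried out with flock. This is a hedged sketch that assumes a UNIX-like system with a local file system; two separate open() calls stand in for two processes, and the function name is made up.

```python
import fcntl
import os
import tempfile

def advisory_lock_demo():
    """Return (lock_refused, write_succeeded) for a second opener of a locked file."""
    with tempfile.NamedTemporaryFile() as tmp:
        holder = open(tmp.name, "r+")   # plays the first process
        other = open(tmp.name, "r+")    # plays the second process
        try:
            fcntl.flock(holder, fcntl.LOCK_EX)
            try:
                # Non-blocking request: fails because the lock is already held.
                fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
                lock_refused = False
            except OSError:
                lock_refused = True
            # Advisory: nothing stops the second opener from writing anyway.
            other.write("written anyway")
            other.flush()
            write_succeeded = os.path.getsize(tmp.name) > 0
        finally:
            holder.close()
            other.close()
    return lock_refused, write_succeeded
```

The lock request is refused, yet the write still goes through, because only cooperating processes that check the lock are held back.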
Sectors are usually 512 bytes large, whereas many file systems use blocks of significantly larger size (e.g., 4 KiB in FAT32). Give an advantage and a disadvantage of such larger blocks of fixed size compared to smaller blocks.
+ Fewer fragments result in fewer disk seeks.
+ Fewer blocks result in smaller tables required to manage free space.
+ Larger fragments result in more efficient disk operations.
- Larger blocks result in more internal fragmentation.
- Larger blocks can result in larger transfers than necessary for small read/write operations.
How can the specified file system be changed to increase the maximum file size, if the total size of the i-node must not be changed?
- Increase the block size.
- Increase the degree of indirection (e.g., by replacing one of the entries with an entry for triple-indirect block addressing).
- Reduce the size of block addresses to increase the number of references per indirect block.
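The effect of each change can be checked with a small calculation. The layout below (12 direct entries, one single-indirect and one double-indirect entry, 4 KiB blocks, 4-byte addresses) is a hypothetical example, not the file system the question refers to.

```python
def max_file_size(block_size, addr_size, direct, single=0, double=0, triple=0):
    """Maximum file size in bytes for an i-node with the given pointer entries."""
    per_block = block_size // addr_size   # references per indirect block
    blocks = (direct
              + single * per_block
              + double * per_block ** 2
              + triple * per_block ** 3)
    return blocks * block_size

# Hypothetical baseline: 14 i-node entries in total.
base = max_file_size(4096, 4, direct=12, single=1, double=1)

# Same number of entries in each variant, so the i-node size stays the same.
bigger_blocks = max_file_size(8192, 4, direct=12, single=1, double=1)
more_indirection = max_file_size(4096, 4, direct=11, single=1, double=1, triple=1)
smaller_addresses = max_file_size(4096, 2, direct=12, single=1, double=1)
```

In this example the baseline is roughly 4 GiB, and each of the three variants yields a larger maximum. Note that shorter addresses also limit how many blocks can be addressed at all.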
Describe a situation, where a write operation into a file requires the OS to first read existing file contents of the same file from the block device.
If data is appended or replaced in the middle of a disk block, the existing contents of the block have to be read from the disk: Because disks are block devices, the whole updated block (combination of old and new content) has to be written back to the disk.
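The read-modify-write cycle can be sketched with a toy block device. The class, the 8-byte block size, and the helper write_bytes are illustrative assumptions.

```python
BLOCK_SIZE = 8   # deliberately tiny so the example stays readable

class BlockDevice:
    """Toy block device that only transfers whole blocks and counts accesses."""

    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE   # partial blocks cannot be written
        self.writes += 1
        self.blocks[n] = bytes(data)

def write_bytes(dev, offset, payload):
    """Write a payload that lies within a single block."""
    n, start = divmod(offset, BLOCK_SIZE)
    assert start + len(payload) <= BLOCK_SIZE
    if len(payload) == BLOCK_SIZE:
        block = bytearray(payload)             # full overwrite: no read needed
    else:
        block = bytearray(dev.read_block(n))   # fetch the old contents first
        block[start:start + len(payload)] = payload
    dev.write_block(n, block)
```

Only the partial update triggers a read; overwriting a complete block does not need the old contents.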
Why do some file read operations trigger no access to the underlying block device?
Most operating systems maintain a file system cache (buffer cache, page cache) which caches the content of the underlying block device. If operations hit the cache, they usually do not cause any block device access.
When writing a single block in a RAID 3, do all disks in the array need to be accessed?
Yes, due to byte interleaving: Each block is spread over all disks in the RAID, so all disks hold a part of the requested block.
The relative path ../../asdf/./jkl is accessed from within the directory /a/b/c/. Give the absolute path, without any unnecessary elements.
/a/asdf/jkl
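The result can be double-checked with Python's posixpath module. The wrapper name resolve is made up; note that normpath works purely on the path string and does not follow symbolic links.

```python
import posixpath

def resolve(cwd, relative):
    """Join a relative path onto cwd and drop '.', '..' and duplicate slashes."""
    return posixpath.normpath(posixpath.join(cwd, relative))

absolute = resolve("/a/b/c/", "../../asdf/./jkl")
```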
Explain the difference between shared file locks and exclusive file locks.
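Exclusive file locks can be held by only one process at a time (typically a writer); no other lock on the file can be granted while an exclusive lock is held.
Shared file locks can be held by multiple processes at the same time (typically readers), but only while no exclusive lock is held.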
On conventional hard disks, what impact on performance can placing very commonly required data (e.g., inode tables) on the center cylinders of the disk have (as opposed to placing the data on inner or outer cylinders)?
The average seek time is reduced: The seek time depends on the distance traveled by the head. This distance is short because the center cylinders are on average closer to all other cylinders of the disk than, e.g., the inner or outer cylinders.
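A quick calculation supports this. The sketch assumes that the head moves between the frequently used cylinder and uniformly distributed other cylinders, and it measures distance in cylinders rather than time; the disk size of 1001 cylinders is arbitrary.

```python
def average_seek_distance(home, cylinders):
    """Mean head travel between cylinder `home` and a uniformly chosen cylinder."""
    return sum(abs(home - c) for c in range(cylinders)) / cylinders

CYLINDERS = 1001                                         # arbitrary example size
edge = average_seek_distance(0, CYLINDERS)               # data on the outermost cylinder
center = average_seek_distance(CYLINDERS // 2, CYLINDERS)  # data on the center cylinder
```

Under these assumptions the average distance from the center is about half of the average distance from an edge.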
What is spooling?
Spooling means that the system holds back output for a device while that device is busy executing another request. Spooling is necessary for devices which can only serve one request at a time (e.g., a printer).
Which problem can be caused by DMA during page replacement, and how can this situation be prevented?
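DMA operates on physical addresses, so the DMA controller will happily write into frames even after the corresponding page table entries have been invalidated. If such a frame has been reassigned to another page in the meantime, the transfer corrupts that page's contents.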
This situation can be prevented by pinning all DMA targets into physical memory.
A program appends data to a file that has multiple hard links. Why is it advantageous to store attributes like the file size not in the directory entry, but instead in the inode?
If the file size were stored in the directory entry, then the directory entries of all hard links would have to be found and modified, which is more expensive than changing the single inode of the file.
For each of the three allocation strategies contiguous allocation, chained allocation, and indexed allocation, describe a scenario for which the strategy is particularly well suited.
Contiguous Allocation: Suited if data is only written once (e.g., when creating read-only media such as DVDs). Strategy is prone to fragmentation and cannot cope well with changing file sizes. Files should therefore be static (i.e., read-only).
Chained Allocation: Suitable if data is only linearly accessed (e.g., for video or audio files). Random access is very slow because the reader must walk the chain to find a certain offset in the file.
Indexed Allocation: Suitable whenever good random access performance is required. The index blocks allow very fast mapping of file offsets to blocks on disk.
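The difference in random access cost between chained and indexed allocation can be sketched as follows. The block numbers and both helper functions are made up for illustration; the second return value counts how many lookups were needed.

```python
def chained_lookup(next_block, first_block, index):
    """Walk the chain to the index-th block; returns (block, hops taken)."""
    block, hops = first_block, 0
    for _ in range(index):
        block = next_block[block]   # each hop means reading another block
        hops += 1
    return block, hops

def indexed_lookup(index_block, index):
    """Look the block up directly in the index block; returns (block, lookups)."""
    return index_block[index], 1

# Hypothetical file stored in disk blocks 9 -> 4 -> 7 -> 2.
next_block = {9: 4, 4: 7, 7: 2, 2: None}
index_block = [9, 4, 7, 2]

chained = chained_lookup(next_block, 9, 3)
indexed = indexed_lookup(index_block, 3)
```

Both find the same disk block, but the chained variant needs a number of hops that grows with the offset, while the indexed variant needs one lookup.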
You own a folder containing secret data. You should have read and write access to the files inside this folder, while members of your user group should have only read access to those files. Everyone else should have no access.
On a UNIX file system, which access rights do you have to set for files and directories inside this directory?
Files: 640 or rw-r-----
Folders: 750 or rwxr-x---
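The mapping between the octal and the symbolic notation can be checked with Python's stat module. The leading character of the result ('-' or 'd') is the file type, followed by the nine permission bits.

```python
import stat

# Regular file with mode 640 and directory with mode 750.
file_mode = stat.filemode(stat.S_IFREG | 0o640)
dir_mode = stat.filemode(stat.S_IFDIR | 0o750)
```

Directories need the execute bit for owner and group so that they can be entered; that is why 750 is used rather than 640.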
How is it possible to give a single user outside your user group access to the directory without violating the requirements described above and without changing the user’s group?
ACLs can be used to grant access only to the single user.
Is it possible to create a soft link to F inside the directory A?
Yes. Soft links only point to a path name. Therefore, it does not matter where the target is stored (or if it even exists).
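That a soft link stores only a path name can be shown with a dangling link. This is a minimal sketch assuming a UNIX-like system; the names "link" and "does-not-exist" are made up.

```python
import os
import tempfile

def dangling_link_demo():
    """Create a soft link to a missing target; return (is_link, target_exists, stored_path)."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "link")
        os.symlink("does-not-exist", link)     # succeeds although the target is missing
        is_link = os.path.islink(link)         # inspects the link itself
        target_exists = os.path.exists(link)   # follows the link to the target
        stored_path = os.readlink(link)        # the path name kept in the link
    return is_link, target_exists, stored_path
```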
How does a RAID 1 consisting of two identical hard disks change the bandwidth and latency of read and write accesses compared to using a single disk?
Write: Bandwidth is unchanged since all data must be written to both disks. Latency is also unchanged since the same seeks take place on both disks.
Read: Bandwidth is doubled: Since all data exists on both disks, the two can read different blocks in parallel. Latency is unchanged since one of the disks must still perform a seek.