File System Flashcards
What is a file? Where are they stored?
A file is a collection of logical data entities (records). They are stored on non-volatile media with relatively short access times at low cost (we will focus on HDDs here).
What are HDD blocks and their relation to file records? What is a blocking factor?
Blocks (or sectors) are the smallest addressable units of a disk. They are separated by block gaps. Each block contains a block identifier (its physical location) and check fields for error detection followed by the actual data stored in a block. A block is the elementary (physical) unit of a disk.
The elementary unit of a file is a record which is a logical unit.
Files are stored on HDDs, meaning that the records of files are stored in blocks. If records are of arbitrary (variable) length, a flexible assignment of those records to the fixed-length blocks is required. For a constant record length, the ratio of block length to record length is called the blocking factor f.
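The blocking factor and the resulting record-to-block mapping can be sketched as follows (the block and record sizes are hypothetical, chosen only for illustration):

```python
# Sketch: mapping fixed-length records to disk blocks (sizes are hypothetical).
BLOCK_SIZE = 4096   # bytes per block
RECORD_SIZE = 512   # bytes per record (constant record length assumed)

f = BLOCK_SIZE // RECORD_SIZE   # blocking factor: records per block

def locate(record_no: int) -> tuple[int, int]:
    """Return (block index, byte offset within block) for a record number."""
    return record_no // f, (record_no % f) * RECORD_SIZE

print(f)            # 8 records fit into one block
print(locate(10))   # record 10 lives in block 1 at byte offset 1024
```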
What is a volume label? What is allocation information?
It is placed at a well-defined position of a storage area (for an HDD, this area can be a cylinder), is created at activation (formatting) time, and includes:
- the identifier of the volume
- its capacity
- physical layout
- bad blocks
- a link to the allocation information (or the allocation information itself: a vector- or list-based bit map that indicates which blocks in the area are free and which ones are allocated)
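A vector-based bit map for block allocation can be sketched like this (class and method names are illustrative, not from any real file system):

```python
# Sketch: a vector-based bit map for block allocation (illustrative only).
class BlockBitmap:
    def __init__(self, n_blocks: int):
        self.bits = bytearray((n_blocks + 7) // 8)  # 1 bit per block, 0 = free
        self.n = n_blocks

    def is_free(self, b: int) -> bool:
        return not (self.bits[b // 8] >> (b % 8)) & 1

    def allocate(self) -> int:
        for b in range(self.n):              # first-fit scan for a free block
            if self.is_free(b):
                self.bits[b // 8] |= 1 << (b % 8)
                return b
        raise OSError("volume full")

    def free(self, b: int) -> None:
        self.bits[b // 8] &= ~(1 << (b % 8))

bm = BlockBitmap(16)
a, b = bm.allocate(), bm.allocate()
print(a, b)           # 0 1
bm.free(a)
print(bm.allocate())  # 0 (the freed block is reused)
```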
- a link to the table of contents (aka the file directory) (or the table of contents itself)
Table of contents / file directory: what is it? What is its structure? Describe its content.
The file directory contains the list of the descriptions of all files and is stored on the volume.
In its simplest form, it is a one-dimensional table. Otherwise, the directory is structured: a file description may be found in a sub-directory (or deeper) of a parent directory.
It contains a description of the file, i.e. all necessary information concerning the particular file: name, organization, creation date, date of last access (read) and of last change (write), size, position of the file (or its parts), owner, and access rights (defined by the owner of the file, specifying which user group is allowed to read, write, execute, delete, or change the access rights for the file).
File organization: what is it? what are the main types?
It concerns the internal structure of a file, specifying in which way individual records of the file can be accessed.
- Sequential file organization: Records are accessed sequentially in a pre-defined order (one after the other). A pointer identifies the current record and can be moved by special operations (next, previous, reset). Reading relates to the current position of the pointer, while writing is usually only possible at the end of the file (unless replacing a record with one of the same length is possible). On disk devices, sequential files can be stored contiguously (records occupy consecutive blocks on disk) or non-contiguously, using linked allocation (part of each block is dedicated to a pointer for direct chaining) or indexed allocation (a block is dedicated as an index block). Accessing individual records directly can be costly with this organization.
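The two non-contiguous schemes can be sketched as follows. The "disk" is modeled as a dict from block number to stored data; all names and the block layout are illustrative assumptions:

```python
# Sketch: non-contiguous sequential storage via linked and indexed allocation.
# The "disk" maps block numbers to (data, next_block) pairs; names illustrative.
disk = {}

# Linked allocation: each block dedicates space to a pointer for direct chaining.
def write_linked(start: int, chunks: list[str]) -> None:
    blocks = list(range(start, start + len(chunks)))
    for i, (b, data) in enumerate(zip(blocks, chunks)):
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        disk[b] = (data, nxt)

def read_linked(start: int) -> str:
    out, b = [], start
    while b is not None:                 # follow the chain block by block
        data, b = disk[b]
        out.append(data)
    return "".join(out)

# Indexed allocation: one index block lists all data blocks of the file.
def read_indexed(index_block: list[int]) -> str:
    return "".join(disk[b][0] for b in index_block)

write_linked(10, ["hel", "lo ", "fs"])
print(read_linked(10))              # hello fs
print(read_indexed([10, 11, 12]))   # hello fs
```

Note that indexed allocation supports direct access to the n-th block in one step, while linked allocation must traverse the chain.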
- Direct file organization: Direct access to a record is organized via a key, from which the block address of the record is calculated by applying a hash function to the key value. This requires storing a hash table, and when files grow arbitrarily, the hash table will overflow at some point. To fix this, we use extendible hashing, which allows an incremental extension of the hash table without having to completely reorganize it: the hash function selects a component of a vector of pointers (the directory), each pointer leads to a bucket (a data block holding a bounded number of records), and a full bucket is split, doubling the directory only when necessary.
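A minimal extendible-hashing sketch, under simplifying assumptions (a tiny bucket capacity, Python's built-in `hash` as the hash function, and illustrative class names):

```python
# Minimal extendible-hashing sketch (bucket capacity and hash are illustrative).
class Bucket:
    def __init__(self, depth: int):
        self.depth, self.records = depth, {}

class ExtendibleHash:
    CAP = 2  # records per bucket (i.e., per data block)

    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]   # vector of pointers to data blocks

    def _slot(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value):
        b = self.dir[self._slot(key)]
        if len(b.records) < self.CAP or key in b.records:
            b.records[key] = value
            return
        self._split(b)                      # overflow: split, maybe double dir
        self.insert(key, value)

    def _split(self, b):
        if b.depth == self.global_depth:    # directory must be doubled first
            self.dir += self.dir
            self.global_depth += 1
        b.depth += 1
        new = Bucket(b.depth)
        old, b.records = b.records, {}
        for i, ptr in enumerate(self.dir):  # re-point half of b's dir slots
            if ptr is b and (i >> (b.depth - 1)) & 1:
                self.dir[i] = new
        for k, v in old.items():            # redistribute the records
            self.dir[self._slot(k)].records[k] = v

    def get(self, key):
        return self.dir[self._slot(key)].records[key]

eh = ExtendibleHash()
for k in range(6):
    eh.insert(k, str(k))
print(eh.get(5), eh.global_depth)   # 5 2
```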
- Index-sequential file organization: A mixture of the two above. Records are stored sequentially but with additional direct-access support. The direct access goes via a data structure (index blocks) that holds, for each block of the file, the largest key in that block and a pointer to the start location of its records. Overflow blocks store the records that no longer fit, with appropriate reference pointers inserted. The latter can significantly increase access time to individual records, so a better idea is to use a search tree (in practice a B+-tree) that supports growing and shrinking. Such a tree contains the actual records only in the leaves (which are sequentially linked by pointers); internal nodes contain only keys that serve to speed up the access.
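The index-block lookup can be sketched as a binary search over (largest key, block address) pairs; all keys, addresses, and record contents below are made up for illustration:

```python
# Sketch: direct access via an index block in an index-sequential file.
# Each index entry is (largest key in block, block address); data illustrative.
import bisect

index = [(17, 100), (42, 101), (99, 102)]    # sorted by largest key
blocks = {                                   # records stored sequentially
    100: {3: "a", 9: "b", 17: "c"},
    101: {20: "d", 42: "e"},
    102: {57: "f", 99: "g"},
}

def lookup(key: int):
    # Binary-search the index for the first block whose largest key >= key.
    i = bisect.bisect_left([hi for hi, _ in index], key)
    if i == len(index):
        return None                          # key larger than any stored key
    return blocks[index[i][1]].get(key)

print(lookup(20))   # d
print(lookup(100))  # None
```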
File Operations: What are they? What is an FCB and how is it used by threads?
Operations on files include: create, open, read, write, reset, lock, close, set parameter, read parameter (access rights as parameters for example), delete.
A file control block (FCB) stores a file's management information, such as the position pointer, the current block address, references to buffers (in main memory), lock information, etc.
The FCB is created when a file is opened and deleted after the file is closed. Each thread control block holds references to the file control blocks of the files that the process currently has open.
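A minimal sketch of an FCB and its per-thread references, with field names chosen for illustration (real operating systems use different layouts):

```python
# Sketch: a file control block and per-thread references to open files
# (field names are illustrative, not from any real OS).
from dataclasses import dataclass, field

@dataclass
class FCB:
    name: str
    position: int = 0          # current record (position) pointer
    current_block: int = 0     # current block address on disk
    buffer: bytearray = field(default_factory=bytearray)  # in-memory buffer
    locked: bool = False       # lock information

@dataclass
class ThreadControlBlock:
    tid: int
    open_files: dict = field(default_factory=dict)        # fd -> FCB

    def open(self, name: str) -> int:       # FCB is created on open ...
        fd = len(self.open_files)
        self.open_files[fd] = FCB(name)
        return fd

    def close(self, fd: int) -> None:       # ... and deleted on close
        del self.open_files[fd]

tcb = ThreadControlBlock(tid=1)
fd = tcb.open("data.txt")
print(tcb.open_files[fd].name)   # data.txt
tcb.close(fd)
print(tcb.open_files)            # {}
```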
Describe buffering and its relation to files on disk and in main memory.
Since data is often accessed more than once, the OS can keep copies of the disk blocks of files in main memory. The disk blocks of the file are "buffered" into memory following a specific strategy (including a replacement strategy).
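One common replacement strategy is least-recently-used (LRU), sketched below; `read_from_disk` is a hypothetical stand-in for real device I/O:

```python
# Sketch of a block buffer cache with LRU replacement (a common strategy;
# read_from_disk is a hypothetical stand-in for real device I/O).
from collections import OrderedDict

def read_from_disk(block_no: int) -> bytes:
    return f"block-{block_no}".encode()

class BufferCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()           # block_no -> data, LRU order
        self.hits = self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.cache:           # hit: serve from main memory
            self.cache.move_to_end(block_no)
            self.hits += 1
        else:                                # miss: fetch, maybe evict LRU
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)
            self.cache[block_no] = read_from_disk(block_no)
        return self.cache[block_no]

bc = BufferCache(capacity=2)
for b in (1, 2, 1, 3, 1):
    bc.read(b)
print(bc.hits, bc.misses)   # 2 3
```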
What are the resolution steps to ensure consistency of the file system in case of a crash?
In traditional file systems, a complete check is required after a crash: everything is traversed and inconsistencies (in the locations of records on disk blocks, in search trees, index blocks, or whatever structure is used for access) are identified to fix problems, but data loss is still likely.
Newer approaches try to avoid this complete check by using:
- Journaling/Logging File Systems: Every modification is first written as a transaction to the journal and only discarded from the journal when the modification is finished. After a crash, only the journal has to be replayed. Performance is low because everything has to be written twice.
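The write-ahead idea and crash replay can be sketched as follows (the journal format and function names are illustrative assumptions):

```python
# Sketch: write-ahead journaling and crash replay (formats are illustrative).
journal = []   # entries: (transaction state, block number, data)
disk = {}

def write_block(block: int, data: bytes) -> None:
    journal.append(("pending", block, data))  # 1. record the intent first
    disk[block] = data                        # 2. apply the write to disk
    journal[-1] = ("done", block, data)       # 3. mark finished only afterwards

def replay_after_crash() -> None:
    for state, block, data in journal:
        if state == "pending":                # redo unfinished transactions
            disk[block] = data

write_block(7, b"hello")
journal.append(("pending", 8, b"lost?"))      # crash before reaching the disk
replay_after_crash()
print(disk[8])   # b'lost?' -- recovered from the journal
```

Note that every completed modification really is written twice: once to the journal and once to its final location, which is the performance cost mentioned above.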
- Log-structured File Systems: Same idea, but instead of journaling the whole modification as a transaction, only its metadata is journaled, and the log is replayed after a crash to reconstruct the file system. Regularly written checkpoints are required so the log contains enough metadata for the replay, which in turn requires background garbage collection of old, no-longer-required journaled metadata.
- Copy-on-write (COW) File Systems: No in-place updates for block write operations, hence no inconsistencies: the updated copy of the block is written to a different address, and then the reference is updated. This allows an easy realization of snapshots, but COW operations can be costly.
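The COW update and why snapshots come almost for free can be sketched like this (the block layout and names are illustrative):

```python
# Sketch: copy-on-write block update with snapshot support (illustrative).
disk = {10: b"v1"}
next_free = 11

root = {"file.txt": 10}          # reference (metadata) points at block 10
snapshot = dict(root)            # a snapshot is just a copy of the references

def cow_update(name: str, data: bytes) -> None:
    global next_free
    new_block = next_free        # never overwrite the old block in place:
    next_free += 1
    disk[new_block] = data       # 1. write the updated copy to a fresh address
    root[name] = new_block       # 2. then switch the reference

cow_update("file.txt", b"v2")
print(disk[root["file.txt"]])       # b'v2'  (current view)
print(disk[snapshot["file.txt"]])   # b'v1'  (snapshot still sees old block)
```

Because the old block is never overwritten, a crash between steps 1 and 2 leaves the previous consistent state intact; the cost is the extra copying and reference updates.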