Persistent Storage: Intro to File Systems Flashcards
What does I/O stand for?
Input/Output
Why are input and output required for computer systems to be interesting?
Without input a program always computes the same result, and without output that result can never be seen or used. I/O devices allow computer systems to interact with the world around them in more dynamic ways.
What are buses in I/O system architecture?
Data paths that allow information to be exchanged between the CPU, RAM, and I/O devices.
List some typical I/O devices in a modern computer system.
● Graphics card
● Network interface
● eSATA Disk
● USB Keyboard
● USB Mouse
● NVMe Drive
What is the Canonical Device Interface?
An interface made of 3 registers that allows the OS to communicate with I/O devices. The three registers are:
● Status - Indicates the current status of the device.
● Command - Allows the OS to tell the device what command to perform.
● Data - Used to send/receive data to/from the device.
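For concreteness, here is a minimal Python sketch of the three registers. The Status values and the SimulatedDevice class are illustrative assumptions, not a real driver interface; this toy "device" completes its work synchronously, where real hardware would work asynchronously.
```python
from enum import IntEnum

class Status(IntEnum):
    READY = 0   # device is idle and can accept a new command
    BUSY = 1    # device is still working on the current command

class SimulatedDevice:
    """A hypothetical device exposing the three canonical registers."""
    def __init__(self):
        self.status = Status.READY   # STATUS register
        self.command = None          # COMMAND register
        self.data = None             # DATA register

    def issue(self, command, data=None):
        # The OS fills DATA, then writes COMMAND. A real device would stay
        # BUSY and finish asynchronously; this toy completes immediately.
        self.data = data
        self.command = command
        self.status = Status.BUSY
        self.data = f"<result of {command}>"   # pretend the operation ran
        self.status = Status.READY
```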
What is polling?
A method the OS can use to communicate with a device where the OS repeatedly checks the STATUS register of the device in a loop until the device is ready.
Advantages:
● Simple
● Works
Disadvantages:
● Wasted CPU cycles, as the CPU is constantly checking the status register even when the device is not ready.
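Using the SimulatedDevice sketch above, a polled read looks roughly like this (an illustration of the pattern, not real driver code):
```python
def polled_read(dev):
    # Spin until the device is idle; these loops are the wasted CPU
    # cycles mentioned above.
    while dev.status != Status.READY:
        pass
    dev.issue("READ")                  # program the COMMAND (and DATA) registers
    while dev.status != Status.READY:  # spin again until the request completes
        pass
    return dev.data                    # DATA register now holds the result

print(polled_read(SimulatedDevice()))
```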
What are interrupts?
A method used by the OS to handle I/O requests by putting the process requesting I/O to sleep and context switching to a different process. When the I/O request finishes, the device alerts the OS with an interrupt. The CPU jumps to an Interrupt Handler in the OS.
Advantages:
● No wasted CPU cycles as the CPU can work on other tasks while waiting for the I/O to finish.
Disadvantages:
● Context switching can be expensive.
● Polling can be better for fast devices.
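As a rough analogy only (a real kernel uses interrupt handlers and its scheduler, not Python threads), the sketch below uses a thread to stand in for the device and a threading.Event to stand in for the interrupt; the names device_thread, done, and result are made up for illustration.
```python
import threading
import time

done = threading.Event()   # set by the "interrupt" when the I/O completes
result = {}

def device_thread(block_no):
    time.sleep(0.01)                        # the disk doing the actual work
    result["data"] = f"<block {block_no}>"
    done.set()                              # "raise the interrupt"

def read_block(block_no):
    threading.Thread(target=device_thread, args=(block_no,)).start()
    # The requesting process sleeps here instead of spinning, so the CPU
    # is free to run other processes until the interrupt arrives.
    done.wait()
    return result["data"]

print(read_block(7))
```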
What are the two small simplifications made when discussing file system implementation?
1. User Read() allows an arbitrary number of bytes: this is simplified to only allowing Read() of a block.
2. Block size is typically 2^n * sector size: this is simplified to block size = sector size.
What are the 4 steps of disk access?
1. Head selection: Select the head for the correct platter surface.
2. Seek: Move the arm over the correct cylinder.
3. Rotational latency: Wait for the correct sector to rotate under the head.
4. Transfer time: Read (or write) the data in the sector.
What are the three main components of disk access time, and approximately how long does each one take?
● Seek time: 3-12 milliseconds
● Rotational latency: 2-7 milliseconds
● Transfer time: microseconds
Note: Disk access time is much greater than memory access time. Seek time dominates disk access time.
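A quick back-of-the-envelope calculation using the figures above (the specific midpoint values chosen here are illustrative assumptions):
```python
seek_ms = 8.0        # somewhere in the 3-12 ms range
rotation_ms = 4.0    # somewhere in the 2-7 ms range
transfer_ms = 0.05   # tens of microseconds for one block

total_ms = seek_ms + rotation_ms + transfer_ms
print(f"one random block access ~ {total_ms:.2f} ms")   # ~12 ms, dominated by the seek

# DRAM access is on the order of 100 ns, so a random disk access is
# roughly five orders of magnitude slower.
memory_ns = 100
print(f"disk / memory ~ {total_ms * 1_000_000 / memory_ns:,.0f}x")
```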
What is a file system cache?
A cache used to keep recently accessed blocks in memory. Also referred to as a buffer cache.
Benefits of using a file system cache:
● Reduce latency as the data can be retrieved from memory instead of disk.
● Reduce disk load as the number of disk accesses is reduced.
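A minimal sketch of a buffer cache in Python; CACHE_SIZE and read_block_from_disk are made-up names, and a real kernel cache is far more involved:
```python
from collections import OrderedDict

CACHE_SIZE = 64
cache = OrderedDict()    # block number -> contents, kept in LRU order

def read_block_from_disk(block_no):
    return f"<contents of block {block_no}>"   # placeholder for real disk I/O

def cached_read(block_no):
    if block_no in cache:                  # hit: served from memory, no disk access
        cache.move_to_end(block_no)
        return cache[block_no]
    data = read_block_from_disk(block_no)  # miss: go to the disk
    cache[block_no] = data
    if len(cache) > CACHE_SIZE:            # evict the least recently used block
        cache.popitem(last=False)
    return data
```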
What are the two ways to write data to disk when using a cache?
● Write-through: Data is written to the cache and the disk at the same time. Control returns to the user process only after the data has been written to both the cache and the disk.
● Write-behind: Data is written to the cache first, and control returns to the user process as soon as the cache write completes. The data is written to disk at a later time.
What are the advantages and disadvantages of write-through and write-behind?
Write-through
● Advantage: Better in the event of a crash as there is no “window of vulnerability.”
● Disadvantage: Worse response time and disk load.
Write-behind
● Advantage: Better response time and disk load. Much of the data is overwritten in the cache before it is written to the disk, improving efficiency.
● Disadvantage: Worse in the event of a crash as there is a “window of vulnerability” while the data is in the cache but not on the disk.
In practice: Write-behind is typically used, with periodic cache flushes to disk. There is typically a user primitive (e.g., sync or fsync on Unix) available to manually flush the data to disk, as sketched below.
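The two policies, sketched against a simple in-memory cache (write_block_to_disk, dirty, and flush are illustrative names, not a real kernel API):
```python
dirty = set()    # blocks in the cache that have not yet reached the disk

def write_block_to_disk(block_no, data):
    pass         # placeholder for the actual device write

def write_through(cache, block_no, data):
    cache[block_no] = data
    write_block_to_disk(block_no, data)   # control returns only after both writes

def write_behind(cache, block_no, data):
    cache[block_no] = data                # control returns immediately; the block
    dirty.add(block_no)                   # is vulnerable until the next flush

def flush(cache):
    """What a periodic flush (or a user primitive like sync) would do."""
    for block_no in sorted(dirty):
        write_block_to_disk(block_no, cache[block_no])
    dirty.clear()
```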
What is read-ahead?
A method of optimizing disk access by reading ahead in a file: when a user requests block i of a file, block i+1 is also read from disk. This is also referred to as prefetching. It only helps sequential access to files.
Benefits of using read-ahead:
● No disk I/O is needed when (as expected) the user accesses block i+1, because it is already in the cache.
Drawbacks of read-ahead:
● Does not reduce the number of disk I/Os overall.
● Could increase the number of disk I/Os if the user access is not sequential.
In practice: Read-ahead is very often a win. Linux always reads one block ahead.
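A sketch of read-ahead layered on a block cache (cache and read_block_from_disk are again made-up names for illustration):
```python
cache = {}   # block number -> contents

def read_block_from_disk(block_no):
    return f"<contents of block {block_no}>"   # placeholder for real disk I/O

def read_with_readahead(block_no):
    if block_no not in cache:
        cache[block_no] = read_block_from_disk(block_no)
    # Prefetch the next block so a sequential reader will find it cached;
    # for non-sequential access this is the extra I/O mentioned above.
    if block_no + 1 not in cache:
        cache[block_no + 1] = read_block_from_disk(block_no + 1)
    return cache[block_no]
```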
What are the two main approaches to minimizing seeks when optimizing disk access?
● Clever disk allocation: Locate related data on the same cylinder.
● Clever scheduling: Reorder requests to seek as little as possible.
What is the idea behind clever disk allocation?
Locate related data (same file) on the same cylinder. This is done by allocating related blocks together.
Related blocks are consecutive blocks in the same file. This is intended to improve the performance of sequential access to files.
Allocating blocks “together” means allocating them on the same cylinder or a nearby cylinder.
What is the idea behind disk scheduling?
Reorder disk requests to seek as little as possible.
Disk Scheduling policies:
● FCFS – First-Come-First-Served
● SSTF – Shortest-Seek-Time-First
● SCAN
● C-SCAN
● LOOK
● C-LOOK
What is the FCFS disk scheduling policy?
First-Come-First-Served
● Serves the next request in the queue.
What is the SSTF disk scheduling policy?
Shortest-Seek-Time-First
● Pick the “nearest” request in the queue, where “nearest” is determined by the request being closest to the current head position.
Advantages:
● Very good seek times
Disadvantages:
● Subject to starvation: requests near the innermost or outermost cylinders can be starved while nearer requests keep arriving.
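A small sketch comparing FCFS and SSTF orderings and total head travel; the head position (53) and request queue are made-up example values:
```python
def fcfs(head, queue):
    return list(queue)                      # serve requests in arrival order

def sstf(head, queue):
    order, pending = [], list(queue)
    while pending:
        # pick the request closest to the current head position
        nxt = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def seek_distance(head, order):
    total = 0
    for cyl in order:
        total += abs(cyl - head)
        head = cyl
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(seek_distance(53, fcfs(53, queue)))   # 640 cylinders of head travel
print(seek_distance(53, sstf(53, queue)))   # 236 cylinders of head travel
```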
What is the C-SCAN disk scheduling policy?
● Similar to SCAN, which sweeps the head back and forth across the cylinders, serving requests in both directions of travel.
● Requests are served only while the head moves in one direction.
● From MAX_CYL to 0, pick up requests as the head moves down.
● From 0 to MAX_CYL, no requests are served.
C-SCAN can also be implemented in reverse:
● From MAX_CYL to 0, no requests are served.
● From 0 to MAX_CYL, pick up requests as the head moves up.
Advantages:
● More uniform wait time than SCAN
Disadvantages:
● Number of cylinders traveled is slightly higher than SCAN
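A sketch of the serving order under the MAX_CYL-to-0 variant of C-SCAN described above; the head position and queue are the same made-up values as before:
```python
def c_scan_order(head, queue):
    # Serve requests only while the head sweeps from high cylinders toward 0;
    # after reaching 0 the head returns to MAX_CYL without serving anything,
    # then sweeps down again.
    below = sorted((c for c in queue if c <= head), reverse=True)
    above = sorted((c for c in queue if c > head), reverse=True)
    return below + above

print(c_scan_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [37, 14, 183, 124, 122, 98, 67, 65]
```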
What is the C-LOOK disk scheduling policy?
● Similar to C-SCAN, but the head only travels as far as the outermost pending requests.
● Requests are served only while the head moves in one direction.
● From MAX_CYL_IN_QUEUE to MIN_CYL_IN_QUEUE, serve requests as the head moves down.
● From MIN_CYL_IN_QUEUE to MAX_CYL_IN_QUEUE, no requests are served.
C-LOOK can also be implemented in reverse:
● From MAX_CYL_IN_QUEUE to MIN_CYL_IN_QUEUE, no requests are served.
● From MIN_CYL_IN_QUEUE to MAX_CYL_IN_QUEUE, serve requests as the head moves up.
In practice: Some variation of C-LOOK is typically used.
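To see why C-LOOK travels fewer cylinders than C-SCAN, here is a rough head-travel comparison on the same made-up queue (MAX_CYL = 199 is an assumption, and the wrap-around jump is counted as travel here):
```python
MAX_CYL = 199
head, queue = 53, [98, 183, 37, 122, 14, 124, 65, 67]
above = [c for c in queue if c > head]

# C-SCAN: sweep down to cylinder 0, jump back to MAX_CYL, then sweep
# down to the last pending request.
c_scan_travel = head + MAX_CYL + (MAX_CYL - min(above))

# C-LOOK: only travel as far as the outermost pending requests.
c_look_travel = (head - min(queue)) + (max(queue) - min(queue)) + (max(queue) - min(above))

print(c_scan_travel, c_look_travel)   # 386 vs. 326 cylinders
```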
How do you avoid rotational latency when optimizing disk access?
● Use clever disk allocation to locate consecutive blocks of a file on consecutive sectors within a cylinder.
In terms of disk optimization, when do clever disk allocation and disk scheduling work well?
● Clever allocation works well under low load.
● Disk scheduling works well under high load.
Why?
● Under high load: There are many requests in the queue, giving scheduling more opportunities to optimize disk access. Clever allocation can be defeated by interleaved requests for different files.
● Under low load: There is less opportunity for scheduling optimization, since there are not as many requests in the queue. If user access is sequential, disk access tends to also be sequential. The cache also tends to reduce load.