I/O Flashcards
file system components
interface
implementation
mass-storage structure: lowest level of the fs, the secondary and tertriary storage structures
block devices
hard disks, SSDs
store information in fixed-sized blocks
transfers are in units of entire blocks
I/O occurs with random access (by block ID)
character devices
printers, network interfaces
no blocks, only stream of characters/bytes
not addressable, doesn’t have any seek operation
I/O occurs with sequential access
device controller
- Located between actual device and computer
- Offers electronic interface in form of I/O registers
- R/W those registers to ask the controller to perform actions
- e.g. parallel port, usb
port-mapped I/O
two address spaces: main memory + ports
I/O Registers are accessed via dedicated port numbers and special instructions
memory-mapped I/O
one address space
I/O Registers are mapped into addresses of main memory and accessed like memory
hybrid I/O
two address spaces
Both implementation can coexist on a given architecture
I/O requests
Most devices offer status bits in their registers to signal that a request has been finished (and the error code)
OS can poll this status bit (polling)
polling / busy waiting
after outputting a character, the CPU continuously polls the device to see if it is ready to accept another one
interrupts
Device controller can trigger interrupts to signal that an I/O request is complete
=> For the device, this means simply changing the voltage on an electrical interrupt line
- Each controller has interrupt vector assigned
- CPU runs vector-specific handler when int occurs
trap
deliberate action from a program, software interrupt
fault (exception)
usually not deliberate, typically caused by programs
hardware interrupt
device such as printer or network sends a signal to the CPU
precise <-> imprecise interrupts
precise: state is consistent- everything before the interrupt is executed and after isn’t
imprecise: doesn’t have the same guarantee
Data Exchange Between Device and CPU
- Program disk controller reads a sector
- Wait for interrupt
- Read sizeof(sector) bytes from I/O registers
- Repeat for next sector
DMA
hardware does the data transfer
DMA steps
- CPU programs the DMA controller
- DMA requests transfer to memory from disk controller
- data transferred to/from main memory
- ack sent to DMA controller
- interrupt sent to the CPU when done
DMA controller
- On ISA systems there was a dedicated DMA controller (third-party DMA)
- On PCI (and PCIe) systems each PCI device may become “Bus Master” and perform DMA (first-party DMA)
- Device and DMA controller are combined
- problem: You have to trust your Hw
- Embedded systems still have a dedicated DMA controller
- Disk controller still uses own buffers
issues in I/O software
- Device independence
- Uniform naming
- Error handling
- Buffering
- Synchronous vs asynchronous
device independence
- I/O software provides abstraction over actual hardware
- Programs should not have to care about device particularities
uniform naming
- We don’t want to type ST6NM04 to address the first hard disk
- /dev/sda is better
- /mnt/movies even better
error handling
errors should be handled closest to their source
buffers
networking: incoming packets have to be buffered
audio: buffering to avoid clicks
synchronous vs asynchronous
- Programs don’t want to deal with interrupts → OS turns async operations into blocking operations
- Lower levels have to deal with interrupts, DMA, etc.
I/O software layers + their functions
- user processes: make i/o call, format i/o, spooling
- device-independent software: naming, protection, blocking, buffering, allocation
- device drivers: set up device registers, check status
- interrupt handlers: wake up driver when i/o completed
- hardware: perform i/o operation
i/o storage devices
SSD
HDD
SSD vs HDD
DRAM
or NOR/NAND flash memory
no mechanical parts (only controller + memory)
can be used as part of SSHDs
NAND flash memory
organised in cells, pages, blocks
controller can read/program at page granularity
controller can erase at block granularity
RND/SEQ access latency comparable
reads more efficient than writes
wear levelling to improve write endurance
garbage collection to implement write operations
HDD
hard disk drive
multiple platters and cylinders in a single disk with as many tracks as their heads
each track has N sectors
HDD addressing modes
CHS (cylinder-head-sector): virtual/physical
LBA (logical block addressing): virtual
HDD block read/write time factors
seek time: the time to move the arm to the proper cylinder
rotational delay: the time for the sector to come under the head
data transfer time: the time to transfer a single block
Disk Arm Scheduling algorithms aspects
- we want: fast access time and large disk bandwidth
- access time: seek time + rotational latency
- seek time: time for the disk arm to move the heads to the cylinder with the desired sector
- disk bandwidth: total number of bytes transferred / total time between the first request for service and the completion of the last transfer
- total head movement: sum of differences between sectors between which the moves happpened
FCFS disk
- the requests are addressed in the order they arrive in the disk queue
- fair algorithm
- no starvation: every request gets its turn
shortest seek first
can cause starvation: incoming requests can cause a previous one to not get its turn (we are at 15 and 99 waits, but we get 13 then 2, 7 etc.)
elevator algorithm
- scan from one end to the other
- no starvation
- a request that arrives just behind the head it will have to wait for the head to change direction even though it is right next to it
Hard Disk Errors
programming errors: request for nonexistent sector
transient errors: cause by dust on the head
permanent errors: disk block physically damaged
seek errors: the arm was sent to cylinder 6 but i went to 7
controller errors: controller refuses to accept commands
HD bad sector remapping
- A disk track with a bad sector
- Substituting a spare for the bad sector
- Shifting all the sectors to
bypass the bad one
clock hardware types
- Simple clocks deliver hardware interrupts for each voltage cycle of the power supply (i.e., every 20 or 16.7 ms)
- Advanced programmable clocks that have own crystal oscillator that decrements a counter in a register. If counter hits zero → HW interrupt
duties for a clock driver
- Maintaining the time of day
- System boot time (backup clock) + uptime (ticks)
- Preventing processes from running longer than allowed
- Decrement current process’ quantum at each tick
- Accounting for CPU usage
- Increment current process’ cpu time at each tick
- Handling alarm system call from user processes
- Decrement alarm counter at each tick
- Providing watchdog timers for parts of system itself
- Generate sys notifications for synchronous alarms
- Profiling, monitoring, statistics gathering
- Increment specific counters at each tick
magnetic disk physical geometry
Displays the actual structure of a disk with two zones, showing concentric circles representing tracks and numbered sectors on each track
magnetic disk virtual geometry
Represents the simplified view presented to the operating system, where the disk is organized into a different
magnetic disk
- provides the bulk of secondary storage for modern computer systems
- the surfaces are covered with a magnetic material → we store information by recording it magnetically on the platters
- read-write head
- reads and writes are equally fast, which makes them suitable as secondary memory (paging, file systems, etc.)
- Organized into cylinders, tracks, and sectors; modern disks have multiple heads (1 to 16).
- Older disks depended on the controller; modern SATA disks have integrated microcontrollers for optimized performance.
- Features like overlapped seeks and simultaneous operations on multiple drives reduce access times.
disk structure
- large one-dimensional arrays of logical blocks
- logical blocks: the smallest unit of transfer, usually 512 bytes
- the array is mapped onto the sectors of the disk sequentially
- Hard disks consist of stacked platters (3.5-inch or 2.5-inch).
- Initially, disks have no data.
disk formatting
- low-level formatting
- logical formatting
low-level formatting
before a disk can store data, it must be divided into sectors that the disk controller can read and write
disk sector layout
- preamble: starts with a certain bit pattern that allows the hardware to recognize the start of the sector
- data
- ECC: error-correcting code
disk sector
- Requires software to format each platter into concentric tracks with sectors and gaps.
- Sector format includes a preamble, cylinder and sector numbers, a data portion, and an error correction code (ECC)
- Common sector(data) size is 512 bytes
- Spare sectors are reserved for replacing defective ones
logical formatting
- partition the disk into one or more groups of cylinders
- logical formatting: the os stores the ds data structures onto the disk
disk capacity
- Formatted capacity is usually about 20% lower than unformatted capacity due to formatting overhead and spare sectors.
- Confusion exists between advertised (unformatted) and actual (formatted) capacities.
- Different interpretations of terabytes (TB) vs. tebibytes (TiB) add to the confusion.
disk partitioning
- After formatting, disks are partitioned to allow multiple operating systems.
- MBR (Master Boot Record) and GPT (GUID Partition Table) support different disk sizes and partitioning methods.
- The partition table outlines starting sectors and sizes of partitions.
high-level formatting
- Each partition undergoes high-level formatting to establish a file system.
- This includes creating a boot block, free storage administration, and differentiating file systems.
- The system can then boot using the appropriate partition marked as active.
boot process
BIOS reads GPT upon power-on to locate the bootloader for the operating system.
boot block
- the bootstrap program: initialises all aspects of the system = starts the computer
- boot disk or system disk
- most computers store the bootstrap program in the ROM → the bootstrap can’t be changed → put to disk so we can alter it if needed
bootstrap loader program
in ROM, brings full bootstrap program from the disk
bad disk sectors
blocks that are damaged and can’t be used
bad sector handling
manual
sector sparing or forwarding
sector slipping
manual bad sector handling
format finds a bad block → write a special value in the corresponding FAT entry
sector slipping
- the controller maintains a list of bad blocks on the disk
- initialised at low-level formatting
- set aside a spare sector not visible to os
sector sparing or forwarding
- also has spares
- logical block n is defective and first available spare is at m
- remaps all sectors from n to m moving them down one spot: m is copied to the spare, then m-1 into m, and so on until n + 1 is copied into n + 2
- this frees n + 1 so that we keep sequential order
RAID
redundant arrays of independent disks
- a way of storing the same data in different places on multiple hard disks or SSDs to protect data in the case of failure
- higher data transfer rate and higher reliability
- working
- it places data on mutliple disks and allows I/O operations to overlap in a balanced way, thus improving performance
- increases mean time between failures
- disk mirroring: copying identical data to more than one drive
→ expensive
- disk stripping: spread data over multiple disk drives
- improvement of reliability over redundancy
RAID level 0
no mirroring
blocks are split between disks
RAID level 1
mirroring
data is duplicated
improves reading rate
fault tolerance
RAID level 2
stripping not mirroring
disks with data + disks wit parity for each block
parity data used for error detection and correction
RAID level 3
bits are stripped + single parity bit
RAID level 4
block-level stripping + parity per block
RAID level 5
parity after each block ( a1, a2, a3, a_parity, b1, b2, b3, b3, b_parity…)
RAID level 6
like 5 but we use multiple parities → guarding against multiple disk failures
stable storage
- data is never lost
- replicate data on multiple storage devices with independent failure modes
- partial failure: during the write some data is not damaged some is
- total failure: before the write started
- for each logical block must maintain two physical blocks
- when changing them first complete the write to the first one before changing the second one
- during recovery both blocks are examined
- if one is corrupted then replace it with the contents of the other one
- if neither has an error but the contents differ, replace the content of the first block with that of the second