File System Implementation (14) Flashcards
FS Structure
- File structure
- Logical storage unit
- Collection of information that are related
- File systems resides on disks
- Provided user interface to storage, that maps logical to physical addr
- Provides efficient and cheap access to disk: allows data to be stored, located and retrieved
- Disk provides in-place rewrite and random-access
- I/O transfers perfomed in blocks of sectors (usually 512bytes)
- FCB (File control block): structure where all file info are contained
- Device driver: controls the physical dev
- The file system is layered.
Layers of the File system
Device drivers manage I/O devices at the I/O control layer
- Given commandsl ike“read drive1,cylinder72,track2,sector 10, into memory location 1060” outputs low-level hardware specific commands to hardware controller
Basic file system given command like “retrieve block 123” translates to device driver
- Also manages memory buffers and caches (allocation, freeing, replacement)
- Buffers hold data in transit
- Caches hold frequently used data
File organization module understands files, logical address, and physical blocks
- Translates logical block # to physical block #
- Manages free space and disk allocation
Logical File system manages metadata info
- converts file names into numbers, file handle, location by mantaining file control blocks
- directory management
- protection
Layering is useful for reducing complexity and redundancy, but adds overhead and can decrease performance.
Operating Systems support more than one file system at the time.

File Control Block info
- FCB that describes a file contains many details:
- I node number, permission, size, dates
- NTFS stores the FCB into the master file table in a similar fashion to relational db

File System Operations
Boot control block
- contains info needed by system to boot OS from that volume
- Needed if volume contains OS, usually first block of volume
Volume control block
- contains volume details
- Total # blocks, # of free blocks, block size, free block pointers or array
Direcotry structures:
- Organizes the files: names, inode numbers, master file table
In-memory file system structures
- Mount table storing file system mounts, mount points, file system types
- system-wide open-file table contains
- a copy of the FCB of each file
- synchronization primitives
- ref. count
- block # and offset in the block
- per-process open-file table contains pointers to appropriate entries in system-wide open-file table as well
- access method
- pointer to file
- take a look at pic
- More processes may open the same file, on system-wide table there is only one entry per file.
- Open return a file handle
- Buffers hold data blocks that come from disk
- Data read is eventually copied into a specif user process memory

Directory Implementations
Linear List of file names
- Simple to code
- Time-consumeing to execute O(n)
- use B+-Trees?
Hash Table
- O(1) search time
- Collissions must be dealt with
- Good if entries are fixed size
- or we should use chained overflow method
Allocations Methods: Contiguos
Contiguous allocation – each file occupies set of contiguous blocks
- Usually best performance
- Simple
- only starting location (block#) and length(number of blocks) are required
- Problems:
- finding space for file
- knowing file size
- external fragmentation
- need for compaction off-line (downtime) or on-line
- Compaction is the process of grouping the free space in a specifi part of the disk

Allocation methods: Extent-Based Systems
Many newer file systems (i.e., Veritas File System) use a modified contiguous allocation scheme.
An extent is a contiguous block of disks
- Extents are allocated for file allocation
- A file consists of one or more extents
Contiguos chunk is allocated initially, if the amount is not large enough, another chunk of contiguos space is added.
The location of a file’s block is recorded, as well as block count and a link to the first block of the next extent.
Allocation Methods: Linked
Pros and Cons
Linked allocation: each file a linked list of blocks
- File ends at nil pointer
- Good:
- No external fragmentation: the are no small unused small memory blocks
- No compaction: no need to move objects around
- Free space management system called when new block is needed
- Improve efficiency by clustering blocks into groups but increasse internal frag.
- Bad
- Locating a file may take many I/Os and disk seeks
- Reliability can be a problem
FAT (File Allocation Table)
Linked allocation with some features;
- beginning volume has table indexed by block number
- similar to linked list, but faster on disk and cacheable
- new block allocation simple
- provided that the FAT is reliable, if we loose one block, we have the rest of the file
- this is a limite of non-FAT linked allocation methods

Linked Allocation
- Each file is a linked list of blocks
- Mapping
- If we consider that a pointer has a size of one word
- for each block we have 511 words for data and 1 for the pointer
- we need to divide the size by 511 to get the required number of blocks
- LA/511
- Q is the logical address
- Displacement into block is R + 1
- Physical and logical cannot be mapped directly there needs to be an addition
- If we consider that a pointer has a size of one word

Allocation Methods: Indexed
- Each file has its own index block(s) containing pointers to data blocks
- Index table is needed
- random access
- dynamic access without external fragmentation, but have overhead of index block
- Mapping from logical to physical in a file of maximum size of 256K bytes and block size of 512bytes. We need only 1 block for index table
- LA/512
- Q = displacement into index tbale
- R = displacement into block
- LA/512

Allocation Methods: Indexed
Unbounded Length File
Linked Scheme:
mapping from logical to physical in a file of unbounded length (block size of 512 words)
- LA/(512x511) =
- Q1: displacement of index table
- R1/(512) =
- Q2: displacement into block index table
- R2: displacement into block of file
Two levels:
4K blocks can store 1024 four-byte pointers in outer index, that is 1M data blocks and up to 4GB of files.
As before but we don’t have the pointer displacement in the block:
- LA/(512x512) =
- Q1: displacement into outer-index table
- R1/(512) =
- Q2: displacement into block index table
- R2: displacement into block of file

Allocation: Combined Scheme: UNIX UFS
4K bytes per block, 32-bit addresses
More index blocks than can be address with 32-bit file pointer.
The inode is a sort of multi-level index table, that allows small files to be stored in the first few blocks, but if the file is larger, the single/double/triple index are used.

Performance of Access Methods
- Best method depends on what we are going to do
- contiguos is great for sequential and random
- Linked good for sequential, not random
- Declare access type at creation
- Index more complex
- Single block access could require 2 index block reads then data block read
- clustering can help improve throughput, reduce cpu overhead
- For nvm, no disk head so different algorithms and optimizations needed
- old algo uses many cpu cycles trying to avoid non-existend head movement
- with nvm goal is to reduce cpu cycles and overall path needed for I/O’
- Doing other tasks during the I/O operations must be done, otherwise the CPU will be useless for a lot of time
Free Space Management:
- Free space list
- list of free blocks linked
- cannot gat contiguos space easily
- no space waste, because we can insert the information about wheter the next block is free inside the metdata of the block.
- no need to traverse to find a free node
- Bit vector or bit map
- bit map requires extra space
- easy to get contiguos file
- Example
- Block size : 4KB = 2^12
- Disk size: 1TB = 2^40
- n = 2^40/2^12 = 2^28 bits (32MB)
- We could group blocks of 4, b[i] (are blocks i * 4, i *4 + 1, i * 4 + 2, i * 4 + 3 free ?), and the dimension would be 8MB
Grouping
- Linked list stores address of next n-1 free blocks in first free block, plus a pointer to next block that contains another group of free-block pointers.
and counting:
- because space is frequently used and freed contiguosly
- keep address of first free block and count of following free blocks
- free space list then has entries containing addresses and counts
Space maps:
- used in zfs

TRIMing Unused Blocks
- SSD need to be erased before being written
- HDDs can write in place
We need different strategies for different types of storage.
TRIMing is used in NVM:
- Blocks keep data after they are freed (untile overwritten), only the pointer will be destroyed.
- Since NVM must be erased before written, that erases are made in large chunks (blocks, composed of pages) and that the latter is a slow process
- TRIM is a way that the file system has to inform the device that the page is free
- Garbage collection can be done, or if free block can be erased.
Efficiency and performance
Efficiency dependent:
- disk allocation and directory algos
- types of data kept in file’s directory entry
- pre-allocation or as-needed allocation of metadata structures
- fixed-size or varying size data structures
Performance:
- keep data and metadata close
- buffer cache - section of main memory used for frequently used blocks
- synchronous writes: requested by some apps
- no bufferin/ caching :
- free-behind and read-ahead
- optimize sequential access
- read-ahead
- if sequential read and I read b 10
- read also block 11
- free-behind
- if I read 10
- I won’t read 10 again, I can free it
- Reads frequently slower than writes
Page Cache
Different from buffer cache.
Cache pages rather thatn disk blocks using virtual memory techniques and addresses.
Memory-mapped I/O uses a page cache.
Routine I/O through the file system uses the buffer cach.

Unified Buffer Cache
A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O to avoid double caching.

Recovery
Done through consistency checking:
- Comparing redunant data: directory structure vs data blocks on disk
- tries to fix incosistencies
- Can be slow and fail sometimes
- Use file system programs to back up data from disk to another dev
- Recover lost file by restoring file from backup
Log structured file system
- Log structured file systems record each metadata update to the file system as a transaction in the log.
- Let’s put the intention of doing an operation, if the file system fails in the middle of the operation, we can recover it.
- Faster recovery from crash, remove chances of inconsistencies.