File System Implementation (14) Flashcards

1
Q

FS Structure

A
  • File structure
    • Logical storage unit
    • Collection of information that are related
  • File systems resides on disks
    • Provided user interface to storage, that maps logical to physical addr
    • Provides efficient and cheap access to disk: allows data to be stored, located and retrieved
  • Disk provides in-place rewrite and random-access
    • I/O transfers perfomed in blocks of sectors (usually 512bytes)
  • FCB (File control block): structure where all file info are contained
  • Device driver: controls the physical dev
  • The file system is layered.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Layers of the File system

A

Device drivers manage I/O devices at the I/O control layer

  • Given commandsl ike“read drive1,cylinder72,track2,sector 10, into memory location 1060” outputs low-level hardware specific commands to hardware controller

Basic file system given command like “retrieve block 123” translates to device driver

  • Also manages memory buffers and caches (allocation, freeing, replacement)
    • Buffers hold data in transit
    • Caches hold frequently used data

File organization module understands files, logical address, and physical blocks

  • Translates logical block # to physical block #
  • Manages free space and disk allocation

Logical File system manages metadata info

  • converts file names into numbers, file handle, location by mantaining file control blocks
  • directory management
  • protection

Layering is useful for reducing complexity and redundancy, but adds overhead and can decrease performance.

Operating Systems support more than one file system at the time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

File Control Block info

A
  • FCB that describes a file contains many details:
    • I node number, permission, size, dates
    • NTFS stores the FCB into the master file table in a similar fashion to relational db
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

File System Operations

A

Boot control block

  • contains info needed by system to boot OS from that volume
  • Needed if volume contains OS, usually first block of volume

Volume control block

  • contains volume details
  • Total # blocks, # of free blocks, block size, free block pointers or array

Direcotry structures:

  • Organizes the files: names, inode numbers, master file table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

In-memory file system structures

A
  • Mount table storing file system mounts, mount points, file system types
  • system-wide open-file table contains
    • a copy of the FCB of each file
    • synchronization primitives
    • ref. count
    • block # and offset in the block
  • per-process open-file table contains pointers to appropriate entries in system-wide open-file table as well
    • access method
    • pointer to file
    • take a look at pic
    • More processes may open the same file, on system-wide table there is only one entry per file.
  • Open return a file handle
  • Buffers hold data blocks that come from disk
  • Data read is eventually copied into a specif user process memory
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Directory Implementations

A

Linear List of file names

  • Simple to code
  • Time-consumeing to execute O(n)
    • use B+-Trees?

Hash Table

  • O(1) search time
  • Collissions must be dealt with
  • Good if entries are fixed size
    • or we should use chained overflow method
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Allocations Methods: Contiguos

A

Contiguous allocation – each file occupies set of contiguous blocks

  • Usually best performance
  • Simple
    • only starting location (block#) and length(number of blocks) are required
  • Problems:
    • finding space for file
    • knowing file size
    • external fragmentation
    • need for compaction off-line (downtime) or on-line
      • Compaction is the process of grouping the free space in a specifi part of the disk
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Allocation methods: Extent-Based Systems

A

Many newer file systems (i.e., Veritas File System) use a modified contiguous allocation scheme.

An extent is a contiguous block of disks

  • Extents are allocated for file allocation
  • A file consists of one or more extents

Contiguos chunk is allocated initially, if the amount is not large enough, another chunk of contiguos space is added.

The location of a file’s block is recorded, as well as block count and a link to the first block of the next extent.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Allocation Methods: Linked

Pros and Cons

A

Linked allocation: each file a linked list of blocks

  • File ends at nil pointer
  • Good:
    • No external fragmentation: the are no small unused small memory blocks
    • No compaction: no need to move objects around
    • Free space management system called when new block is needed
    • Improve efficiency by clustering blocks into groups but increasse internal frag.
  • Bad
    • Locating a file may take many I/Os and disk seeks
    • Reliability can be a problem
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

FAT (File Allocation Table)

A

Linked allocation with some features;

  • beginning volume has table indexed by block number
  • similar to linked list, but faster on disk and cacheable
  • new block allocation simple
  • provided that the FAT is reliable, if we loose one block, we have the rest of the file
    • this is a limite of non-FAT linked allocation methods
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Linked Allocation

A
  • Each file is a linked list of blocks
  • Mapping
    • If we consider that a pointer has a size of one word
      • for each block we have 511 words for data and 1 for the pointer
      • we need to divide the size by 511 to get the required number of blocks
    • LA/511
      • Q is the logical address
      • Displacement into block is R + 1
      • Physical and logical cannot be mapped directly there needs to be an addition
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Allocation Methods: Indexed

A
  • Each file has its own index block(s) containing pointers to data blocks
  • Index table is needed
  • random access
  • dynamic access without external fragmentation, but have overhead of index block
  • Mapping from logical to physical in a file of maximum size of 256K bytes and block size of 512bytes. We need only 1 block for index table
    • LA/512
      • Q = displacement into index tbale
      • R = displacement into block
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Allocation Methods: Indexed

Unbounded Length File

A

Linked Scheme:

mapping from logical to physical in a file of unbounded length (block size of 512 words)

  • LA/(512x511) =
    • Q1: displacement of index table
    • R1/(512) =
      • Q2: displacement into block index table
      • R2: displacement into block of file

Two levels:

4K blocks can store 1024 four-byte pointers in outer index, that is 1M data blocks and up to 4GB of files.

As before but we don’t have the pointer displacement in the block:

  • LA/(512x512) =
    • Q1: displacement into outer-index table
    • R1/(512) =
      • Q2: displacement into block index table
      • R2: displacement into block of file
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Allocation: Combined Scheme: UNIX UFS

A

4K bytes per block, 32-bit addresses

More index blocks than can be address with 32-bit file pointer.

The inode is a sort of multi-level index table, that allows small files to be stored in the first few blocks, but if the file is larger, the single/double/triple index are used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Performance of Access Methods

A
  • Best method depends on what we are going to do
    • contiguos is great for sequential and random
    • Linked good for sequential, not random
  • Declare access type at creation
  • Index more complex
    • Single block access could require 2 index block reads then data block read
    • clustering can help improve throughput, reduce cpu overhead
  • For nvm, no disk head so different algorithms and optimizations needed
    • old algo uses many cpu cycles trying to avoid non-existend head movement
    • with nvm goal is to reduce cpu cycles and overall path needed for I/O’
  • Doing other tasks during the I/O operations must be done, otherwise the CPU will be useless for a lot of time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Free Space Management:

A
  • Free space list
    • list of free blocks linked
    • cannot gat contiguos space easily
    • no space waste, because we can insert the information about wheter the next block is free inside the metdata of the block.
    • no need to traverse to find a free node
  • Bit vector or bit map
    • bit map requires extra space
    • easy to get contiguos file
    • Example
      • Block size : 4KB = 2^12
      • Disk size: 1TB = 2^40
      • n = 2^40/2^12 = 2^28 bits (32MB)
      • We could group blocks of 4, b[i] (are blocks i * 4, i *4 + 1, i * 4 + 2, i * 4 + 3 free ?), and the dimension would be 8MB

Grouping

  • Linked list stores address of next n-1 free blocks in first free block, plus a pointer to next block that contains another group of free-block pointers.

and counting:

  • because space is frequently used and freed contiguosly
    • keep address of first free block and count of following free blocks
    • free space list then has entries containing addresses and counts

Space maps:

  • used in zfs
17
Q

TRIMing Unused Blocks

A
  • SSD need to be erased before being written
  • HDDs can write in place

We need different strategies for different types of storage.

TRIMing is used in NVM:

  • Blocks keep data after they are freed (untile overwritten), only the pointer will be destroyed.
  • Since NVM must be erased before written, that erases are made in large chunks (blocks, composed of pages) and that the latter is a slow process
  • TRIM is a way that the file system has to inform the device that the page is free
    • Garbage collection can be done, or if free block can be erased.
18
Q

Efficiency and performance

A

Efficiency dependent:

  • disk allocation and directory algos
  • types of data kept in file’s directory entry
  • pre-allocation or as-needed allocation of metadata structures
  • fixed-size or varying size data structures

Performance:

  • keep data and metadata close
  • buffer cache - section of main memory used for frequently used blocks
  • synchronous writes: requested by some apps
    • no bufferin/ caching :
  • free-behind and read-ahead
    • optimize sequential access
    • read-ahead
      • if sequential read and I read b 10
      • read also block 11
    • free-behind
      • if I read 10
      • I won’t read 10 again, I can free it
  • Reads frequently slower than writes
19
Q

Page Cache

A

Different from buffer cache.

Cache pages rather thatn disk blocks using virtual memory techniques and addresses.

Memory-mapped I/O uses a page cache.

Routine I/O through the file system uses the buffer cach.

20
Q

Unified Buffer Cache

A

A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O to avoid double caching.

21
Q

Recovery

A

Done through consistency checking:

  • Comparing redunant data: directory structure vs data blocks on disk
    • tries to fix incosistencies
    • Can be slow and fail sometimes
  • Use file system programs to back up data from disk to another dev
  • Recover lost file by restoring file from backup
22
Q

Log structured file system

A
  • Log structured file systems record each metadata update to the file system as a transaction in the log.
  • Let’s put the intention of doing an operation, if the file system fails in the middle of the operation, we can recover it.
  • Faster recovery from crash, remove chances of inconsistencies.