Storage & File systems Flashcards
Which is better to use: an FD or a filename?
FD-based versions of system calls (e.g., fstat rather than stat) are more secure in some sense
• Because association of FD to underlying file is immutable
– Once an FD exists, it will always point to the same file
• Whereas association between file & its name is mutable
• Using names might lead to TOCTTOU (time of check to time of use)
races: the name can be rebound to a different file between the check and the use
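A minimal Python sketch of the point above, assuming a POSIX system. The file names and the helper fd_vs_name_demo are made up for illustration. After the name is rebound to a different file, the FD still refers to the original file, while the name does not.

```python
import os
import tempfile

def fd_vs_name_demo():
    """Return (original inode, inode via FD, inode via name, bytes read via FD)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "report.txt")       # hypothetical file name
        with open(path, "w") as f:
            f.write("original")
        fd = os.open(path, os.O_RDONLY)            # bind an FD to the file
        try:
            original_ino = os.fstat(fd).st_ino
            # The name is rebound: the old file is renamed away and a
            # different file is created under the same name.
            os.rename(path, os.path.join(d, "moved.txt"))
            with open(path, "w") as f:
                f.write("impostor")
            via_fd = os.fstat(fd).st_ino           # still the original file
            via_name = os.stat(path).st_ino        # now a different file
            content = os.read(fd, 100)
        finally:
            os.close(fd)
        return original_ino, via_fd, via_name, content
```

A check done through the name (os.stat) followed by a use through the name (open) can therefore see two different files. Doing both through the same FD cannot.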
What are the 6 types of POSIX files?
– Regular file
– Directory
– Symbolic link (= shortcut), a.k.a. soft link
– FIFO (named pipe)
– Socket
– Device file
What is Access control list (ACL)?
Most OSes/filesystems support some form of ACLs
– Many groups/users can be associated with a file
– Each group/user can be associated with the 3 attributes (r/w/x)
– Or more, finer attributes (“can delete”, “can rename”, etc.)
CON: not part of POSIX
Is there a difference between a filename and filepath in UNIX?
No, filename = filepath = path
Are a file and its filename the same thing in UNIX?
No.
– In fact, the name is not even part of the file’s metadata
– A file can have many names, which appear in unrelated places in the
filesystem hierarchy
– Creating another name => creating another “hard link”
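A small Python sketch of "many names, one file", assuming a POSIX filesystem that supports hard links. The directory and file names and the helper hard_link_demo are illustrative only.

```python
import os
import tempfile

def hard_link_demo():
    """Return (same inode?, link count with two names, data read via second name)."""
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "a"))
        os.mkdir(os.path.join(d, "b"))
        first = os.path.join(d, "a", "notes.txt")
        second = os.path.join(d, "b", "same-notes.txt")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)                 # create another name (hard link)
        st1, st2 = os.stat(first), os.stat(second)
        same_inode = st1.st_ino == st2.st_ino  # both names lead to one file
        link_count = st1.st_nlink              # the file now has 2 names
        os.unlink(first)                       # remove one name only
        with open(second) as f:
            data = f.read()                    # the file is still reachable
        return same_inode, link_count, data
```

The two names live in unrelated directories, yet they share one inode, and the file survives until its last name is removed.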
Can we have hard links to directories?
– Hard links to directories are usually disallowed & unsupported by the
filesystem (though POSIX does allow directory hard links)
– => The hierarchy stays an acyclic graph (no cycles)
Still, all filesystems that adhere to POSIX provide at least
some support for directory hard links
– Due to the special directory names “.” and “..”
– What’s the minimum number of hard links for a directory?
• 2 (its name in the parent directory, plus its own “.”)
– What’s the maximum?
• Depends on how many subdirectories it directly contains (each adds one via its “..”)
What’s the difference between a hard link and a soft link?
• Hard links
– Point to the actual underlying file object
• Symlinks (“shortcuts” in Windows terms)
– Point to a name of a “target” file (their content is typically this name)
– They’re not counted in the file’s ref count
– They can be “broken” / “dangling” (point to a nonexistent path)
– They can refer to a directory (unlike hard links in most cases)
– They can refer to files outside of the filesystem / mount point
(whereas hard links must point to files within the same filesystem)
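A short Python sketch of three of the symlink properties above, assuming a POSIX system. The names and the helper symlink_demo are illustrative.

```python
import os
import tempfile

def symlink_demo():
    """Return (content is the target name?, target's link count, link dangles?)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link")
        with open(target, "w") as f:
            f.write("data")
        os.symlink(target, link)
        stored_name = os.readlink(link)       # the symlink's content is a name
        nlink = os.stat(target).st_nlink      # symlink is not counted: still 1
        os.unlink(target)                     # delete the target file
        # The symlink still exists but now points to a nonexistent path.
        dangling = os.path.islink(link) and not os.path.exists(link)
        return stored_name == target, nlink, dangling
```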
What does the unlink function remove when called on a symlink?
It removes the symlink itself, not the target file.
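A quick way to see this in Python, assuming a POSIX system. The helper unlink_symlink_demo and its file names are made up.

```python
import os
import tempfile

def unlink_symlink_demo():
    """Return (symlink still present?, target still present?) after unlink."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link")
        with open(target, "w") as f:
            f.write("data")
        os.symlink(target, link)
        os.unlink(link)                       # unlink is applied to the symlink
        # lexists does not follow symlinks, so it reports on the link itself.
        return os.path.lexists(link), os.path.exists(target)
```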
What’s the inode?
The OS data structure that represents the file
Internally, file names “point” to inodes
• This inode is determined via the path-resolution algorithm
The inode contains all the metadata of the file
What’s a directory file?
A simple flat file comprised of directory entries (dirents).
Where is the name of the file stored?
In the directory file, not the inode.
Do symlinks have their own inode?
POSIX doesn’t specify whether a symlink should have an
inode.
But filesystems often define an inode per symlink. That inode holds the target path rather than a pointer to the target’s inode: a short path is stored inside the symlink’s inode itself, and a path too long for that is stored in a data block the inode points to.
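What is the lower bound on the time complexity of the path resolution process?
Linear in n, the number of components in the path, since every component
must be looked up. Finding each individual component within its directory
may itself be a linear process (a scan over the dirents).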
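A toy Python model of path resolution, to make the cost visible. The inode numbers, the INODES table and the helpers lookup and resolve are all hypothetical. It handles absolute paths only and ignores symlinks and permissions.

```python
ROOT_INO = 2  # hypothetical inode number of the root directory

# Toy inode table: a directory is a flat list of (name, inumber) dirents.
INODES = {
    2: {"type": "dir", "entries": [(".", 2), ("..", 2), ("home", 3)]},
    3: {"type": "dir", "entries": [(".", 3), ("..", 2), ("alice", 4)]},
    4: {"type": "dir", "entries": [(".", 4), ("..", 3), ("notes.txt", 5)]},
    5: {"type": "file"},
}

def lookup(dir_ino, name):
    """Linear scan of one directory's dirents for the given name."""
    inode = INODES[dir_ino]
    if inode["type"] != "dir":
        raise NotADirectoryError(name)
    for entry_name, ino in inode["entries"]:
        if entry_name == name:
            return ino
    raise FileNotFoundError(name)

def resolve(path):
    """Walk an absolute path one component at a time, starting at the root."""
    ino = ROOT_INO
    for component in path.split("/"):
        if component:                 # skip the empty strings around slashes
            ino = lookup(ino, component)
    return ino
```

resolve performs one lookup per component, and each lookup scans a directory, which is where the two linear factors come from.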
What’s the block/sector size in VSFS?
4kB.
What is the layout of VSFS?
Block 0 = superblock
Block 1 = inode bitmap
Block 2 = data-block bitmap
Blocks 3-7 = inode table
Blocks 8-63 = data blocks
What is the superblock?
Contains information about the particular filesystem
Location of the superblock (of any FS) must be well-known
What’s a disk partition?
Partition = contiguous disjoint part of the disk that can host a filesystem
Given an inumber (= index of the inode in the table), how can we find the inode’s block?
block = (inodeStartAddr + inumber x sizeof(inode_t)) / blockSize (integer division)
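A worked version of this calculation in Python. The 4 KB block size and the inode table starting at block 3 come from the VSFS cards. The 256-byte inode size is an assumption, and inode_location is an illustrative helper.

```python
BLOCK_SIZE = 4096              # 4 KB blocks
INODE_TABLE_START_BLOCK = 3    # inode table occupies blocks 3-7
INODE_SIZE = 256               # assumption: bytes per inode

def inode_location(inumber):
    """Return (block number, byte offset inside that block) of an inode."""
    inode_start_addr = INODE_TABLE_START_BLOCK * BLOCK_SIZE
    byte_addr = inode_start_addr + inumber * INODE_SIZE
    return byte_addr // BLOCK_SIZE, byte_addr % BLOCK_SIZE
```

Under these assumptions each block holds 16 inodes, so inode 32 sits at the start of block 5, and the five table blocks hold 80 inodes in total.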
How is Multi-level index (in the classic Unix FS) implemented?
- 12 pointers in inode point directly to data blocks
- Single-indirect pointer points to a block completely comprised of pointers to data blocks
- Double-indirect
- Triple-indirect
Sum = 15 pointers, giving a total coverage of about 4 TB (assuming 4 KB blocks and 4 B pointers).
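The coverage figure can be checked with a few lines of Python, assuming 4 KB blocks, 4 B pointers and 12 direct pointers. The helpers max_file_size and index_level are illustrative.

```python
BLOCK_SIZE = 4096        # assumption: 4 KB blocks
POINTER_SIZE = 4         # 4 B pointers
DIRECT_POINTERS = 12

def max_file_size():
    """Bytes addressable through direct + single + double + triple indirection."""
    per_block = BLOCK_SIZE // POINTER_SIZE      # 1024 pointers per block
    blocks = DIRECT_POINTERS + per_block + per_block**2 + per_block**3
    return blocks * BLOCK_SIZE

def index_level(block_index):
    """0 = direct, 1 = single-, 2 = double-, 3 = triple-indirect."""
    per_block = BLOCK_SIZE // POINTER_SIZE
    limit = DIRECT_POINTERS
    if block_index < limit:
        return 0
    for level in (1, 2, 3):
        limit += per_block ** level
        if block_index < limit:
            return level
    raise ValueError("block index beyond maximum file size")
```

The total is 48 KB + 4 MB + 4 GB + 4 TB, so the triple-indirect pointer dominates and the result is roughly 4 TB.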
Are the indirect pointer blocks included in the file size?
No. The file size counts only the file’s data bytes.
What’s an extent?
- Extent = contiguous area comprised of
variable number of blocks
– inode saves a list of (pointer, size) pairs
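A minimal Python sketch of how a list of (pointer, size) pairs maps a logical block of the file to a disk block. The function physical_block and the example numbers are illustrative, not a particular filesystem's format.

```python
def physical_block(extents, logical_block):
    """extents: list of (start_block, length_in_blocks) pairs, in file order."""
    remaining = logical_block
    for start, length in extents:
        if remaining < length:
            return start + remaining      # falls inside this extent
        remaining -= length               # skip this extent entirely
    raise IndexError("logical block beyond end of file")

# A 6-block file stored as two extents: disk blocks 100-103 and 500-501.
EXAMPLE_EXTENTS = [(100, 4), (500, 2)]
```

Two pairs describe six blocks here, whereas a per-block pointer scheme would need six pointers.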
Describe FAT layout
– One table for all files
– One table entry for every disk block
– -1 (0xFFFF) marks “the end”
– Directory file: content of each entry points to start of file
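A toy Python model of following a file's chain through the table. The FREE marker, the example table and the helper file_blocks are assumptions made for illustration.

```python
END_OF_CHAIN = -1      # marks "the end", as in the card above
FREE = None            # assumption: marker for an unused block

def file_blocks(fat, start_block):
    """Follow the chain from the file's first block to the end marker."""
    blocks = []
    block = start_block
    while block != END_OF_CHAIN:
        blocks.append(block)
        block = fat[block]     # the entry for a block names the next block
    return blocks

# One entry per disk block; a file starting at block 2 occupies 2 -> 5 -> 3.
EXAMPLE_FAT = [FREE, FREE, 5, END_OF_CHAIN, FREE, 3, FREE, FREE]
```

The directory entry supplies only the starting block (2 in this example). Everything else comes from the table.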
Is the FAT table copied to memory?
FAT (table) is copied to (cached in) memory
– But of course must also be saved on disk
– Makes random access fast, since following a file’s chain of block pointers needs no disk reads
Describe the pros and cons of FAT.
Pros
– Simple to implement:
• File append, free block allocation management, easy allocation
– No external fragmentation
– Fast random access since table is in memory (for block pointers, not
necessarily for the blocks themselves)
Cons
– File contiguity can be lost (this is why extents were invented)
– Table can be huge
• 32 GB (disk) / 4 KB (block) = 2^5 x 2^30 / 2^12 = 2^23 = 8 M entries
• Assuming a 4 B pointer per entry, this means a 32 MB table
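The same arithmetic as a tiny Python helper, handy for trying other disk and block sizes. The name fat_table_bytes is illustrative.

```python
def fat_table_bytes(disk_bytes, block_bytes, pointer_bytes):
    """Size of a table with one pointer-sized entry per disk block."""
    entries = disk_bytes // block_bytes
    return entries * pointer_bytes

# The card's example: 32 GB disk, 4 KB blocks, 4 B pointers.
EXAMPLE_TABLE_BYTES = fat_table_bytes(32 * 2**30, 4 * 2**10, 4)
```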
What’s RAID 0, pros and cons?
Non-redundant disk array
– Files striped evenly across N ≥ 2 disks
Pros
– High read/write throughput
– Faster aggregate seek time, because the disks can seek concurrently
Con
– Any disk failure results in data loss
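A minimal Python sketch of striping, assuming a stripe unit of one block placed round-robin across the disks. Real arrays may use larger stripe units, and raid0_locate is an illustrative helper.

```python
def raid0_locate(logical_block, num_disks):
    """Return (disk index, block offset on that disk) for a logical block."""
    return logical_block % num_disks, logical_block // num_disks
```

Consecutive logical blocks land on different disks, which is why sequential reads and writes can proceed on all N disks at once.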
What’s RAID 1, pros and cons?
• Mirrored disks
– Files are striped across half the disks
– Data written to 2 places: data disk & mirror disk
Pros
– On failure, can immediately use surviving disk
– Read performance: similar to RAID 0 (can read concurrently as well)
Cons
– Wastes half the capacity
– Each write must go to two disks, so write throughput is roughly half that of RAID 0 with the same number of disks
What’s RAID 4, pros and cons?
• Use parity disks
– Each block (= multiple of sector) on the parity disk is a parity function
(=xor) of the corresponding blocks on all the N-1 other disks
• Failure => “degraded read”
– Read remaining disks plus parity
disk to compute missing data
• Pros
– In terms of capacity, less wasteful than RAID-1 (wastes only 1/N)
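– Read performance is similar to RAID-0. A write must also update the parity block.
(A small write usually has to read the old data and old parity first, since new parity = old parity xor old data xor new data. Only a write that covers a whole stripe can compute the parity without reading.)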
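A small Python sketch of xor parity, covering both the degraded read and one common way to update parity on a small write. Blocks are modeled as equal-length bytes objects, and the helper names are illustrative.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise xor of equal-length blocks (this is the parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity):
    """Degraded read: xor the remaining data blocks with the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])

def updated_parity(old_parity, old_data, new_data):
    """Small write: cancel the old data out of the parity, add the new data."""
    return xor_blocks([old_parity, old_data, new_data])
```

Because xor is its own inverse, the missing block equals the xor of everything that survived, parity included.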
What’s RAID 5, pros and cons?
• Similar to RAID-4, but uses block interleaved distributed parity
– Distribute parity info across all disks
• Pros
– Like RAID-4, but better because it eliminates the hot spot disk
– E.g., when performing two small writes in RAID-4
• They must be serialized
– Not necessarily so in RAID-5
• => better performance
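A sketch of why distributing parity removes the hot spot. The rotation rule below (parity moves back one disk per stripe, starting from the last disk) is only one possible layout and is assumed here for illustration. Both helper names are made up.

```python
def raid4_parity_disk(stripe, num_disks):
    """RAID-4: parity for every stripe lives on the same dedicated disk."""
    return num_disks - 1

def raid5_parity_disk(stripe, num_disks):
    """RAID-5 (assumed layout): parity rotates across the disks per stripe."""
    return (num_disks - 1 - stripe) % num_disks
```

With 4 disks, small writes to stripes 0 and 1 both need disk 3 for parity in RAID-4, but need disks 3 and 2 in this RAID-5 layout, so their parity updates need not wait for each other.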
What’s RAID 6, pros and cons?
• Extends RAID-5 by adding an additional “parity” block
– Ap and Aq must be independent of each other, algebraically speaking
• Pros & cons relative to RAID-5
– Can withstand 2 disk failures (2 equations, can find 2 variables)
– But wastes 2/N rather than 1/N of the capacity
What’s RAID 2&3, pros and cons?
Like RAID-4
– But in bit and byte resolution, respectively
What does RAID n+k mean?
- n blocks of regular data
- k blocks that provide redundancy
What caches does Linux maintain for filesystems?
- The page cache: for the contents of the data region.
- The inode cache: for the contents of the inode table.
- Additional caches for the bitmaps or other filesystem data structures.
What is the structure of directories in FAT?
Directories are regular files containing a list of records, each with the fields:
filename, metadata, starting block
Each record holds the file’s metadata, so the inode is effectively embedded in the directory record.
Does FAT support hard links?
No, because the file’s metadata is stored in the directory record.