File Systems Flashcards
What is the purpose of a file system?
Reliable long-term storage of significant quantities of data
How is the file system typically implemented?
Using disks
Are tapes suitable for general purpose file systems? Why/why not?
No, as they are too slow
Is main memory suitable for general purpose file systems? Why/why not?
No, as it is volatile and too small; however, main memory can be used for temp files (/tmp on unix)
What constitutes a file?
A named sequence or collection of bytes stored on disk.
This abstract data type has the following operations defined for it:
1) create
2) write
3) read
4) reposition within file
5) delete
6) truncate
7) open(fi): search directory structure on disk for entry fi and move contents of it to memory
8) close(fi): move content of entry fi to directory on disk (from memory)
What operations are available on files?
1) create
2) write
3) read
4) reposition within file
5) delete
6) truncate
7) open(fi): search directory structure on disk for entry fi and move contents of it to memory
8) close(fi): move content of entry fi to directory on disk (from memory)
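A minimal sketch of these operations, assuming a POSIX-like system and Python's os module as a thin wrapper over the system calls. The function name demo_file_operations and the file name example.txt are made up for illustration.

```python
import os
import tempfile


def demo_file_operations():
    """Exercise each basic file operation once on a scratch file."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "example.txt")

    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)  # create + open
    os.write(fd, b"hello world")                        # write
    os.lseek(fd, 6, os.SEEK_SET)                        # reposition within file
    tail = os.read(fd, 5)                               # read 5 bytes from offset 6
    os.ftruncate(fd, 5)                                 # truncate to 5 bytes
    size = os.fstat(fd).st_size
    os.close(fd)                                        # close
    os.remove(path)                                     # delete
    os.rmdir(directory)
    return tail, size, os.path.exists(path)
```

The returned tuple holds the bytes read after repositioning, the size after truncation, and whether the file still exists after deletion.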
How are files named and protected?
Usually named with an extension, separated by a dot (.)
Each file has a unique identifier (its inode number)
How is free space managed? What advantages/disadvantages are there? Which method do most modern OSs use?
Free lists: list of free disk blocks pointing to next.
Disadvantage: free block must be read before it can be allocated
or
Bitmaps: 1 bit per block, 1 if free, 0 otherwise. These have the advantage of allowing a search for a free block within a given neighbourhood. Most modern OSs use bitmaps for this reason.
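The neighbourhood search can be sketched as follows. This is an illustrative toy, not how any particular OS does it: the bitmap is a plain Python list (1 = free, 0 = in use, as above), and find_free_block and allocate are hypothetical names.

```python
def find_free_block(bitmap, near):
    """Return the index of the free block closest to `near`, or None.

    bitmap[i] == 1 means block i is free.
    """
    n = len(bitmap)
    for distance in range(n):
        # Check the block `distance` below, then the one `distance` above.
        for candidate in (near - distance, near + distance):
            if 0 <= candidate < n and bitmap[candidate] == 1:
                return candidate
    return None


def allocate(bitmap, near):
    """Allocate the free block closest to `near` and mark it as used."""
    block = find_free_block(bitmap, near)
    if block is not None:
        bitmap[block] = 0
    return block
```

With a free list, the same "close to block N" request would require walking the list, which is why bitmaps suit allocators that try to keep a file's blocks near each other.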
How can kernel ensure fast access to files?
Buffer cache
How can file system recover from crashes?
…
Why is a file a logical storage unit?
Because, from the OS standpoint, a file is just a “bunch of blocks” stored on a device; the file abstraction hides this and presents the blocks as a single named unit
How does UNIX view files?
As a sequence of bytes. Any structure on top of this is strictly for user programs.
What are some UNIX file types?
eg. executable, archive
What are the attributes of a file?
1) Name: only info kept in human readable form
2) Identifier: unique tag to identify file in file system
3) Type: needed for systems that support different types
4) Location: pointer to file location on device
5) Size: current file size
6) Protection: controls who can do reading, writing, executing
7) Time, date and user identification: data for protection, security and usage monitoring
Where is information about files kept?
Directory structure, which is maintained on disk
What does the file extension do?
Indicates ‘purpose’ of the file
What was early directory structure like?
Fixed structure consisting of home directory and variable number of subdirectories - one for each project user is involved in
What is directory structure like now?
Subdirectories may have subdirectories - these form a tree structure. Uses . for self directory and .. for parent directory
What is an inode?
(information node) contains all information kernel has about a file EXCEPT its name
What do . and .. refer to at the root?
These refer to the same inode as each other
Do files have to have extensions in UNIX?
No
What is a UNIX filename?
Sequence of names separated by slashes (/). Kernel translates this to inode number by looking up each component in sequence
How does kernel translate UNIX filename into an inode number?
Looks up each component in the sequence. If file name begins with slash, start searching in root directory. Otherwise start at inode of current directory of the process
If a filename begins with a slash, where will the kernel start search for it? If not?
If file name begins with slash, start searching in root directory. Otherwise start at inode of current directory of the process
Explain the concept of pathname translation
The kernel takes a UNIX filename and translates it into an inode number by looking up each component in the sequence. Kernel caches results of pathname translations for later reuse.
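A toy model of pathname translation, assuming directories are simple name-to-inode maps. The inode numbers, the DIRECTORIES table, and the function name namei are illustrative assumptions, not real kernel data structures.

```python
ROOT_INODE = 2  # assumed value for this sketch

# directory inode -> {component name: inode number}
DIRECTORIES = {
    2: {".": 2, "..": 2, "home": 5, "etc": 7},
    5: {".": 5, "..": 2, "alice": 9},
    7: {".": 7, "..": 2, "passwd": 11},
    9: {".": 9, "..": 5, "notes.txt": 12},
}


def namei(path, cwd_inode, cache=None):
    """Translate `path` to an inode number, one component at a time."""
    if cache is None:
        cache = {}
    key = (path, cwd_inode)
    if key in cache:                     # reuse an earlier translation
        return cache[key]
    # Absolute paths start at the root, relative ones at the current directory.
    current = ROOT_INODE if path.startswith("/") else cwd_inode
    for component in path.split("/"):
        if not component:                # skip empty parts from leading or double slashes
            continue
        entries = DIRECTORIES.get(current)
        if entries is None:
            raise NotADirectoryError(path)
        if component not in entries:
            raise FileNotFoundError(path)
        current = entries[component]
    cache[key] = current
    return current
```

Note that in this table "." and ".." of the root both map to inode 2, so "/.." resolves to the root itself.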
What classes of user are available in UNIX?
User (owner)
Group
Public (other)
What modes of access are available in UNIX?
Read, write, execute
Explain groups in UNIX
Each user is a member of one or more groups.
1 group is considered primary and the rest are secondary
What is gid?
Group ID
What is the relationship between a gid and a uid?
There isn’t any
What 2 files are information about users and groups stored in?
/etc/passwd and /etc/group
Explain UNIX file security
Unix files are protected by ACL of restricted form with three entries:
1) first applies to owner of file
2) second applies to members of the group associated with the file
3) third applies to everyone else
What does 751 rwxr-x--x u1 g1 do?
…
What does 705 rwx---r-x u2 g2 do?
…
What types of links are there?
Hard links: point to a file by inode number. Finding other names for the same file means comparing the inode numbers of files, as the inode number is the identifier.
Symlinks: points to file by name (file doesn’t have to exist).
What happens when a symlink is opened?
Symbolic links are those files that contain a file name.
The kernel sees the symbolic link in the inode and opens the named file instead. This named file may be a symlink.
What happens when you link 2 files together?
Their inode number and contents will be the same
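A small experiment contrasting the two link types, assuming a POSIX file system that supports both. The function name demo_links and the file names are made up for this sketch.

```python
import os
import tempfile


def demo_links():
    """Create a hard link and a symlink to one file and compare them."""
    directory = tempfile.mkdtemp()
    original = os.path.join(directory, "original")
    hard = os.path.join(directory, "hard")
    soft = os.path.join(directory, "soft")
    with open(original, "w") as f:
        f.write("data")

    os.link(original, hard)      # hard link: a second name for the same inode
    os.symlink(original, soft)   # symlink: a file that contains a path name

    result = {
        "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(original).st_nlink,
        "symlink_target": os.readlink(soft) == original,
    }

    os.remove(original)          # removes one name only
    with open(hard) as f:
        result["hard_still_readable"] = f.read() == "data"
    # The symlink still exists but now names a file that does not.
    result["symlink_dangling"] = os.path.islink(soft) and not os.path.exists(soft)

    os.remove(hard)
    os.remove(soft)
    os.rmdir(directory)
    return result
```

After the original name is removed, the data survives through the hard link, while the symlink is left dangling because it refers to a name rather than an inode.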
Are hard links allowed for directories?
No
What do open(fi) and close(fi) do?
open(fi): search directory structure on disk for entry fi and move contents of it to memory
close(fi): move content of entry fi to directory on disk (from memory)
How does UNIX refer to a given input or output stream?
Via file descriptors (small int)
What is a unix program?
A program that refers to each given input or output stream by a file descriptor
What does the kernel use file descriptors for?
To index into a table representing a process’ open files. This table is only visible to the kernel.
What is the current offset into a file?
This variable stores the distance in bytes between the beginning of the file and the next byte to be read or written
How can programs perform random access to a file?
Manipulate current offset through lseek system call
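A short sketch of random access through lseek, assuming a POSIX-like system. os.lseek is Python's wrapper for the system call; demo_random_access is a hypothetical name.

```python
import os
import tempfile


def demo_random_access():
    """Jump around a file by moving the current offset with lseek."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcdefghijklmnopqrstuvwxyz")

        os.lseek(fd, 5, os.SEEK_SET)              # absolute: 5 bytes from the start
        first = os.read(fd, 5)                    # reading advances the offset
        position = os.lseek(fd, 0, os.SEEK_CUR)   # query the current offset

        os.lseek(fd, -3, os.SEEK_END)             # relative to the end of the file
        last = os.read(fd, 3)
        return first, position, last
    finally:
        os.close(fd)
        os.remove(path)
```

Seeking by zero bytes relative to the current position is a common way to read the offset without changing it.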
What tables does unix kernel keep about files?
1) open file table: global table containing 1 entry for each open without a close, with each entry containing current offset
2) file descriptor tables: small per-process tables, each of which maps file descriptors of that process to open file table entries
3) active inode table: global table containing information about every active file
What is the open file table?
global table containing 1 entry for each open without a close, with each entry containing current offset. Kept by kernel.
What is the file descriptor table?
small per-process tables, each of which maps file descriptors of that process to open file table entries. Kept by kernel
What is the active inode table?
global table containing information about every active file. Kept by kernel
Where are the kernel tables relating to files kept?
System space
Where does a read file system call occur from?
User space
Where are inode list and data blocks kept?
Disk space
Explain the process of reading a file?
The read call is made from user space. This traps into system space in order to access the file tables. The file descriptor links to the open file table, which links to the inode table. The inode table maps to the appropriate data in disk space.
What happens to file descriptor tables when a fork occurs?
The table is copied to the child. The copied descriptors refer to the same open file table entries, so each operation by either process will advance the shared current offset.
Can different processes have entries in their file descriptor tables that link to the same open file entry?
Yes
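The shared offset after a fork can be observed directly. This sketch assumes a UNIX-like system where os.fork is available; demo_shared_offset is a made-up name.

```python
import os
import tempfile


def demo_shared_offset():
    """Show that parent and child share one open file table entry."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: its descriptor table is a copy, but the entry it
            # points to (and therefore the offset) is shared.
            try:
                os.read(fd, 4)
            finally:
                os._exit(0)      # leave without running the parent's cleanup

        os.waitpid(pid, 0)
        offset_after_child = os.lseek(fd, 0, os.SEEK_CUR)
        rest = os.read(fd, 3)    # the parent continues where the child stopped
        return offset_after_child, rest
    finally:
        os.close(fd)
        os.remove(path)
```

If the parent had instead opened the file a second time, it would get a separate open file table entry with its own offset.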
What is stdin?
Program can read what user types
What is stdout?
program can send output to user’s screen
What is stderr?
error output
If a file is invoked by the shell, what file descriptors will definitely be open?
Stdin, stdout, stderr
Explain disk drive in unix
This is divided into one or more partitions. The kernel uses each partition as a virtual disk.
What does the kernel use as a virtual disk?
Each partition that the disk drive is divided into
What do disk drive partitions do?
Serve as a component of swap space or may hold file system
What is a file system?
A tree hierarchy consisting of directories.
What is mounting of a file system
We can attach file systems by mounting them on directories of another file system. The mount point should be an empty directory, as the mounted file system hides whatever subtree was under the mount point directory until it is unmounted
What file systems are mounted after start up?
Usually only removable media
When may file systems be mounted and/or unmounted?
Any time when they are not used. Root file system must stay mounted when system is up
How big must the disk partition with home directories be?
Big enough to store all files created by the users
Describe the structure of a file system
the boot block (block 0): contains code to bootstrap the system (to get system up and running)
the super block (block 1): contains summary information for the file system (where to find everything else). master file table
blocks for a fixed number of inodes (points off to data)
blocks for file data. heap of index values
(each block contains a fixed number of disk sectors)
What file allocation methods are available?
1) Contiguous allocation
2) Linked allocation
3) File allocation table
4) Indexed allocation
5) Multilevel indexed allocation
Explain contiguous file allocation
Everything next to each other in array with start position and size. This will have a lot of fragmentation when files are deleted and empty spaces appear
Explain linked file allocation
Linked lists. Going from one block to another.
Explain FAT
(file allocation table) similar to linked list allocation but linked list is stored in a table called the DFAT which speeds up direct access. The FAT can be cached.
Smarter approach to linked list allocation. The table is an array of pointers, one per disk block. Each file keeps only a pointer to its first block; the table entries then form the linked list of where the rest of its data blocks are
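A toy illustration of following a file's chain through the table. The END_OF_CHAIN marker, the list representation, and both function names are assumptions made for this sketch; real FAT variants use reserved bit patterns.

```python
END_OF_CHAIN = -1  # assumed marker for the last block of a file


def file_blocks(fat, start_block):
    """Return every block of a file by following the chain in the table."""
    blocks = []
    block = start_block
    while block != END_OF_CHAIN:
        blocks.append(block)
        block = fat[block]       # table lookup, no data block is read
    return blocks


def block_for_offset(fat, start_block, offset, block_size):
    """Find which disk block holds byte `offset` of the file."""
    block = start_block
    for _ in range(offset // block_size):
        block = fat[block]
        if block == END_OF_CHAIN:
            raise ValueError("offset beyond end of file")
    return block
```

Direct access still walks the chain, but every step is a lookup in the (cacheable) table rather than a read of a data block, which is where the speed-up over plain linked allocation comes from.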
Explain indexed allocation and multilevel indexed allocation
This is similar to a page table. It has indexes to table which tell us location of files.
Multilevel: the inode holds pointers to index blocks. With two levels, the inode points to second-level index blocks, and each second-level index block points to data blocks. Higher levels of indirection are possible
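A sketch of a two-level lookup, treated like a two-level page table. The data layout (a top-level list of index block numbers, a dict of index blocks) and the names lookup and max_file_size are assumptions for illustration only.

```python
def lookup(top_index, index_blocks, logical_block, pointers_per_block):
    """Map a file's logical block number to a disk block number."""
    outer = logical_block // pointers_per_block   # which second-level index block
    inner = logical_block % pointers_per_block    # which slot inside it
    second_level = index_blocks[top_index[outer]]
    return second_level[inner]


def max_file_size(block_size, pointer_size, levels):
    """Largest file reachable through `levels` full levels of index blocks,
    assuming the top level is itself a single block of pointers."""
    pointers_per_block = block_size // pointer_size
    return pointers_per_block ** levels * block_size
```

Under these assumptions, 4096-byte blocks with 4-byte pointers give 1024 pointers per block, so two levels reach 1024 * 1024 data blocks, i.e. 4 GiB.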
What are the 3 versions of FAT?
FAT 12
FAT 16
FAT 32
Differ in how many bits a disk address contains
What is a virtual file system?
2d table mapping system calls (one of the dimensions) and file system types (other dimension) into a pointer to a procedure that implements appropriate variant of system call for files on that type of file system
How can we implement an alternative file system type?
VFS. Virtual file system switch. 2d table mapping system calls (one of the dimensions) and file system types (other dimension) into a pointer to a procedure that implements appropriate variant of system call for files on that type of file system
In the case of a VFS, what happens if the system call does not apply to that file system type? What is an example of when this might occur?
The procedure will return an error indication (e.g. seeking on a terminal or pipe)
Are we able to ‘seek’ on a terminal or pipe?
No, this will return error indication
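The 2D switch table can be modelled as a nested dictionary of function pointers. Everything here (the "disk" and "pipe" types, the handler names, returning a negative errno value) is a simplified assumption for illustration, not real kernel code.

```python
import errno


def disk_read(handle, count):
    return f"read {count} bytes from disk file {handle}"


def disk_lseek(handle, position):
    return position                     # new offset


def pipe_read(handle, count):
    return f"read up to {count} bytes from pipe {handle}"


def not_seekable(handle, position):
    return -errno.ESPIPE                # error indication: illegal seek


# One dimension is the file system type, the other the system call.
VFS_SWITCH = {
    "disk": {"read": disk_read, "lseek": disk_lseek},
    "pipe": {"read": pipe_read, "lseek": not_seekable},
}


def vfs_call(fs_type, syscall, *args):
    """Dispatch a system call to the variant for this file system type."""
    return VFS_SWITCH[fs_type][syscall](*args)
```

Adding a new file system type then amounts to adding one more row of procedures, without touching the callers.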
How does linux file system appear to user?
As a tree
How does kernel manage file system?
Kernel hides implementation details and manages multiple different file systems via an abstraction layer - the VFS.
What 2 components is linux VFS composed of?
1) set of definitions that define what a file object is allowed to look like
2) layer of software to manipulate file objects
How does linux device-oriented file system access disk storage?
Through 2 caches:
1) data is cached in page cache, which is unified with virtual memory system
2) metadata is cached in buffer cache, a separate cache indexed by physical disk block
How many classes does linux split all devices into? What are these?
3.
1) block devices allow random access to completely independent, fixed size blocks of data
2) character devices include most other devices and don’t need to support functionality of regular files
3) network devices are interfaced via the kernel’s networking subsystem
Explain buffer cache
A disk block that was accessed recently is likely to be accessed again in the near future. Unix caches disk blocks in main memory for fast access.
Explain I/O transfers in unix
Move data between disk and buffer cache
What replacement strategy is generally used by buffer cache?
LRU (least recently used)
Explain read-ahead
If recent file accesses were sequential, unix will prefetch the next block
Explain write-behind
Copy data from user address space to buffer cache. Affected blocks are marked ‘delayed write’. Daemon process that wakes up every 30s schedules these blocks to be written to disk. Mark is deleted when write is completed. This is write behind.
When are blocks evicted from buffer cache?
Only when space is needed to cache another block
How do write system calls work?
Copy data from user address space to buffer cache. Affected blocks are marked ‘delayed write’. Daemon process that wakes up every 30s schedules these blocks to be written to disk. Mark is deleted when write is completed.
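A toy buffer cache combining LRU replacement with delayed writes. The class and method names are invented for this sketch, the "disk" is a plain dict, and sync stands in for the periodic daemon; a real kernel is considerably more involved.

```python
from collections import OrderedDict


class BufferCache:
    def __init__(self, disk, capacity):
        self.disk = disk                  # dict: block number -> bytes
        self.capacity = capacity
        self.blocks = OrderedDict()       # cached blocks, least recently used first
        self.delayed_write = set()        # blocks marked 'delayed write'
        self.disk_reads = 0
        self.disk_writes = 0

    def _write_to_disk(self, block_no, data):
        self.disk[block_no] = data
        self.disk_writes += 1
        self.delayed_write.discard(block_no)   # mark is removed once written

    def _make_room(self):
        # Evict only when space is needed for another block.
        while len(self.blocks) >= self.capacity:
            victim, data = self.blocks.popitem(last=False)
            if victim in self.delayed_write:
                self._write_to_disk(victim, data)

    def read(self, block_no):
        if block_no in self.blocks:            # hit: no disk access
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        self._make_room()
        self.disk_reads += 1
        self.blocks[block_no] = self.disk[block_no]
        return self.blocks[block_no]

    def write(self, block_no, data):
        if block_no not in self.blocks:
            self._make_room()
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        self.delayed_write.add(block_no)       # written to disk later

    def sync(self):
        """Stand-in for the daemon that flushes delayed writes."""
        for block_no in sorted(self.delayed_write):
            self._write_to_disk(block_no, self.blocks[block_no])
```

Writing the same block twice before a sync costs a single disk write, which is the "only the final version needs to be written" effect.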
Why would we want to use buffer cache?
Speak about temporal and spatial locality, quick access to files, read ahead, write behind
What are the advantages of buffer cache?
Reduce number of disk reads required if references show locality
No need to require read/write requests to be aligned
Files with short lifetimes usually require no disk access
If file is updated several times in short period, only the final version needs to be written to disk
Scheduling more blocks at same time gives more scope for disk scheduler
No need to lock user pages in memory during I/O transfers
What are the disadvantages of buffer caches?
Disk accesses require extra pass over data (eg. disk to buffer cache, then buffer cache to memory instead of disk to memory)
Delayed writes may be lost in a crash
What is a unix pipe?
IPC channel that can be accessed with read and write operations, e.g. ‘prog1 | prog2’ in the shell. Each pipe has an associated buffer
What happens to read and write on broken pipes?
They fail
What will happen with a read on an empty pipe or a write on a full pipe?
The calling process is suspended until data (for a read) or buffer space (for a write) becomes available
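A small experiment with a pipe inside one process, assuming a POSIX system and Python's default behaviour of ignoring SIGPIPE (so a write to a broken pipe surfaces as an exception). demo_pipe is a hypothetical name. Note that on POSIX a read from a pipe whose write end is closed reports end-of-file rather than an error.

```python
import os


def demo_pipe():
    """Normal transfer, end-of-file on read, and failure on write."""
    read_end, write_end = os.pipe()
    os.write(write_end, b"ping")         # goes into the pipe's buffer
    data = os.read(read_end, 4)
    os.close(write_end)
    eof = os.read(read_end, 1)           # no writers left: empty result
    os.close(read_end)

    read_end, write_end = os.pipe()
    os.close(read_end)                   # no readers left: the pipe is broken
    try:
        os.write(write_end, b"x")
        write_failed = False
    except BrokenPipeError:
        write_failed = True
    finally:
        os.close(write_end)
    return data, eof, write_failed
```

The blocking cases (reading an empty pipe, writing a full one) are deliberately not exercised here, since a single process doing so would suspend itself indefinitely.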
Explain Dekker’s algorithm
whoever sets shared/turn variable last has to wait, needs more expl
What is a directory?
File maintained by the kernel
What is a device directory?
Collection of nodes containing information about all files on partition. Both directory structure and files reside on disk
What are solutions to the complexity caused by having a lot of files?
Break file systems into partitions
Hold info about files in partitions
What is the access matrix/access control list?
Contains information about which subjects can perform which actions on which objects
Explain security of UNIX files
Protected by an ACL of restricted form with 3 entries:
1) owner of file
2) members of group associated with file
3) everyone else
The ACL is represented as a nine-bit number, usually written in octal
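A sketch of how the nine bits break down into the three entries. Both function names are hypothetical, and the check ignores the superuser and anything beyond the basic owner/group/other scheme.

```python
def mode_to_string(mode):
    """Render a nine-bit mode (e.g. 0o640) in rwx notation."""
    out = []
    for shift in (6, 3, 0):              # owner, group, everyone else
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Decide whether a user may 'r', 'w' or 'x' a file.

    Exactly one of the three entries applies: the owner entry if the
    user owns the file, else the group entry if the user is in the
    file's group, else the entry for everyone else.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if uid == file_uid:
        shift = 6
    elif file_gid in gids:
        shift = 3
    else:
        shift = 0
    return bool((mode >> shift) & bit)
```

For example, mode 640 reads as rw-r-----: the owner may read and write, group members may only read, and everyone else has no access.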
What is a file descriptor?
Kernel uses this to index into table representing open files of process
Name a benefit of small pipe size
Pipe data is rarely written to disk (kept in memory by normal block buffer cache)