25. Log-Structured File Systems Flashcards

1
Q

From 1982 to 1991, what happened to the relative speed of seeks on hard disks?

A

Seek times barely improved, so relative to faster-improving components such as CPUs they were still dog slow.

They were becoming a performance bottleneck

2
Q

How do we make a big, slow thing look faster?

A

Use a cache (in other words, put a smaller, faster thing in front of it)

In the case of the file system, the smaller, faster thing is memory

3
Q

What do we call the memory used to cache file system data?

A

The buffer cache

4
Q

With a large cache, what benefit do we get when it comes to disk reads?

A

With a large cache, we should be able to avoid almost all disk reads

5
Q

How does a cache assist with the latency associated with writes to disk?

A

The cache will help collect small writes in memory until we can do one larger write
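
A minimal sketch of that idea. The `WriteBuffer` class, its names, and the flush threshold are all hypothetical, chosen only for illustration: small writes accumulate in memory and reach the "disk" as one larger write.

```python
class WriteBuffer:
    """Collects small writes in memory and flushes them as one large write.

    Hypothetical illustration; the threshold and names are assumptions.
    """

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # bytes to buffer before flushing
        self.pending = []        # small writes waiting in memory
        self.pending_bytes = 0
        self.disk_writes = []    # each entry stands for one large disk write

    def write(self, data):
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Issue everything buffered so far as a single write.
        if self.pending:
            self.disk_writes.append(b"".join(self.pending))
            self.pending = []
            self.pending_bytes = 0


buf = WriteBuffer(flush_threshold=16)
for chunk in [b"aaaa", b"bbbb", b"cccc", b"dddd", b"ee"]:
    buf.write(chunk)
buf.flush()  # an explicit flush, pushing out whatever is still buffered
```

Here five small writes reach the disk as two larger ones.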

6
Q

What is the best way to avoid seeks when writing?

A

Write everything to the same place

More realistically, write everything as close together as possible

7
Q

What is LFS?

A

Log-structured File System

8
Q

What is the main idea behind log-structured file systems?

A

All writes go to an append-only log
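
A toy sketch of an append-only log, with a Python list standing in for the disk (the `AppendOnlyLog` class is a hypothetical name, not something from the lecture). The point it illustrates is that an update becomes a new block at the end of the log rather than an overwrite in place.

```python
class AppendOnlyLog:
    """Toy log: blocks are only ever appended, never overwritten in place."""

    def __init__(self):
        self.blocks = []  # position in this list stands in for the disk address

    def append(self, payload):
        self.blocks.append(payload)
        return len(self.blocks) - 1  # address of the block just written


log = AppendOnlyLog()
first = log.append("file A, version 1")
second = log.append("file B, version 1")
third = log.append("file A, version 2")  # an update is a new block, not an overwrite
```

The old version of file A is still sitting at its original address; it is simply no longer the current one.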

9
Q

What does a write to disk look like normally, and what changes once the cache absorbs the reads?

A

Let’s say we want to change an existing byte in a file

Normally, we do this:

  1. Seek to read the inode map
  2. Seek to read the inode
  3. Seek to write (modify) the data block
  4. Seek to write (update) the inode

Let’s assume that our big friendly cache is going to soak up the reads. Now what happens?

  1. Read the inode map (cache hit, no seek)
  2. Read the inode (cache hit, no seek)
  3. Seek to write (modify) the data block
  4. Seek to write (update) the inode

The reads no longer touch the disk, but the two writes still require seeks.
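
A small sketch of the seek accounting for this example. It assumes, as a simplification not stated in the cards, that every step that reaches the disk costs exactly one seek and that a cached read costs none; `count_seeks` is a hypothetical helper.

```python
# Each step of the update: (description, is_read)
steps = [
    ("read the inode map", True),
    ("read the inode", True),
    ("write the data block", False),
    ("write the inode", False),
]


def count_seeks(steps, reads_cached):
    """Count the steps that still go to disk, charging one seek each."""
    return sum(1 for _, is_read in steps if not (is_read and reads_cached))


seeks_without_cache = count_seeks(steps, reads_cached=False)
seeks_with_cache = count_seeks(steps, reads_cached=True)
```

Under these assumptions the count drops from four seeks to two: the cache removes the reads, and the remaining seeks all come from writes.
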
10
Q

When do we write to the log?

A

When the user calls sync or fsync, or when blocks are evicted from the buffer cache

11
Q

How did FFS translate an inode number to a disk block?

A

It stored the inode map in a fixed location on disk

12
Q

Why doesn’t storing the inode map in a fixed location on disk work for LFS?

A

Inodes are just appended to the log and so they can move! A map at a fixed location would need its own seek to update every time an inode moved.

13
Q

What does LFS do to handle the fact that the inodes can move?

A

It logs the inode map
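
A toy sketch of what logging the inode map could look like. `ToyLFS` and its record layout are hypothetical, and it logs one map entry per update purely for simplicity; how a real LFS lays out and locates its map is not covered here.

```python
class ToyLFS:
    """Toy model: data blocks, inodes, and inode-map updates all go to one log."""

    def __init__(self):
        self.log = []    # append-only; list index stands in for the disk address
        self.imap = {}   # in-memory view: inode number -> address of latest inode

    def _append(self, record):
        self.log.append(record)
        return len(self.log) - 1

    def write_file(self, inum, data):
        data_addr = self._append(("data", data))
        inode_addr = self._append(("inode", inum, data_addr))
        self.imap[inum] = inode_addr
        # Log the map update too, so the inode's new location is in the log.
        self._append(("imap", inum, inode_addr))

    def read_file(self, inum):
        # Inode number -> inode address (via the map) -> data block address.
        _, _, data_addr = self.log[self.imap[inum]]
        return self.log[data_addr][1]


fs = ToyLFS()
fs.write_file(7, "hello")
old_inode_addr = fs.imap[7]
fs.write_file(7, "hello again")   # the inode moves to a new log address
new_inode_addr = fs.imap[7]
```

The inode for file 7 moves between the two writes, and the lookup still works because the map moved with it.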

14
Q

How does LFS handle other file system metadata, such as the inode and data block allocation bitmaps?

A

We can log that stuff too!

15
Q

What happens when the log reaches the end of the disk?

A

There is probably a lot of unused space earlier in the log due to overwritten inodes, data blocks, etc.

This space is reclaimed when we clean the log by identifying empty space and compacting used blocks

Conceptually you can think of this happening across the entire disk all at once, but for performance reasons LFS divides the disk into segments which are cleaned separately
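
A rough sketch of cleaning, under simplifying assumptions: segments are lists of block ids, and liveness comes from a hypothetical `is_live` callback (a real cleaner would have to work this out from file system metadata). Live blocks from the segments chosen for cleaning are repacked densely, and the rest of the space is reclaimed.

```python
def clean_segment(segment, is_live):
    """Return the live blocks of one segment; the segment itself becomes free."""
    return [block for block in segment if is_live(block)]


def clean(segments, is_live, segment_size):
    """Compact the live blocks of the given segments into as few segments as possible."""
    live = []
    for segment in segments:
        live.extend(clean_segment(segment, is_live))
    # Repack the survivors densely; everything else is reclaimed free space.
    return [live[i:i + segment_size] for i in range(0, len(live), segment_size)]


segments = [
    ["a1", "b1", "c1"],
    ["a2", "d1", "b2"],
    ["a3", "e1", "d2"],
]
live_blocks = {"c1", "b2", "a3", "e1", "d2"}  # the other blocks were overwritten later
compacted = clean(segments, lambda block: block in live_blocks, segment_size=3)
```

In this example three partly dead segments compact into two, freeing one whole segment for new log writes.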

16
Q

When should we run the cleaner in LFS?

A

Probably when the system is idle, which may be a problem on systems that don’t idle much

17
Q

How large should segments be when cleaning the log?

A

Large segments amortize the cost to read and write all of the data necessary to clean the segment

But small segments increase the probability that all blocks in a segment will be “dead”, making cleaning trivial
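
A back-of-the-envelope sketch of the second point. It assumes each block is dead independently with the same probability, which is an assumption made here for illustration and will not hold for real workloads; `prob_segment_all_dead` is a hypothetical helper.

```python
def prob_segment_all_dead(p_dead, blocks_per_segment):
    """Probability that every block in a segment is dead, assuming each block
    is dead independently with probability p_dead."""
    return p_dead ** blocks_per_segment


small_segment = prob_segment_all_dead(0.5, 4)    # 4 blocks per segment
large_segment = prob_segment_all_dead(0.5, 64)   # 64 blocks per segment
```

Under this model the chance of an entirely dead segment falls off exponentially with segment size, which is the pull toward small segments that has to be weighed against the amortization benefit of large ones.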

18
Q

What can we say about the performance effects of log cleaning?

A

Cleaner overhead is very workload-dependent, making it difficult to reason about the performance of log-structured file systems

(and easy to fight about their performance!)

19
Q

Let’s say that the cache does not soak up as many reads as we were hoping. What problem can LFS create?

A

Block allocation is extremely discontiguous, meaning that reads may seek all over the disk
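
A small sketch of why that hurts. It assumes a simplified cost model in which reading a file in order costs one seek for the first block plus one more whenever the next block is not physically adjacent to the previous one; the block addresses are made up.

```python
def count_read_seeks(block_addresses):
    """Count seeks needed to read blocks in file order: one for the first block,
    plus one whenever the next block is not physically adjacent."""
    seeks = 0
    previous = None
    for addr in block_addresses:
        if previous is None or addr != previous + 1:
            seeks += 1
        previous = addr
    return seeks


contiguous = [100, 101, 102, 103]   # file laid out in one run
scattered = [100, 250, 101, 400]    # blocks rewritten at different times land far apart in the log

contiguous_seeks = count_read_seeks(contiguous)
scattered_seeks = count_read_seeks(scattered)
```

With these made-up addresses the contiguous file needs one seek and the scattered one needs four, one per block.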

20
Q

What are three arguments in favor of reading research papers?

A
  1. Researchers have some great ideas about how to improve computer systems! Many times the design principles apply outside of the specific project described in the paper
  2. Both academic and industrial research labs publish papers. Papers are frequently the best (or only) way to find out details about exciting production systems in use by companies like Google, Microsoft, Facebook, etc.
  3. Because reading the code takes way too long!
21
Q

What drives research in computer systems?

A

It ain’t features.

Hardware changes and other technology trends which both expose problems with existing systems and create new opportunities for better systems.