Storage and Fault Tolerance Flashcards by Robert Lindgren

What does RAID stand for?

Redundant Array of Independent Disks

What is a RAID?

Several disks play the role of one
Each disk detects its own errors using codes in each sector

What should a RAID do?

Provide better performance than a single disk
Keep serving normal reads and writes even when a sector goes bad or a whole disk fails

What is RAID0?

Uses “striping” (track 0 on first disk, track 1 on second, etc) for increased throughput

What is RAID0’s performance?

Nx data throughput (N is number of disks)
less queuing delay (latency)

MTTF?

Mean Time to Failure

MTTDL

Mean Time to Data Loss

Failure rate

f = failures per disk per second

Single disk MTTF

1/f

Single disk MTTDL

Single disk MTTF

What is the MTTF of RAID0 with N disks?

MTTF_N = MTTDL_N = MTTF_1 / N
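
As a quick numeric check of the cards above, here is a minimal sketch that assumes disks fail independently at a constant rate. The function names and the failure rate are made up for illustration.

```python
def single_disk_mttf(f):
    """MTTF of one disk, given failure rate f (failures per disk per second)."""
    return 1.0 / f


def raid0_mttf(f, n):
    """RAID0 loses data on the first disk failure, so MTTF_N = MTTF_1 / N."""
    return single_disk_mttf(f) / n


f = 1e-9  # hypothetical failure rate, not a measured value
for n in (1, 2, 4, 8):
    print(f"N={n}: MTTF = {raid0_mttf(f, n):.3e} seconds")
```

Adding disks to a RAID0 array raises throughput but shrinks the time to data loss by the same factor N.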

What is RAID1?

Uses “mirroring” (same data on multiple disks) for reliability.

What is the throughput of a RAID0 with N disks?

N x the throughput of one disk

What is throughput of RAID1?

Write: same as 1 disk
Read: N x throughput of one disk

Reliability of RAID1?

RAID1 tolerates any fault that affects one disk. Each disk has ECC on every sector, so the array knows which sector is bad and on which disk, and it can read the mirror copy from the other disk instead.

MTTR

Mean Time to Repair

What is the MTTDL for RAID1 if we don’t repair a damaged disk?

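MTTDL = (MTTF_1/N) + MTTF_1, for N = 2 mirrored disks: the first failure arrives after MTTF_1/2 on average, and the surviving disk then lasts another MTTF_1.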

What is the MTTDL for RAID1 if we DO repair a damaged disk?

MTTDL = (MTTF_1/N) * (MTTF_1/MTTR_1) (for N = 2 mirrored disks, assuming MTTR_1 is much smaller than MTTF_1)

i.e., MUCH MUCH MUCH better than not repairing.
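
To get a feel for how much repair helps, here is a small sketch of the two RAID1 formulas for a two-disk mirror. The hour values are hypothetical, and the repair formula assumes MTTR_1 is much smaller than MTTF_1.

```python
def raid1_mttdl_no_repair(mttf_1):
    """Two mirrored disks, never repaired: MTTF_1/2 to the first failure, then MTTF_1 more."""
    return mttf_1 / 2 + mttf_1


def raid1_mttdl_with_repair(mttf_1, mttr_1):
    """Two mirrored disks, failed disk replaced within MTTR_1 (assumes MTTR_1 << MTTF_1)."""
    return (mttf_1 / 2) * (mttf_1 / mttr_1)


mttf_1 = 100_000.0  # hours, hypothetical
mttr_1 = 24.0       # hours, hypothetical

print("no repair:  ", raid1_mttdl_no_repair(mttf_1), "hours")
print("with repair:", raid1_mttdl_with_repair(mttf_1, mttr_1), "hours")
```

With these made-up numbers the repaired mirror lasts over a thousand times longer, because data is lost only if the second disk fails inside the short repair window.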

What is RAID4?

Uses “block-interleaved parity” to both improve performance and reliability.

For N disks, N-1 have data striped just like in RAID0 and 1 has the parity bits calculated from the other disks using bitwise XOR.
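
The XOR parity idea can be sketched in a few lines. This is an illustration with tiny byte blocks standing in for disk blocks, not how a real controller is implemented; the function names are made up.

```python
from functools import reduce


def parity(blocks):
    """Bitwise XOR of equal-length byte blocks, one output byte per column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors plus parity."""
    return parity(list(surviving_blocks) + [parity_block])


# N = 4 disks: three data blocks plus one parity block.
data = [b"\x0f\xa0", b"\x33\x55", b"\xff\x00"]
p = parity(data)

# Pretend the second data disk died and rebuild its block.
rebuilt = reconstruct([data[0], data[2]], p)
print(rebuilt == data[1])
```

Because XOR is its own inverse, XORing the surviving blocks with the parity block cancels everything except the missing block.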

What is the throughput of RAID4?

Reads: (N-1) * throughput of one disk
Writes: 1/2 the throughput of one disk!

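When we write in RAID4, we need four accesses: read the old data, read the old parity, write the new data, and write the new parity. The data accesses are spread across the data disks, but every write both reads and writes the single parity disk, so the parity disk can sustain only half as many writes as one disk could do plain accesses.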

THIS is why we need RAID5.

MTTF of RAID4

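(MTTF_1/N) * (MTTF_1/((N-1) * MTTR_1))

But what’s the N-1 about? The first failure takes MTTF_1/N on average. After it, N-1 disks remain, so the next failure would take MTTF_1/(N-1) on average. We plan to repair before that second failure, so we lose data only if it lands inside the repair window, which happens with probability of about MTTR_1 / (MTTF_1/(N-1)). Dividing the time to the first failure by that probability gives the second factor, MTTF_1/((N-1) * MTTR_1).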
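
The formula can be evaluated step by step. This is a sketch assuming independent failures and MTTR_1 much smaller than MTTF_1; the function name and the numbers are made up.

```python
def mttdl_single_parity(mttf_1, mttr_1, n):
    """MTTDL of an N-disk array that survives one disk failure (RAID4 or RAID5)."""
    # Any of the N disks can fail first.
    time_to_first_failure = mttf_1 / n
    # N-1 disks remain; chance that one of them fails during the repair window.
    p_second_during_repair = mttr_1 / (mttf_1 / (n - 1))
    # On average 1/p first failures pass before one of them leads to data loss.
    return time_to_first_failure / p_second_during_repair


mttf_1 = 100_000.0  # hours, hypothetical
mttr_1 = 24.0       # hours, hypothetical
for n in (3, 5, 9):
    print(f"N={n}: MTTDL = {mttdl_single_parity(mttf_1, mttr_1, n):.3e} hours")
```

The result equals (MTTF_1/N) * (MTTF_1/((N-1) * MTTR_1)), so MTTDL falls roughly with the square of the number of disks.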

How do we compute new parity on RAID4 write?

First we XOR the new data and old data for the block we’re writing to (which finds all changes). Then we XOR that with the old parity. The result is stored as the new parity.

Because we only have one parity disk, that creates a bottleneck on writes. Hence the need for RAID5.
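
The parity update rule can be checked against a full recomputation. The sketch below uses made-up two-byte blocks and hypothetical function names.

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity, old_data, new_data):
    """New parity = old parity XOR (old data XOR new data)."""
    changes = xor_bytes(old_data, new_data)  # bits that flipped in the block
    return xor_bytes(old_parity, changes)


blocks = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
old_parity = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])

# Overwrite the middle block, touching only that block and the parity.
new_block = b"\xff\x00"
new_parity = updated_parity(old_parity, blocks[1], new_block)

# Same answer as recomputing parity across every data block.
full_recompute = xor_bytes(xor_bytes(blocks[0], new_block), blocks[2])
print(new_parity == full_recompute)
```

The shortcut only touches the block being written and the parity block, which is why a small write costs four accesses no matter how many disks are in the array.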

What is RAID5?

DISTRIBUTED block-interleaved parity. Similar to RAID4, but the parity stripes are distributed among all disks (the first might be on disk 4, the next on disk 1, the next on 2, etc).
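
One way to picture the distribution is a simple rotation. The round-robin placement below is a hypothetical layout chosen for illustration; real arrays use various rotation orders, and the card does not specify one.

```python
def parity_disk(stripe, n_disks):
    """Hypothetical round-robin placement: stripe s keeps its parity on disk s mod N."""
    return stripe % n_disks


def stripe_layout(stripe, n_disks):
    """Label each disk's block in this stripe: 'P' for parity, 'D<k>' for data block k."""
    p = parity_disk(stripe, n_disks)
    block = stripe * (n_disks - 1)  # first data block number in this stripe
    layout = []
    for disk in range(n_disks):
        if disk == p:
            layout.append("P")
        else:
            layout.append(f"D{block}")
            block += 1
    return layout


for s in range(4):
    print(s, stripe_layout(s, 4))
```

Over any N consecutive stripes each disk holds parity exactly once, so parity reads and writes are spread evenly instead of landing on a single disk.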

What is the throughput of RAID5?

Where N is the total number of disks…

Reads: N * throughput of one disk (because now we can actually read from all N disks at once!)
Writes: N/4 * throughput of one disk! (because we still need 4 total accesses per write, but they’re distributed over all N disks)
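
The throughput figures from these cards can be put side by side. The sketch below just encodes the formulas stated in the cards, in units of one disk's throughput; the function name is made up.

```python
def throughput(level, n, t=1.0):
    """Return (read, write) throughput of an N-disk array; t is one disk's throughput."""
    if level == 0:
        return n * t, n * t        # striping: all disks work in parallel
    if level == 1:
        return n * t, t            # mirroring: every write goes to every copy
    if level == 4:
        return (n - 1) * t, t / 2  # single parity disk is the write bottleneck
    if level == 5:
        return n * t, n * t / 4    # 4 accesses per write, spread over N disks
    raise ValueError(f"unsupported RAID level: {level}")


n = 8
for level in (0, 1, 4, 5):
    reads, writes = throughput(level, n)
    print(f"RAID{level}: reads {reads}x, writes {writes}x")
```

For N = 8 this gives RAID4 writes at 0.5x and RAID5 writes at 2x, which is the bottleneck that RAID5 removes.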

What is the reliability of RAID5?

Same as RAID4! We can always recover from the loss of one disk. If we lose parity, we can still read/write from the other disks. If we lose one data disk, we can reconstruct data from parity. So it's just as reliable, without bottleneck on writes. MTTF is the same as RAID4.

MTTF of RAID5

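Same as RAID4! (MTTF_1/N) * (MTTF_1/((N-1) * MTTR_1))

But what's the N-1 about? The first failure takes MTTF_1/N on average. After it, N-1 disks remain, so the next failure would take MTTF_1/(N-1) on average. We plan to repair before that second failure, so we lose data only if it lands inside the repair window, which happens with probability of about MTTR_1 / (MTTF_1/(N-1)). Dividing the time to the first failure by that probability gives the second factor, MTTF_1/((N-1) * MTTR_1).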

What is RAID6?

- Two parity blocks per group
- Can work when two stripes per group have failed
- One true parity block
- The other "parity" block is a different kind of check block
- When one disk fails, use parity
- When two fail, solve some equations using both parity and "parity" blocks to recover data

When should you use RAID6?

RAID5 vs RAID6

- RAID6 has 2x the storage overhead: two check blocks per group instead of one
- More write overhead: when we write, we need to read and write the data block plus BOTH check blocks (6 accesses instead of RAID5's 4)
- RAID6 is only useful when the chance of a second disk failing during our MTTR is actually high. This is unusual.

Is RAID6 overkill?

RAID6 seems like overkill when you assume disk failures are independent. Under RAID5, the likelihood of a second independent failure happening during the repair period is very low. So why use RAID6? Because disk failures are NOT necessarily independent. For example, say you remove the wrong disk when trying to replace the failed one: you now have a second, correlated failure. RAID6 would've been nice.