Storage and Fault Tolerance Flashcards
What does RAID stand for?
Redundant Array of Independent Disks
What is a RAID?
- Several disks play the role of one
- Each disk detects its own errors using codes in each sector
What should a RAID do?
- Provide better performance
- Keep reads and writes working normally even when a sector goes bad or a whole disk fails
What is RAID0?
Uses “striping” (track 0 on the first disk, track 1 on the second, etc.) for increased throughput.
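A minimal sketch of the striping idea, assuming round-robin placement of equal-sized stripe units. The function name `locate_stripe_unit` and the 3-disk example are made up for illustration.

```python
def locate_stripe_unit(unit: int, num_disks: int) -> tuple[int, int]:
    """Map a logical stripe unit to (disk index, offset on that disk).

    Assumes plain round-robin striping: unit 0 on disk 0, unit 1 on disk 1, ...
    """
    return unit % num_disks, unit // num_disks


# Consecutive units land on different disks, so they can be accessed in parallel.
for unit in range(6):
    disk, offset = locate_stripe_unit(unit, num_disks=3)
    print(f"unit {unit} -> disk {disk}, offset {offset}")
```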
What is RAID0’s performance?
- Nx data throughput (N is number of disks)
- less queuing delay (latency)
MTTF?
Mean Time to Failure
MTTDL
Mean Time to Data Loss
Failure rate
f = failures per disk per second
Single disk MTTF
1/f
Single disk MTTDL
Same as the single disk MTTF, 1/f (the one disk holds the only copy, so its failure loses the data)
What is the MTTF of RAID0 with N disks?
MTTF_N = MTTDL_N = MTTF_1 / N
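A quick worked example of the formula above. The 100,000-hour single-disk MTTF is an assumed, illustrative number, not a measured one.

```python
def raid0_mttf(mttf_one_disk: float, num_disks: int) -> float:
    """RAID0 loses data as soon as ANY disk fails, so MTTF shrinks by a factor of N."""
    return mttf_one_disk / num_disks


mttf_1 = 100_000.0  # hours, assumed for illustration
for n in (1, 2, 4, 8):
    print(f"N={n}: MTTF = MTTDL = {raid0_mttf(mttf_1, n):,.0f} hours")
```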
What is RAID1?
Uses “mirroring” (same data on multiple disks) for reliability.
What is the throughput of a RAID0 with N disks?
N x the throughput of one disk
What is throughput of RAID1?
Write: same as 1 disk
Read: N x throughput of one disk
Reliability of RAID1?
RAID1 tolerates any fault that is confined to one disk. Each disk sector has its own ECC, so the array knows that an error occurred and on which disk, and it can read the mirror copy from the other disk instead.
MTTR
Mean Time to Repair
What is the MTTDL for RAID1 if we don’t repair a damaged disk?
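MTTDL = (MTTF_1/N) + MTTF_1 for a two-disk mirror (N = 2): MTTF_1/2 until the first disk fails, then MTTF_1 until the surviving disk fails.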
What is the MTTDL for RAID1 if we DO repair a damaged disk?
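MTTDL = (MTTF_1/N) * (MTTF_1/MTTR). A first failure happens every MTTF_1/N on average, and only about MTTR/MTTF_1 of those are followed by the surviving disk failing before the repair finishes (assuming MTTR is much smaller than MTTF_1).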
i.e., MUCH MUCH MUCH better than not repairing.
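To put numbers on the two RAID1 formulas, here is a small sketch for a two-disk mirror. The 100,000-hour MTTF and 24-hour MTTR are assumptions picked only to show the scale of the difference.

```python
def raid1_mttdl_no_repair(mttf_1: float, n: int = 2) -> float:
    """First failure after MTTF_1/N, then the survivor lasts MTTF_1 (two-disk mirror)."""
    return mttf_1 / n + mttf_1


def raid1_mttdl_with_repair(mttf_1: float, mttr: float, n: int = 2) -> float:
    """Data is lost only if the survivor fails during the repair window (MTTR << MTTF_1)."""
    return (mttf_1 / n) * (mttf_1 / mttr)


mttf_1 = 100_000.0  # hours, assumed
mttr = 24.0         # hours, assumed

no_repair = raid1_mttdl_no_repair(mttf_1)
with_repair = raid1_mttdl_with_repair(mttf_1, mttr)
print(f"no repair:   {no_repair:,.0f} hours")
print(f"with repair: {with_repair:,.0f} hours")
print(f"improvement: {with_repair / no_repair:,.0f}x")
```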
What is RAID4?
Uses “block-interleaved parity” to both improve performance and reliability.
For N disks, N-1 have data striped just like in RAID0 and 1 has the parity bits calculated from the other disks using bitwise XOR.
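A small sketch of XOR parity, using two-byte "blocks" on three hypothetical data disks. It shows that XOR-ing the surviving blocks with the parity block rebuilds a lost block; `xor_blocks` is an illustrative helper, not part of any real RAID tool.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bitwise XOR of equal-length byte blocks, taken position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, one stripe each (values are arbitrary).
data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0a"]
parity = xor_blocks(data)  # what the parity disk would store

# Pretend disk 1 died: rebuild its block from the other data blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(parity.hex(), rebuilt.hex())
```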
What is the throughput of RAID4?
Reads: (N-1) * throughput of one disk
Writes: 1/2 the throughput of one disk!
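When we write in RAID4, we need to read the old data, read the old parity, write the new data, and write the new parity. The two reads can happen in parallel, and so can the two writes, but every write costs two accesses (one read, one write) on the single parity disk, so the whole array writes at half the throughput of one disk.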
THIS is why we need RAID5.
MTTDL of RAID4
(MTTF_1/N) * ((MTTF_1/(N-1)) / MTTR_1)
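But what’s the N-1 about? The first failure takes MTTF_1/N on average. After it, only N-1 disks are left, so the next failure takes MTTF_1/(N-1) on average. Data is lost only if that second failure lands inside the repair window, which happens with probability of about MTTR_1 / (MTTF_1/(N-1)). The second factor is the inverse of that probability, i.e., how many first failures we expect to survive before one turns into data loss.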
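The same reasoning as a short calculation. The disk count, MTTF, and MTTR below are assumed values for illustration only.

```python
def raid4_mttdl(mttf_1: float, mttr_1: float, n: int) -> float:
    """MTTDL for an n-disk array that survives one failure and repairs it."""
    time_to_first_failure = mttf_1 / n          # any of the n disks can fail
    time_to_second_failure = mttf_1 / (n - 1)   # only n-1 disks are left
    # Expected number of first failures survived before a second failure
    # lands inside the repair window.
    repairs_survived = time_to_second_failure / mttr_1
    return time_to_first_failure * repairs_survived


print(f"{raid4_mttdl(mttf_1=100_000.0, mttr_1=24.0, n=5):,.0f} hours")
```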
How do we compute new parity on RAID4 write?
First we XOR the new data and old data for the block we’re writing to (which finds all changes). Then we XOR that with the old parity. The result is stored as the new parity.
Because we only have one parity disk, that creates a bottleneck on writes. Hence the need for RAID5.
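A sketch of the parity update on a small write, checked against recomputing the parity from the whole stripe. The one-byte blocks and the helper names `xor_bytes` and `updated_parity` are made up for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity from only the block being written and the old parity."""
    changed_bits = xor_bytes(old_data, new_data)  # 1 wherever the data flipped
    return xor_bytes(changed_bits, old_parity)    # flip the same bits in parity


# One stripe across three data disks (arbitrary values).
d0, d1, d2 = b"\x0f", b"\x33", b"\xf0"
old_parity = xor_bytes(xor_bytes(d0, d1), d2)

new_d1 = b"\x5a"
fast = updated_parity(d1, new_d1, old_parity)      # touches 2 disks
full = xor_bytes(xor_bytes(d0, new_d1), d2)        # would touch every data disk
print(fast.hex(), full.hex())
```

Both routes give the same parity, but the first one only needs the old data block and the old parity block, which is where the four accesses per write come from.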
What is RAID5?
DISTRIBUTED block-interleaved parity. Similar to RAID4, but the parity stripes are distributed among all disks (the first might be on disk 4, the next on disk 1, the next on 2, etc).
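One way to picture the rotation, assuming the simplest possible rule (parity for stripe s lives on disk s mod N). Real arrays use various rotation orders, so treat `parity_disk` as a hypothetical layout, not the layout of any particular implementation.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity block for a stripe (assumed simple rotation)."""
    return stripe % num_disks


def data_disks(stripe: int, num_disks: int) -> list[int]:
    """Disks holding data blocks for a stripe: everyone except the parity disk."""
    p = parity_disk(stripe, num_disks)
    return [d for d in range(num_disks) if d != p]


# Over N consecutive stripes, every disk takes exactly one turn as parity disk,
# so parity traffic is spread evenly instead of hammering one disk.
for stripe in range(5):
    print(f"stripe {stripe}: parity on disk {parity_disk(stripe, 5)}, "
          f"data on disks {data_disks(stripe, 5)}")
```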
What is the throughput of RAID5?
Where N is the total number of disks…
Reads: N * throughput of one disk (because now we can actually read from all N disks at once!)
Writes: N/4 * throughput of one disk! (because we still need 4 total accesses per write, but they’re distributed over all N disks)
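To compare the levels side by side, this sketch just encodes the throughput figures from the cards above as multiples of one disk. The function name and the 8-disk example are illustrative, and the RAID0 write figure assumes the "Nx data throughput" card applies to writes as well as reads.

```python
def throughput(level: str, n: int, one_disk: float = 1.0) -> tuple[float, float]:
    """Return (read, write) throughput of an n-disk array, in units of one disk."""
    table = {
        "RAID0": (n * one_disk, n * one_disk),          # striping, no redundancy
        "RAID1": (n * one_disk, one_disk),              # every write hits all mirrors
        "RAID4": ((n - 1) * one_disk, 0.5 * one_disk),  # parity disk is the bottleneck
        "RAID5": (n * one_disk, n / 4 * one_disk),      # 4 accesses per write, spread out
    }
    return table[level]


for level in ("RAID0", "RAID1", "RAID4", "RAID5"):
    read, write = throughput(level, n=8)
    print(f"{level}: read {read:g}x, write {write:g}x")
```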