38. RAIDs Flashcards

Question 1

Q

What is RAID?

Answer

A

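RAID (Redundant Array of Inexpensive Disks) combines multiple disks into a single faster, larger and more reliable disk system. Internally, a hardware RAID is pretty much a complete system, with its own memory, processor and multiple disks.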

Question 2

Q

What is the main beauty of RAID other than capacity, redundancy and performance?

Answer

A

Its transparency to the system. The host sees the array as one big disk, and the RAID transforms each logical I/O into the corresponding number of physical I/Os.

Question 3

Q

What is RAID0? What are its performance, redundancy and capacity?

Answer

A

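RAID0 is striping of data across multiple disks. Capacity is the full N disks’ worth (nothing is spent on redundancy), fault tolerance is zero (any single disk failure loses data), and read and write throughput is up to Nx (where N is the number of disks), because all disks can be read and written in parallel.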
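
The striping idea can be sketched in a few lines of Python. This is only an illustration under one assumption: the chunk size is exactly one block, so consecutive logical blocks go to consecutive disks in round-robin order. The function name `raid0_locate` is made up for this example.

```python
def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes a chunk size of one block, so blocks are dealt out to the
    disks round-robin.
    """
    disk = logical_block % num_disks      # which disk holds the block
    offset = logical_block // num_disks   # which stripe (row) it sits in
    return disk, offset


# Logical blocks 0..7 striped over 4 disks: two full stripes.
layout = [raid0_locate(block, 4) for block in range(8)]
```

Blocks 0 to 3 land on disks 0 to 3 at offset 0, and blocks 4 to 7 land on the same disks at offset 1, which is why a large request can keep all N disks busy at once.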

Question 4

Q

What is a chunk size?

Answer

A

It’s how much data we put on a single disk before moving on to the next disk while writing a stripe.
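
To see what the chunk size changes, here is a small sketch of a striped mapping with a configurable chunk size. The function name `locate_with_chunks` and the parameter `chunk_blocks` (chunk size measured in blocks) are assumptions made for this example.

```python
def locate_with_chunks(logical_block, num_disks, chunk_blocks):
    """Map a logical block to (disk, offset) for a given chunk size.

    chunk_blocks is the number of consecutive blocks placed on one disk
    before the stripe moves on to the next disk.
    """
    chunk_index = logical_block // chunk_blocks    # which chunk, counted globally
    within_chunk = logical_block % chunk_blocks    # position inside that chunk
    disk = chunk_index % num_disks                 # chunks are dealt round-robin
    stripe = chunk_index // num_disks              # which stripe (row of chunks)
    offset = stripe * chunk_blocks + within_chunk  # block offset on the disk
    return disk, offset


# 4 disks, chunk size of 2 blocks: blocks 0 and 1 share disk 0,
# blocks 2 and 3 share disk 1, and block 8 wraps back to disk 0.
mapping = [locate_with_chunks(block, 4, 2) for block in range(10)]
```

With a larger chunk, a small request is more likely to stay on one disk; with a smaller chunk, a single file is spread over more disks and can be read with more parallelism.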

Question 5

Q

What are the two main RAID performance metrics?

Answer

A

Single request latency
Steady-state throughput (total bandwidth)

Question 6

Q

What is RAID1? What is its performance?

Answer

A

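It’s mirroring of data across multiple devices. With two copies of each block it provides N/2 of the raw capacity and is guaranteed to survive the loss of 1 disk (up to N/2, if lucky enough that no two failed disks hold the same data). Read latency is the same as on a single disk. A write goes to both copies in parallel, so it takes roughly the time of a single write, but it can be a bit slower because it completes only when the slowest of the physical writes completes.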

Question 7

Q

What is RAID10? How is it different from RAID01?

Answer

A

RAID10 is striping combined with mirroring of data. It differs from RAID01 in the order in which the two are layered. RAID10 is a stripe of mirrors: disks are first grouped into mirrored pairs, and the stripe goes across those pairs. RAID01 is a mirror of stripes: data is first striped across a group of disks, and that whole striped group is then duplicated, so if one device in a striped group fails, the whole group is unusable.
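
The practical difference shows up when two disks fail. The sketch below counts which two-disk failures a 4-disk array survives under each layout. It assumes disks 0 and 1 form one group and disks 2 and 3 the other, and that a RAID01 striped group with any failed member is treated as lost entirely. The function names are made up for this example.

```python
from itertools import combinations

# Disks 0,1 form one group and disks 2,3 the other.
GROUPS = [(0, 1), (2, 3)]


def raid10_survives(failed):
    """RAID10: each group is a mirror pair; every pair needs one survivor."""
    return all(any(d not in failed for d in pair) for pair in GROUPS)


def raid01_survives(failed):
    """RAID01: each group is a striped set; one set must be fully intact."""
    return any(all(d not in failed for d in stripe_set) for stripe_set in GROUPS)


two_disk_failures = [set(c) for c in combinations(range(4), 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)
```

Of the 6 possible two-disk failures, RAID10 survives 4 under these assumptions and RAID01 survives 2.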

Question 8

Q

Why doesn’t RAID1 provide better read performance for sequential read workloads? How does it do for random read workloads?

Answer

A

Because when a sequential read is split across both copies, each disk must skip over the blocks served by its mirror. The time spent rotating or seeking past those skipped blocks negates the gain from parallelism, so each disk delivers only about half of its peak bandwidth, no better than a plain sequential read of one copy. With random workloads, reads can be spread across all disks and served in parallel, so read performance is good.

Question 9

Q

How does RAID1 prevent data inconsistency?

Answer

A

By maintaining a write-ahead log in non-volatile RAM, so that after a crash it can replay the pending writes during recovery and bring both copies back in sync.

Question 10

Q

How does RAID4 work? What is its performance?

Answer

A

RAID4 uses 1 disk for parity information and stripes data across all the others. It can endure the loss of no more than 1 disk. For each stripe of data, it calculates a parity block by XORing the data blocks bit by bit. With the parity block, it can reconstruct the contents of any single lost disk by XORing the surviving blocks of each stripe with the parity. Sequential reads and writes run at (N-1) * S thanks to parallelism (since data is striped; S is the sequential bandwidth of one disk), and random reads also do well at (N-1) * R (R is the random bandwidth of one disk). Random writes are quite slow, because every write must also update the parity block, and the parity disk becomes a bottleneck. Random write throughput is therefore R/2 regardless of the number of disks, because the parity disk must perform two I/Os (a read and a write) for every logical write.
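
The parity calculation and the recovery step can be sketched as follows. The two-byte blocks and the helper name `xor_blocks` are made up for this illustration. A real array works on whole disk blocks, but the XOR logic is the same.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, one per data disk.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)  # stored on the parity disk

# Suppose the disk holding data[1] is lost: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
```

Here `recovered` equals the lost block `data[1]`, because XORing a value in twice cancels it out and leaves only the missing block.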

Question 11

Q

What are the two ways to recalculate the parity block on a new write?

Answer

A

Additive parity. Read all the other data blocks in the stripe in parallel, XOR them with the new block to compute the new parity block, then write the new data and the new parity in parallel.
Subtractive parity. Read the old data block and the old parity block, then compare old and new data bit by bit: where they are the same, the parity bit stays as it is; where they differ, the parity bit is flipped. That is, new parity = (old data XOR new data) XOR old parity.
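
A small sketch can confirm that the two methods give the same parity. The block contents and the function names `additive_parity` and `subtractive_parity` are made up for this example.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def additive_parity(other_blocks, new_block):
    """XOR the new block with all the other data blocks in the stripe."""
    parity = new_block
    for block in other_blocks:
        parity = xor(parity, block)
    return parity


def subtractive_parity(old_block, new_block, old_parity):
    """Flip exactly those parity bits where old and new data differ."""
    return xor(xor(old_block, new_block), old_parity)


stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
old_parity = additive_parity(stripe[1:], stripe[0])

# Overwrite the middle block and recompute the parity both ways.
new_block = b"\xff\x00"
p_add = additive_parity([stripe[0], stripe[2]], new_block)
p_sub = subtractive_parity(stripe[1], new_block, old_parity)
```

Both results match. The additive method reads every other data block in the stripe, while the subtractive method always reads exactly two blocks, so subtractive is cheaper on wide arrays.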

Question 12

Q

How is the random-write issue addressed?

Answer

A

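In RAID5, it’s addressed by rotating the parity block across devices, so no single disk is a parity bottleneck. RAID5 has a random write performance of N/4 * R, because each logical write still costs 4 physical I/Os (2 reads and 2 writes), and a random read performance of N * R, because all disks can participate in reads (since parity blocks are spread over all of them).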
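
Parity rotation can be sketched as a function from stripe number to parity disk. The rotation rule below (parity starts on the last disk and moves one disk to the left per stripe) is just one possible layout, chosen for illustration; real implementations offer several. The function names are made up for this example.

```python
def raid5_parity_disk(stripe, num_disks):
    """Return the disk holding parity for a stripe.

    Assumed rotation: parity starts on the last disk and moves one disk
    to the left with every stripe, wrapping around.
    """
    return (num_disks - 1 - stripe % num_disks) % num_disks


def raid5_stripe_layout(stripe, num_disks):
    """Mark each disk in a stripe as data ("D") or parity ("P")."""
    parity = raid5_parity_disk(stripe, num_disks)
    return ["P" if disk == parity else "D" for disk in range(num_disks)]


layout = [raid5_stripe_layout(stripe, 4) for stripe in range(4)]
parity_disks = [raid5_parity_disk(stripe, 4) for stripe in range(8)]
```

Over any 4 consecutive stripes each disk holds the parity exactly once, so parity updates from random writes are spread evenly instead of queuing up on one disk.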

Question 13

Q

What is the latency of a single write operation in RAID4 and RAID5?

Answer

A

It’s 2T, where T is the time of a single disk request, because it needs to perform 2 reads and 2 writes. It’s not 4T because the 2 reads happen in parallel, followed by the 2 writes in parallel.