RAIDS Theory Flashcards
What are RAIDs?
They are redundant arrays of independent discs
In other words, a RAID is a group of independent disks that is treated as a single, large, high-performance logical disk
What are RAIDs for?
Increase the performance, the size and the reliability of storage systems
What are the consequences of striping data across several disks?
Higher data transfer rate, higher I/O rate, and a need for load balancing across the disks
What are the two orthogonal techniques implemented in RAID?
Data striping, to improve performance
Redundancy, to improve reliability
What is data striping?
Data striping is a method used to improve the performance and throughput of storage by distributing data across multiple disks.
Definition: Data striping involves dividing a body of data into smaller blocks and spreading these blocks across multiple physical disks in a RAID array. This technique enhances the read and write speeds by allowing multiple disks to operate simultaneously.
How does data striping work?
- Data is broken down into chunks or stripes.
- Each chunk is written to a different disk in the array.
- When a file is read, the system can read different parts of the file from multiple disks simultaneously, speeding up the process.
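The steps above can be sketched in Python. This is only a minimal illustration, assuming fixed-size chunks placed round-robin on the disks; the function names `stripe` and `read_back` are hypothetical and do not belong to any real RAID implementation.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into chunks and place them round-robin on the disks."""
    disks = [[] for _ in range(num_disks)]
    for i in range(0, len(data), chunk_size):
        chunk_index = i // chunk_size
        disks[chunk_index % num_disks].append(data[i:i + chunk_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting the disks stripe by stripe."""
    out = []
    num_stripes = max(len(d) for d in disks)
    for s in range(num_stripes):
        for d in disks:
            if s < len(d):  # the last stripe may be only partially filled
                out.append(d[s])
    return b"".join(out)


disks = stripe(b"ABCDEFGH", num_disks=4, chunk_size=1)
# disks[0] holds chunks 0 and 4, disks[1] holds chunks 1 and 5, and so on
restored = read_back(disks)
```

In a real array the per-disk reads would be issued concurrently; the sketch only shows where each chunk lands.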
What is a stripe unit and a stripe width?
The stripe unit is the size of the unit of data written to a single disk
The stripe width is the number of disks considered by the striping algorithm
What is the main motivation for the introduction of redundancy in the RAIDs?
The more physical disks there are in the array, the higher the probability that one of them fails
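A small worked example of this effect, under the assumption (made here only for illustration) that disks fail independently and with the same probability `p` over some period. The function name and the value 0.02 are hypothetical.

```python
def prob_any_failure(p: float, num_disks: int) -> float:
    """Probability that at least one of num_disks disks fails,
    assuming independent failures with per-disk probability p."""
    return 1 - (1 - p) ** num_disks


for n in (1, 4, 16):
    print(n, round(prob_any_failure(0.02, n), 4))
```

The probability grows with every disk added, which is why large arrays without redundancy are fragile.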
What is the main drawback involving redundancy?
Since write operations must also update the redundant information, their performance is worse than that of traditional writes.
What are the orthogonal techniques present in each RAID type?
RAID 0: striping only
RAID 1: mirroring only
RAID 0+1: nested levels
RAID 1+0: nested levels
RAID 4: block interleaving (redundancy, parity disc)
RAID 5: block interleaving (redundancy, distributed parity)
RAID 6: greater redundancy
Describe RAID 0
Data is written to a single logical disk and split into several blocks, which are distributed across the physical disks according to a striping algorithm
What are the primary concerns for RAID 0?
Performance and capacity, rather than reliability
What are the advantages and disadvantages of RAID 0?
Lower cost (it does not employ redundancy)
Best write performance (it does not need to update redundant data, and writes are parallelized)
The drawback is that a single disc failure will result in data loss
How would a RAID 0, four disc array with two stripes be organized, and what would be its capacity?
Capacity of 4 physical disks
         | Disk 0 | Disk 1 | Disk 2 | Disk 3
Stripe 1 | B1     | B2     | B3     | B4
Stripe 2 | B5     | B6     | B7     | B8
What are the impacts of the chunk size in the disc array performance?
Smaller chunks lead to greater parallelism
Bigger chunks reduce seek time
Analyze RAID 0
Capacity is equal to the total capacity of all N disks
There is no reliability: if any drive fails, data is permanently lost. This means that the mean time to data loss (MTTDL) is equal to the mean time to failure (MTTF)
Sequential and random read and write operations can be fully parallelized
What is the key idea of RAID 1?
Make two copies of all data
What are the advantages and disadvantages of RAID 1?
Advantages: high reliability, fast reads (data can be retrieved from the disk with the shorter queue), fast writes (no error-correcting code needs to be computed)
Disadvantages: high cost (only 50% of the raw capacity is usable)
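What is the description of RAID 0+1?
Striping first, then mirroring: the disks are grouped into RAID 0 stripe sets, which are then mirrored (a mirror of stripes)
What is the description of RAID 1+0?
Mirroring first, then striping: the disks are grouped into RAID 1 mirrored pairs, which are then striped (a stripe of mirrors)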
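What is the main difference between RAID 0+1 and RAID 1+0?
RAID 0+1 has lower fault tolerance: a single disk failure takes its whole stripe set out of service
RAID 1+0 has higher fault tolerance: data is lost only if both disks of the same mirrored pair fail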
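The difference can be checked by enumerating double failures on a four-disk array. This is a sketch under stated assumptions: RAID 0+1 is modelled as two stripe sets that mirror each other, RAID 1+0 as a stripe over two mirrored pairs, and the grouping of disks {0, 1} and {2, 3} as well as the function names are hypothetical.

```python
from itertools import combinations


def survives_raid01(failed: set[int], stripe_sets: list[set[int]]) -> bool:
    """RAID 0+1: data survives if at least one stripe set has no failed disk."""
    return any(not (s & failed) for s in stripe_sets)


def survives_raid10(failed: set[int], mirror_pairs: list[set[int]]) -> bool:
    """RAID 1+0: data survives if every mirrored pair keeps one working disk."""
    return all(pair - failed for pair in mirror_pairs)


groups = [{0, 1}, {2, 3}]  # stripe sets for 0+1, mirrored pairs for 1+0
two_failures = [set(c) for c in combinations(range(4), 2)]
ok01 = sum(survives_raid01(f, groups) for f in two_failures)
ok10 = sum(survives_raid10(f, groups) for f in two_failures)
print(f"RAID 0+1 survives {ok01} of {len(two_failures)} double failures")
print(f"RAID 1+0 survives {ok10} of {len(two_failures)} double failures")
```

With this layout, RAID 0+1 survives 2 of the 6 possible double failures, while RAID 1+0 survives 4 of them.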
What are the capacity and reliability of RAID 1?
The capacity is equal to the number of disks divided by two (N/2)
Reliability: one disk failure is always tolerated; in the best case up to N/2 disks can fail, as long as no two of them belong to the same mirrored pair
What are the sequential write and read capabilities of RAID 1?
Since half of the disks hold copies of the data, only half of the aggregate throughput is available: (N/2) * S
What are the capacities and reliabilities of RAID 0+1 and RAID 1+0?
What are the sequential write and read capabilities of RAID 0+1 and RAID 1+0?
What are the random read and write capabilities of RAID 1?
A random read can be done in parallel across all discs, and therefore is equal to N * R
A write, on the other hand, must be replicated to both disks of a mirrored pair, which halves the throughput: (N/2) * R
Why is it difficult to guarantee atomic mirrored writes, and what do RAID controllers include to mitigate the problem?
A failure (e.g. a power failure) can occur after one copy has been written but before the other, leaving the mirrors inconsistent. For this reason, many RAID controllers include a write-ahead log, which is battery-backed, non-volatile storage of pending writes
Describe the RAID 4 mechanism
Disc N only stores parity information for the other N - 1 discs
How is parity updated when blocks are written?
By additive parity: after a block is updated, the corresponding blocks on all the other data disks are read in order to recompute the parity block
Or by subtractive parity: the old data value, the new data value, and the old parity value are used to calculate the new parity value
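Both update methods can be illustrated with a short Python sketch, assuming the usual bytewise XOR parity (the cards above do not name the parity function, so this is an assumption). The helper names and the sample bytes are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def additive_parity(stripe_data: list[bytes]) -> bytes:
    """Recompute the parity from all data blocks in the stripe."""
    return xor_blocks(*stripe_data)


def subtractive_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Update the parity using only old data, new data and old parity."""
    return xor_blocks(old_data, new_data, old_parity)


stripe_data = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04"]
old_parity = additive_parity(stripe_data)

# Overwrite the block on the second data disk
new_block = b"\xaa\x08"
updated = [stripe_data[0], new_block, stripe_data[2]]

# Both methods must agree on the new parity
same = additive_parity(updated) == subtractive_parity(stripe_data[1], new_block, old_parity)
```

Additive parity reads every other data disk in the stripe, while subtractive parity reads only the old data block and the old parity block, so its cost does not grow with the number of disks.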
What is the analysis of RAID 4?
Capacity: total amount of discs minus the parity disc
Reliability: one disc can fail
Sequential read and write: we can parallelize across all non-parity blocks in the stripe: (N-1) * S
Random read: Can be parallelized over all, but the parity disc: (N-1) * R
Random writes: since the parity disk has to be updated on every write, each write requires one read and one write on the parity disk, which becomes the bottleneck: R/2
What is the analysis of RAID 5?
Capacity:
• N–1 [same as RAID 4]
Reliability:
• 1 drive can fail [same as RAID 4]
Sequential read and write:
• (N–1) * S [same as RAID 4]
• Parallelization across all non-parity blocks
Random read:
• N * R [vs. (N–1) * R for RAID 4]
• Unlike RAID 4, reads parallelize over all drives
Random write:
• (N/4) * R [vs. R/2 for RAID 4]
• Unlike RAID 4, writes parallelize over all drives
• Each write requires 2 reads and 2 writes, hence N/4
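The formulas from the RAID 0, 1, 4 and 5 analysis cards can be collected in one small helper for quick comparison. This is a sketch, not a benchmark: `raid_analysis` is a hypothetical name, `s` and `r` stand for the sequential and random throughput of a single disk, the sample values are made up, and the RAID 0 entries assume that "fully parallelized" means N * S and N * R.

```python
def raid_analysis(level: str, n: int, s: float, r: float) -> dict[str, float]:
    """Capacity (in disks) and throughput of an array of n disks.

    s and r are the sequential and random throughput of one disk.
    """
    table = {
        "0": dict(capacity=n, seq=n * s, rand_read=n * r, rand_write=n * r),
        "1": dict(capacity=n / 2, seq=n / 2 * s, rand_read=n * r, rand_write=n / 2 * r),
        "4": dict(capacity=n - 1, seq=(n - 1) * s, rand_read=(n - 1) * r, rand_write=r / 2),
        "5": dict(capacity=n - 1, seq=(n - 1) * s, rand_read=n * r, rand_write=n / 4 * r),
    }
    return table[level]


# Hypothetical array: 8 disks, 100 MB/s sequential and 2 MB/s random per disk
for level in ("0", "1", "4", "5"):
    print(f"RAID {level}:", raid_analysis(level, n=8, s=100.0, r=2.0))
```

The random-write column shows the main difference between RAID 4 and RAID 5: R/2 stays constant as disks are added, while (N/4) * R scales with the array.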
Compare all the RAID levels
See picture 4S in Comp Infra
Describe RAID 6
More fault tolerance than RAID 5
2 concurrent failures are tolerated
Uses Reed-Solomon codes with two redundancy schemes
• (P+Q) distributed and independent
N + 2 disks required
High overhead for writes (computation of parities)
• each write requires 6 disk accesses due to the need to update both the P and Q parity blocks (slow writes)
Minimum set of 4 data disks
Best performance and most capacity?
Greatest error recovery?
Balance between space, performance, and recoverability?
Best performance and most capacity? -> RAID 0
Greatest error recovery? -> RAID 1 (1+0 better than 0+1) or RAID 6
Balance between space, performance, and recoverability? -> RAID 5