RAIDS Theory Flashcards

1
Q

What are RAIDs?

A

They are redundant arrays of independent disks.

In other words, a RAID is a group of independent disks that is treated as a single, large, high-performance logical disk.

2
Q

What are RAIDs for?

A

To increase the performance, the capacity, and the reliability of storage systems

3
Q

What are the consequences of striping data across several disks?

A

A higher data transfer rate, a higher I/O rate, and a need for load balancing across the disks

4
Q

What are the two orthogonal techniques implemented in RAID?

A

Data striping, to improve performance

Redundancy, to improve reliability

5
Q

What is data striping?

A

Data striping is a method used to improve the performance and throughput of storage by distributing data across multiple disks.

Definition: Data striping involves dividing a body of data into smaller blocks and spreading these blocks across multiple physical disks in a RAID array. This technique enhances the read and write speeds by allowing multiple disks to operate simultaneously.

6
Q

How does data striping work?

A

How It Works:
- Data is broken down into chunks or stripes.
- Each chunk is written to a different disk in the array.
- When a file is read, the system can read different parts of the file from multiple disks simultaneously, speeding up the process.
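The steps above can be sketched in a few lines of Python. This is only an illustration: the function names `stripe` and `read_back` are made up for this card, the "disks" are plain in-memory lists, and the round-robin placement is an assumption about the striping algorithm.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into chunks and deal them out round-robin to the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_index = start // chunk_size
        disks[chunk_index % num_disks].append(data[start:start + chunk_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting the disks in the same round-robin order."""
    out = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):  # the last row may be only partially filled
                out.append(disk[row])
    return b"".join(out)


# Hypothetical example: 10 bytes, 3 disks, 2-byte chunks.
disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
restored = read_back(disks)
```

In a real array the chunks in one row would be read from the disks at the same time; the loop here only shows the order in which they are stitched back together.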

7
Q

What is a stripe unit and a stripe width?

A

A stripe unit is the size of the unit of data that is written to a single disk.

The stripe width is the number of disks considered by the striping algorithm.

8
Q

What is the main motivation for the introduction of redundancy in the RAIDs?

A

The more physical disks an array has, the higher the probability that at least one of them fails.
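A quick way to see this, assuming (as a simplification not stated in the card) that disks fail independently and each with the same probability p over some period: the chance that at least one of N disks fails is 1 - (1 - p)^N. The function name below is hypothetical.

```python
def prob_any_failure(p_disk: float, num_disks: int) -> float:
    """Probability that at least one disk fails, assuming independent,
    identically likely failures (an illustrative simplification)."""
    return 1.0 - (1.0 - p_disk) ** num_disks


# Hypothetical numbers: a 5% failure probability per disk.
for n in (1, 4, 8):
    print(n, round(prob_any_failure(0.05, n), 3))
```

The value grows with every disk added, which is why an array without redundancy is less reliable than a single disk.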

9
Q

What is the main drawback involving redundancy?

A

Since write operations must also update the redundant information, they perform worse than writes to a non-redundant disk.

10
Q

What are the orthogonal techniques present in each RAID type?

A

RAID 0: striping only
RAID 1: mirroring only
RAID 0+1: nested levels
RAID 1+0: nested levels
RAID 4: block interleaving (redundancy, dedicated parity disk)
RAID 5: block interleaving (redundancy, distributed parity)
RAID 6: greater redundancy

11
Q

Describe RAID 0

A

Data is written to a single logical disk and split into several blocks, which are distributed across the physical disks according to a striping algorithm

12
Q

What are the primary concerns of RAID 0?

A

Performance and capacity, rather than reliability

13
Q

What are the advantages and disadvantages of RAID 0?

A

Lower cost (it does not employ redundancy)

Best write performance (it does not need to update redundant data, and writes are parallelized)

The drawback is that a single disk failure results in data loss

14
Q

How would a four-disk RAID 0 array with two stripes be organized, and what would be its capacity?

A

Capacity of 4 physical disks

          | Disk 0 | Disk 1 | Disk 2 | Disk 3
Stripe 1  | B1     | B2     | B3     | B4
Stripe 2  | B5     | B6     | B7     | B8

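The layout above follows a simple rule that can be sketched as below. The function name `locate_block` is hypothetical, and the sketch assumes blocks are numbered from B1 and placed round-robin, as in the table.

```python
def locate_block(block_number: int, num_disks: int) -> tuple[int, int]:
    """Map a 1-based logical block number (B1, B2, ...) to (disk, stripe).

    Disks are numbered from 0 and stripes from 1, matching the table.
    """
    index = block_number - 1             # switch to 0-based arithmetic
    disk = index % num_disks             # column in the table
    stripe = index // num_disks + 1      # row in the table
    return disk, stripe


# B6 in a four-disk array sits on Disk 1, Stripe 2.
b6_location = locate_block(6, num_disks=4)
```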
15
Q

What is the impact of the chunk size on disk array performance?

A

Smaller chunks lead to greater parallelism (a single request is spread over more disks)

Bigger chunks reduce seek time (a single request touches fewer disks)
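The trade-off can be made concrete by counting how many disks one request touches. This is a rough sketch under stated assumptions: round-robin striping, a request of at least one byte, and a hypothetical helper name `disks_touched`.

```python
def disks_touched(offset: int, length: int, chunk_size: int, num_disks: int) -> int:
    """Number of distinct disks a request of `length` bytes starting at
    `offset` touches, assuming round-robin striping and length >= 1."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    num_chunks = last_chunk - first_chunk + 1
    # Consecutive chunks land on consecutive disks, so the count saturates
    # once every disk has received a chunk.
    return min(num_chunks, num_disks)


KIB = 1024
small_chunks = disks_touched(0, 64 * KIB, chunk_size=4 * KIB, num_disks=4)
big_chunks = disks_touched(0, 64 * KIB, chunk_size=64 * KIB, num_disks=4)
```

With 4 KiB chunks the 64 KiB request is served by all four disks in parallel; with 64 KiB chunks it is served by one disk, which positions its head only once.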

16
Q

Analyze RAID 0

A

Capacity: N, the full capacity of all N disks

Reliability: none. If any drive fails, data is permanently lost. This means that the mean time to data loss (MTTDL) is equal to the mean time to failure (MTTF) of the array, i.e., the time until the first disk fails

Sequential and random read and write operations can be fully parallelized across all disks

17
Q

What is the key idea of RAID 1?

A

Make two copies of all data

18
Q

What are the advantages and disadvantages of RAID 1?

A

Advantages: high reliability, fast reads (data can be retrieved from the disk with the shorter queue), and fast writes (no error-correcting code needs to be computed)

Disadvantages: high cost (only 50% of the raw capacity is usable)

19
Q

What is the description of RAID 0+1?

A

A mirror of stripes: the disks are first grouped into striped (RAID 0) sets, and the sets are then mirrored (RAID 1)

20
Q

What is the description of RAID 1+0?

A

A stripe of mirrors: the disks are first grouped into mirrored (RAID 1) pairs, and data is then striped (RAID 0) across the pairs

21
Q

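What is the main difference between RAID 0+1 and RAID 1+0?

A

RAID 0+1 has lower fault tolerance: one failed disk takes its whole striped set offline, so any later failure in the other set loses data

RAID 1+0 has higher fault tolerance: data is lost only if both disks of the same mirrored pair fail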
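The difference can be checked by brute force on a small array. The sketch below assumes four disks split into two groups of two, and uses the standard reading of the nested levels (1+0: each group is a mirrored pair; 0+1: each group is a striped set and the two sets mirror each other). All names are made up for illustration.

```python
from itertools import combinations

GROUPS = [(0, 1), (2, 3)]  # four disks, two groups of two (assumed layout)


def raid10_survives(failed: set[int]) -> bool:
    """RAID 1+0: each group is a mirrored pair; data survives if every
    pair still has at least one working disk."""
    return all(any(d not in failed for d in pair) for pair in GROUPS)


def raid01_survives(failed: set[int]) -> bool:
    """RAID 0+1: each group is a striped set; data survives if at least
    one set is completely intact."""
    return any(all(d not in failed for d in group) for group in GROUPS)


def count_survivable(survives, num_failures: int) -> int:
    """Count the failure combinations of the given size that lose no data."""
    return sum(
        1
        for failed in combinations(range(4), num_failures)
        if survives(set(failed))
    )


raid10_ok = count_survivable(raid10_survives, 2)
raid01_ok = count_survivable(raid01_survives, 2)
```

Of the six possible two-disk failures, RAID 1+0 survives four and RAID 0+1 survives two; both survive every single-disk failure.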

22
Q

What are the capacity and reliability of RAID 1?

A

Capacity: N/2, i.e., the number of disks divided by two

Reliability: one disk failure is always tolerated; in the best case, up to N/2 disks can fail, as long as no two of them belong to the same mirrored pair

23
Q

What are the sequential write and read capabilities of RAID 1?

A

Since half of the disks hold copies of the data, the throughput is half of the peak: (N/2) * S, where S is the sequential throughput of a single disk

24
Q

What are the capacity and reliability of RAID 0+1 and RAID 1+0?

A
25
Q

What are the sequential write and read capabilities of RAID 0+1 and RAID 1+0?

A
26
Q

What are the random read and write capabilities of RAID 1?

A

A random read can be done in parallel across all disks, and therefore the throughput is N * R, where R is the random throughput of a single disk

A random write, by contrast, must be written to both disks of a mirrored pair, which halves the throughput: (N/2) * R

27
Q

Why is it difficult to guarantee atomic mirrored writes, and what do RAID controllers include to mitigate the problem?

A

A failure (e.g., a power loss) can occur after one copy has been written but before the other, leaving the mirrors inconsistent. For this reason, many RAID controllers include a write-ahead log, which is battery-backed, non-volatile storage for pending writes

28
Q

Describe the RAID 4 mechanism

A

Disk N only stores parity information for the other N - 1 disks
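A minimal sketch of how a parity disk protects the others, assuming the usual bytewise XOR parity (the card does not name the parity function). The helper name `xor_blocks` and the sample bytes are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, one block each (hypothetical contents).
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)  # what the parity disk would store

# Simulate losing data disk 1: XOR the survivors with the parity block.
recovered = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, any single missing block equals the XOR of all the remaining blocks in the stripe, parity included.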

29
Q

How is parity updated when blocks are written?

A

By additive parity: after a block is updated, the corresponding blocks on all the other data disks are read and combined with the new data to recompute the parity block

Or by subtractive parity: the old data value, the new data value, and the old parity value are used to calculate the new parity value
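Both methods give the same result, which can be checked with a small sketch. It assumes XOR parity; the function names and the sample stripe are hypothetical.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def additive_parity(new_block: bytes, other_blocks: list[bytes]) -> bytes:
    """Recompute parity from the new block plus every other data block."""
    parity = new_block
    for block in other_blocks:
        parity = xor(parity, block)
    return parity


def subtractive_parity(old_block: bytes, new_block: bytes, old_parity: bytes) -> bytes:
    """Update parity from the old data, the new data and the old parity only."""
    return xor(xor(old_block, new_block), old_parity)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]    # three data blocks
old_parity = additive_parity(stripe[0], stripe[1:])

new_block = b"\xff\x00"                              # overwrite block 1
via_additive = additive_parity(new_block, [stripe[0], stripe[2]])
via_subtractive = subtractive_parity(stripe[1], new_block, old_parity)
```

The additive method reads every other data block in the stripe, while the subtractive method always reads exactly two blocks (old data and old parity), so it tends to be cheaper on wide arrays.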

30
Q

What is the analysis of RAID 4?

A

Capacity: N - 1 (the total number of disks minus the parity disk)

Reliability: one disk can fail

Sequential read and write: can be parallelized across all non-parity blocks in the stripe: (N-1) * S

Random read: can be parallelized over all disks except the parity disk: (N-1) * R

Random write: since the parity disk must be updated on every write, each write requires one read and one write on the parity disk, which becomes the bottleneck: R/2

31
Q

What is the analysis of RAID 5?

A

Capacity:
• N - 1 [same as RAID 4]

Reliability:
• 1 drive can fail [same as RAID 4]

Sequential read and write:
• (N - 1) * S [same as RAID 4]
• Parallelization across all non-parity blocks

Random read:
• N * R [vs. (N - 1) * R for RAID 4]
• Unlike RAID 4, reads parallelize over all drives

Random write:
• (N/4) * R [vs. R/2 for RAID 4]
• Unlike RAID 4, writes parallelize over all drives
• Each write requires 2 reads and 2 writes, hence N/4
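The formulas from the analysis cards can be collected into one small calculator. This is a sketch only: the function name is hypothetical, N is the number of disks, S and R are the sequential and random throughput of one disk, and the RAID 0 entries (N * S and N * R) are an assumption based on "fully parallelized".

```python
def raid_throughput(level: str, n: int, s: float, r: float) -> dict[str, float]:
    """Steady-state throughput estimates for an N-disk array.

    Formulas follow the flashcards; the RAID 0 row is an assumption.
    """
    table = {
        "0": {"sequential": n * s, "random_read": n * r, "random_write": n * r},
        "1": {"sequential": n / 2 * s, "random_read": n * r, "random_write": n / 2 * r},
        "4": {"sequential": (n - 1) * s, "random_read": (n - 1) * r, "random_write": r / 2},
        "5": {"sequential": (n - 1) * s, "random_read": n * r, "random_write": n / 4 * r},
    }
    return table[level]


# Hypothetical disk: S = 100 MB/s sequential, R = 10 MB/s random, N = 8 disks.
for level in ("0", "1", "4", "5"):
    print("RAID", level, raid_throughput(level, n=8, s=100.0, r=10.0))
```

With these numbers, random writes drop from 80 MB/s on RAID 0 to 20 MB/s on RAID 5 and 5 MB/s on RAID 4, which shows how much the single parity disk limits RAID 4.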

32
Q

Compare all the RAID levels

A

See picture 4S in Comp Infra

33
Q

Describe RAID 6

A

Higher fault tolerance than RAID 5

2 concurrent failures are tolerated

Uses Reed-Solomon codes with two redundancy schemes
• (P + Q), distributed and independent
N + 2 disks required

High overhead for writes (computation of parities)
• each write requires 6 disk accesses due to the need to update both the P and Q parity blocks (slow writes)
Minimum set of 4 data disks

34
Q

Best performance and most capacity?
Greatest error recovery?
Balance between space, performance, and recoverability?

A

Best performance and most capacity? -> RAID 0

Greatest error recovery? -> RAID 1 (1+0 better than 0+1) or RAID 6

Balance between space, performance, and recoverability? -> RAID 5