RAID Setup in Linux Flashcards
RAID stands for?
Originally Redundant Array of Inexpensive Disks,
but now
Redundant Array of Independent Disks
RAID is a collection of disks pooled together to act as a single logical volume
Logical volume
A logical disk, logical volume or virtual disk (VD[1] or vdisk[2] for short) is a virtual device that provides an area of usable storage capacity on one or more physical disk drive(s) in a computer system.
The disk is described as logical or virtual because it does not actually exist as a single physical entity in its own right.
The goal of the logical disk is to provide computer software with what seems a contiguous storage area, sparing it the burden of dealing with the intricacies of storing files on multiple physical units.
Most modern operating systems provide some form of logical volume management.
https://en.wikipedia.org/wiki/Logical_disk
A combination of drives makes up a group of disks that forms a RAID ____ or RAID ____
A RAID contains groups, also called sets or arrays
Explain the concept of parity
Parity lets a RAID array regenerate lost content from saved parity information. RAID 5 and RAID 6 are based on parity.
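As a rough illustration of how parity can rebuild lost content, the sketch below uses bytewise XOR, the scheme commonly used for single-parity levels such as RAID 5. The block contents and the helper name `xor_blocks` are made up for the example; RAID 6 adds a second, differently computed parity block that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per data disk (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] has failed: XOR the survivors with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

XOR works here because XOR-ing a value in twice cancels it out, so combining the parity with every surviving block leaves only the missing block.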
Explain the concept of striping
In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.
Example of data striping: files A and B, of four blocks each, are spread over disks D1 to D3.
Striping is useful when a processing device requests data more quickly than a single storage device can provide it. By spreading segments across multiple devices which can be accessed concurrently, total data throughput is increased. It is also a useful method for balancing I/O load across an array of disks. Striping is used across disk drives in redundant array of independent disks (RAID) storage, network interface controllers, disk arrays, different computers in clustered file systems and grid-oriented storage, and RAM in some systems.
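The idea can be sketched in a few lines, assuming a simple round-robin placement in which each consecutive block goes to the next disk. The function name `stripe` and the block labels are illustrative only; real arrays stripe fixed-size chunks rather than named blocks.

```python
def stripe(blocks, num_disks):
    """Distribute consecutive blocks across disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)
    return disks


# Files A and B, four blocks each, spread over three disks (D1 to D3).
layout = stripe(["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"], 3)
for name, contents in zip(("D1", "D2", "D3"), layout):
    print(name, contents)
```

Because neighbouring blocks land on different disks, a sequential read of file A can pull A1, A2 and A3 from three devices at the same time.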
Explain the concept of mirroring
Mirroring means keeping a copy of the same data. It is used in RAID 1 and RAID 10. In RAID 1, the same content is saved to every other disk in the set/array.
Explain the concept of Hot Spare
A hot spare is a standby drive in an array that automatically replaces a failed drive. If any one of the drives in the array fails, the hot spare is brought in and the array is rebuilt onto it automatically.
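A toy model of that behaviour, assuming a mirrored array where each disk is represented by a Python dict. The class `MirrorWithSpare` and its methods are hypothetical and only mimic the idea; they are not how any real RAID implementation is structured.

```python
class MirrorWithSpare:
    """Toy mirrored array: each 'disk' is a dict mapping block number to data."""

    def __init__(self, num_members=2):
        self.members = [{} for _ in range(num_members)]
        self.spare = {}  # idle standby disk, holds no data until it is needed

    def write(self, block_no, data):
        # Mirroring: every member disk receives the same block.
        for disk in self.members:
            disk[block_no] = data

    def fail(self, index):
        """Simulate a member failure and rebuild onto the hot spare."""
        if self.spare is None:
            raise RuntimeError("no hot spare available")
        # Pick any surviving member as the rebuild source.
        survivor = next(d for i, d in enumerate(self.members) if i != index)
        self.spare.update(survivor)       # copy the surviving data to the spare
        self.members[index] = self.spare  # the spare takes the failed disk's place
        self.spare = None                 # the spare has now been used up


array = MirrorWithSpare()
array.write(0, b"hello")
array.fail(0)
print(array.members[0])
```

After the simulated failure the array is back to two complete copies without anyone swapping hardware, which is the point of keeping a spare attached.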
Explain the concept of Chunks
A chunk is the unit of data written to one disk before the array moves on to the next disk. The chunk size can be 4 KB at minimum, or larger. Choosing a suitable chunk size can improve I/O performance.
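To make the role of the chunk size concrete, here is a small sketch that maps a logical byte offset to a disk and an offset on that disk. It assumes a plain striped layout with equal-sized disks and zero-based disk numbers; the function name `locate` is invented for the example.

```python
def locate(offset, chunk_size, num_disks):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk_no, within_chunk = divmod(offset, chunk_size)
    stripe_no, disk = divmod(chunk_no, num_disks)
    return disk, stripe_no * chunk_size + within_chunk


# 64 KiB chunks on a two-disk array (illustrative numbers).
print(locate(0, 65536, 2))
print(locate(65536, 65536, 2))
print(locate(200000, 65536, 2))
```

With a small chunk size a large request is spread over many disks, while with a large chunk size small requests tend to stay on one disk, which is why the best value depends on the workload.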
RAID 0
(also known as a stripe set or striped volume) splits (“stripes”) data evenly across two or more disks, without parity information, redundancy, or fault tolerance.
https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0
RAID 0 benefits
This configuration is typically implemented having speed as the intended goal.[2][3] RAID 0 is normally used to increase performance, although it can also be used as a way to create a large logical volume out of two or more physical disks.
A RAID 0 array of n drives provides data read and write transfer rates up to n times as high as the individual drive rates, but with no data redundancy. As a result, RAID 0 is primarily used in applications that require high performance and are able to tolerate lower reliability, such as in scientific computing[5] or computer gaming.[6]
Some benchmarks of desktop applications show RAID 0 performance to be marginally better than a single drive.[7][8] Another article examined these claims and concluded that “striping does not always increase performance (in certain situations it will actually be slower than a non-RAID setup), but in most situations it will yield a significant improvement in performance”.[9][10] Synthetic benchmarks show different levels of performance improvements when multiple HDDs or SSDs are used in a RAID 0 setup, compared with single-drive performance. However, some synthetic benchmarks also show a drop in performance for the same comparison.
RAID 0 Drawbacks
Since RAID 0 provides no fault tolerance or redundancy, the failure of one drive will cause the entire array to fail; as a result of having data striped across all disks, the failure will result in total data loss.
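The effect on reliability can be estimated with a short calculation. The sketch assumes that every drive fails independently with the same probability over some period, which is a simplification; the 5% figure is purely illustrative.

```python
def raid0_failure_probability(p_drive, num_drives):
    """Probability that a RAID 0 array is lost, i.e. that at least one drive fails."""
    survive_all = (1.0 - p_drive) ** num_drives
    return 1.0 - survive_all


for n in (1, 2, 4):
    print(n, round(raid0_failure_probability(0.05, n), 4))
```

Under these assumptions, each drive added to the stripe set raises the chance of losing the whole array.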
Striping does not always increase performance: in certain situations it will actually be slower than a non-RAID setup, even though in most situations it yields a significant improvement.[9][10] Some synthetic benchmarks also show a drop in performance for a RAID 0 setup compared with single-drive performance.
RAID 1
consists of an exact copy (or mirror) of a set of data on two or more disks; a classic RAID 1 mirrored pair contains two disks.
RAID 1 benefits
This layout is useful when read performance or reliability is more important than write performance or the resulting data storage capacity.[13][14]
The array will continue to operate so long as at least one member drive is operational.
Any read request can be serviced and handled by any drive in the array; thus, depending on the nature of I/O load, random read performance of a RAID 1 array may equal up to the sum of each member’s performance,[a] while the write performance remains at the level of a single disk. However, if disks with different speeds are used in a RAID 1 array, overall write performance is equal to the speed of the slowest disk.
RAID 1 drawbacks
This configuration offers no parity, striping, or spanning of disk space across multiple disks, since the data is mirrored on all disks belonging to the array, and the array can only be as big as the smallest member disk.
If disks with different speeds are used, overall write performance is equal to the speed of the slowest disk.
RAID 2
which is rarely used in practice, stripes data at the bit (rather than block) level, and uses a Hamming code for error correction. The disks are synchronized by the controller to spin at the same angular orientation (they reach index at the same time[16]), so it generally cannot service multiple requests simultaneously.[17][18]
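For a feel of what a Hamming code does, the sketch below implements the small Hamming(7,4) code, which protects four data bits with three check bits and can correct any single flipped bit. This particular code size is an assumption chosen for brevity, not necessarily what any RAID 2 product used; in a RAID 2 style layout each of the seven bits would live on its own disk.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # simulate one bad bit, as if one disk returned wrong data
print(hamming74_correct(damaged))
```

The three check results together spell out the position of the bad bit, so the code can locate the error as well as detect it.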
RAID 2 benefits
With a high-rate Hamming code, many spindles operate in parallel to transfer data simultaneously, so that “very high data transfer rates” are possible,[19] as for example in the DataVault, where 32 data bits were transmitted simultaneously.
RAID 2 drawbacks
The disks are synchronized by the controller to spin at the same angular orientation (they reach index at the same time[16]), so it generally cannot service multiple requests simultaneously.[17][18]
With all hard disk drives implementing internal error correction, the complexity of an external Hamming code offered little advantage over parity so RAID 2 has been rarely implemented; it is the only original level of RAID that is not currently used.[17][18]
RAID 3
RAID 3, which is rarely used in practice, consists of byte-level striping with a dedicated parity disk.
The requirement that all disks spin synchronously (in a lockstep) added design considerations that provided no significant advantages over other RAID levels. Both RAID 3 and RAID 4 were quickly replaced by RAID 5.[20] RAID 3 was usually implemented in hardware, and the performance issues were addressed by using large disk caches.[18]
RAID 3 benefits
Because every I/O operation involves all disks at once, RAID 3 is suitable for applications that demand the highest transfer rates in long sequential reads and writes, for example uncompressed video editing. Applications that make small reads and writes from random disk locations will get the worst performance out of this level.[18]
RAID 3 drawbacks
One of the characteristics of RAID 3 is that it generally cannot service multiple requests simultaneously, which happens because any single block of data will, by definition, be spread across all members of the set and will reside in the same physical location on each disk. Therefore, any I/O operation requires activity on every disk and usually requires synchronized spindles.
The requirement that all disks spin synchronously (in a lockstep) added design considerations that provided no significant advantages over other RAID levels. Both RAID 3 and RAID 4 were quickly replaced by RAID 5.[20] RAID 3 was usually implemented in hardware, and the performance issues were addressed by using large disk caches.[18]
RAID 4
Diagram: where “.” equates to parity
Groups | Device #1 | Device #2 | Device #3 | Device #4 |
-------+-----------+-----------+-----------+-----------+
1      | A1        | A2        | A3        | A.        |
2      | B1        | B2        | B3        | B.        |
3      | C1        | C2        | C3        | C.        |
4      | D1        | D2        | D3        | D.        |
RAID 4 consists of block-level striping with a dedicated parity disk.
In the diagram above, a read request for block A1 would be serviced by Device #1. A simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by Device #2.
RAID 4 benefits
As a result of its layout, RAID 4 provides good performance of random reads, while the performance of random writes is low due to the need to write all parity data to a single disk.[21]
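The write bottleneck can be made visible by counting how many writes each disk receives. This is a simplified sketch: it assumes each small write touches exactly one data disk plus the parity disk, uses zero-based disk indices with the last index as the dedicated parity disk, and the function name `raid4_write_load` is invented for the example.

```python
from collections import Counter


def raid4_write_load(block_numbers, num_disks):
    """Count writes per disk when each block write also updates the parity disk."""
    data_disks = num_disks - 1
    parity_disk = num_disks - 1  # dedicated parity disk (last index)
    load = Counter()
    for block in block_numbers:
        load[block % data_disks] += 1  # the data disk holding this block
        load[parity_disk] += 1         # every write also hits the parity disk
    return load


load = raid4_write_load(range(6), 4)
for disk in range(4):
    print("disk", disk, "writes:", load[disk])
```

Six writes spread evenly over three data disks still produce six writes on the single parity disk, so that disk limits write throughput.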
RAID 4 drawbacks
Diagram: where “.” equates to parity
Groups | Device #1 | Device #2 | Device #3 | Device #4 |
-------+-----------+-----------+-----------+-----------+
1      | A1        | A2        | A3        | A.        |
2      | B1        | B2        | B3        | B.        |
3      | C1        | C2        | C3        | C.        |
4      | D1        | D2        | D3        | D.        |
In the diagram above, a read request for block A1 would be serviced by Device #1. A simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by Device #2.
RAID 5
RAID 5 consists of block-level striping with distributed parity. Unlike in RAID 4, parity information is distributed among the drives. It requires that all drives but one be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data are lost.[5] RAID 5 requires at least three disks.[22]
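One way to picture distributed parity is to print which device holds the parity block in each stripe. The sketch assumes a simple rotation that moves the parity one device to the left on every stripe; real implementations offer several rotation schemes, and the function name `raid5_layout` is hypothetical. Parity blocks are marked with a trailing “.”, as in the RAID 4 diagram.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe; the parity block is labelled like 'A.'."""
    rows = []
    for stripe in range(num_stripes):
        letter = chr(ord("A") + stripe)
        # Assumed rotation: parity starts on the last disk and moves left.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        data_index = 1
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(letter + ".")
            else:
                row.append(letter + str(data_index))
                data_index += 1
        rows.append(row)
    return rows


for row in raid5_layout(4, 4):
    print(" | ".join(row))
```

Because the parity block sits on a different device in each stripe, parity updates are spread across all drives instead of queuing on one dedicated disk.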