In computing, RAID is synonymous with storage. The term originally stood for Redundant Array of Inexpensive Disks. The acronym was later updated, and today it stands for Redundant Array of Independent Disks. But as experts point out, the purpose of the technology hasn't changed.
Is RAID interesting for the Mac user? Very much so: storage requirements keep growing, and with them the need to record data on multiple disks to avoid a complete loss if one fails. Many commercial solutions ship as pre-packaged RAID systems, and precisely for this reason it is useful to understand how RAID works, so that we can better adapt it to our needs.
Let's start with the Basics
Before going into detail about RAID, its types, and redundancy, it helps to clarify some basic concepts. A disk is a hardware device built to store data, and it can be used in various ways. No operating system uses the disk directly, however; it works through "volumes", where documents and folders are stored.
The difference between disks and volumes is abstract and easy to lose sight of, but it is fundamental: a volume does not necessarily correspond to one disk, because it can span several disks or occupy just part of one, since a single disk can contain several volumes.
So how do we keep the two apart? It is enough to look at the question in elementary terms: disks are physical objects, while volumes are defined by the user, in complete freedom, according to their needs and the characteristics of the operating system.
What is RAID and How to Use it?
RAID is an umbrella term for various techniques, mostly designed to manage a series of storage disks as a single volume flexibly and securely. The acronym, which stands for "Redundant Array of Independent Disks", can be understood as a set of independent disks combined into a redundant volume, although redundancy is not strictly mandatory.
It is a very common method of protecting application data on both hard disk drives and solid-state storage. The first to describe this technique were David A. Patterson, Garth A. Gibson, and Randy H. Katz, in a paper published in the proceedings of the 1988 SIGMOD Conference and entitled "A Case for Redundant Arrays of Inexpensive Disks".
The original idea was to combine a series of low-cost disks so that, together, they would improve on a single high-end disk in capacity, reliability, and speed.
Confused? It's simpler than it looks.
When you buy a Mac, you will typically find only one disk inside (apart from the Mac Pro, which we will leave aside for the moment). After the initial installation, that disk holds two volumes: the one you use (and see) by default, and a second, typically hidden volume that macOS calls "Recovery" and that is used to reinitialize the first if needed.
When we use an external USB disk, for example, we see a single volume that corresponds to the disk. But with Disk Utility, macOS can "partition" the disk to create other volumes, a useful function when, for example, we want to start the Mac with Windows using Boot Camp.
RAID does the same thing in reverse: instead of creating multiple volumes inside a disk, it puts multiple disks inside a volume. It takes a bit of abstraction, but in practice the process is simple and very interesting.
Several types of drive can be used, balancing the level of protection against price; in short, the higher the protection, the higher the cost:
- The cheapest drives are IDE (Integrated Drive Electronics), ATA (Advanced Technology Attachment), or SATA (Serial Advanced Technology Attachment)
- The most expensive drives are SCSI (Small Computer System Interface)
Redundancy and Parallelism: How a RAID System Works
By grouping individual physical drives into a set, RAID presents all of these physical drives to the server as one logical disk, identified by a Logical Unit Number, or LUN. The data is then divided into sections of equal length (stripes) and written to the different disks by an algorithm in charge of distribution.
When a read is larger than the stripe unit, this technology distributes the workload across multiple disks in parallel, thereby increasing performance. The improvements RAID brings to performance and availability have kept this approach relevant over the years, even as new storage technologies have become available.
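The way data is cut into equal-length sections and spread over the disks can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the units are dealt out round-robin, and the function names `stripe` and `unstripe` are made up for this example, not taken from any real RAID implementation.

```python
def stripe(data: bytes, num_disks: int, stripe_unit: int) -> list[list[bytes]]:
    """Split data into fixed-size units and deal them round-robin across disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_unit):
        unit_index = offset // stripe_unit
        disks[unit_index % num_disks].append(data[offset:offset + stripe_unit])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the units back in round-robin order."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


# Twelve bytes, three disks, two bytes per unit (all values are illustrative)
layout = stripe(b"ABCDEFGHIJKL", num_disks=3, stripe_unit=2)
for number, disk in enumerate(layout):
    print(f"disk {number}: {disk}")
```

Because consecutive units sit on different disks, a read that covers several units can be served by several disks at once, which is where the performance gain comes from.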
To navigate the various levels that characterize the techniques in this area, it is necessary to understand the characteristics of each level in detail (minimum number of disks, capacity, maximum number of disk failures tolerated), along with its advantages and disadvantages.
Also, as storage systems have evolved, the number of RAID levels has grown beyond the original five.
What are the Benefits
The main advantage of using such a system is the ability to retain data stored on failed drives. The techniques it relies on are:
- Data mirroring, in which data is written to more than one disk simultaneously
- Striping, in which data is distributed in blocks across the drives
- Parity, a method of verifying that data has been written correctly when moving from one drive to another
- A combination of these techniques
What is Parity and What is it For
While data mirroring and striping are widely known, parity is a term more familiar to people in the industry. More specifically, it is a checksum of the data written to the disks, which is stored together with the original data.
In general, a checksum is a value computed from a file or set of files that acts as a fingerprint for verifying them, and it is often shown as a long series of letters and numbers. In RAID, parity is typically computed as the bitwise XOR of the corresponding data blocks on each drive.
The server accessing data on a hardware-based RAID set does not know whether, or which, drives in the set have failed. With parity, the controller recreates the data lost when a drive fails, using the parity information stored on the surviving disks in the set. In most cases, increasing performance or reliability increases the cost of protecting data on drives.
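How a controller can recreate lost data from parity is easier to see with a small example. The sketch below assumes the common case in which parity is the bitwise XOR of the data blocks; the helper name `xor_blocks` and the sample bytes are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the bitwise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per drive (sample values only)
data_blocks = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild it from the survivors and parity
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

XOR works here because applying it twice with the same value cancels out, so combining the surviving blocks with the parity block leaves exactly the missing block.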
Standard and Non-standard Levels
The large number of RAID levels can be divided into three macro-categories:
- standard
- nested
- non-standard
The standard levels consist of the basic types numbered 0 to 6. A non-standard level is tailored to the standards of a particular company or associated with an open-source project; non-standard levels include RAID 7, adaptive RAID, RAID S, and Linux MD RAID 10. Nested RAID refers to combinations of RAID levels, such as RAID 01 (0+1), RAID 03 (0+3), and RAID 50 (5+0).
What is the Best RAID for Your Organization
As experts explain, the best RAID for an organization depends on:
- The level of redundancy you are looking for
- The length of the retention period
- The number of disks you are working with
- The importance attached to data protection over performance optimization
To choose which level to use, you must first evaluate what type of application runs on the server. As a guide: RAID 0 is the fastest, RAID 1 is the most reliable, and RAID 5 is a good compromise between the two.
Below is a description of the levels most commonly used in storage arrays. Not all storage array vendors support every type of RAID, so you should first check with your vendor which types are available.
Level 0 RAID: It corresponds simply to disk striping. All data is distributed in chunks across all disks in the set.
PRO: It offers great performance, as the data storage load is spread across multiple physical drives. It also has the lowest cost of all RAID types, because all of the disk space is used to store data. Since no parity is generated for RAID 0, there is no overhead when writing data to RAID 0 disks.
CONS: This level has the worst data protection of all levels. When a disk fails, the data on that disk is lost, and because data is striped across the whole set, the volume remains unavailable until the drive is replaced and the data is restored from a backup.
Level 1 RAID: It is disk mirroring, which means that all data is written to two separate physical disks.
PRO: The disks are essentially mirror images of each other. If one drive fails, the other can be used to recover the data.
CONS: Disk mirroring is useful for fast read operations, but write speeds are slower because the data has to be written twice. Another disadvantage of this level is that the amount of disk space required doubles, since all data is stored twice.
RAID 1 + 0: Also called RAID 10, it combines disk mirroring and striping. The data is normally mirrored first and then striped. Mirroring stripe sets (RAID 0 + 1) accomplishes the same task but is less fault-tolerant than striping mirror sets.
If you lose a drive in one stripe set, you need to access the data from the other stripe set, because stripe sets have no parity. RAID 1 + 0 requires a minimum of four physical disks.
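The difference in fault tolerance between the two orderings can be checked by brute force on the smallest case. The sketch below assumes four drives labelled A to D, grouped into two pairs, and counts which two-drive failures each arrangement survives; the labels and function names are hypothetical.

```python
from itertools import combinations

DISKS = ["A", "B", "C", "D"]
GROUPS = [{"A", "B"}, {"C", "D"}]  # the two pairs of drives in the set


def raid10_survives(failed: set[str]) -> bool:
    """Stripe over mirrors: every mirror pair needs at least one working drive."""
    return all(group - failed for group in GROUPS)


def raid01_survives(failed: set[str]) -> bool:
    """Mirror of stripe sets: at least one stripe set must be fully intact."""
    return any(not (group & failed) for group in GROUPS)


pairs = [set(pair) for pair in combinations(DISKS, 2)]
raid10_ok = sum(raid10_survives(pair) for pair in pairs)
raid01_ok = sum(raid01_survives(pair) for pair in pairs)
print(f"Mirror first, then stripe: survives {raid10_ok} of {len(pairs)} two-drive failures")
print(f"Stripe first, then mirror: survives {raid01_ok} of {len(pairs)} two-drive failures")
```

With the same four drives, mirroring first leaves only two fatal combinations (both drives of the same mirror), while striping first is broken by any failure that touches both stripe sets.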
Level 2 RAID: It stripes data at the bit level and uses a Hamming code for error correction. These days, hard drives already use Hamming codes in their own error correction, so this level is no longer used.
Level 3 RAID: It uses a dedicated parity disk, separate from the actual data disks, to store the parity information generated by the RAID controller. This level requires a minimum of three physical disks.
PRO: This level works very well with applications that require a long sequential data transfer, such as video servers.
CONS: It performs poorly when there are many requests for data, as in a database management application.
Level 4 RAID: It uses a dedicated parity disk with block-level striping across the disks.
PRO: It is useful for sequential access to data.
CONS: Using a dedicated parity disk can cause write performance bottlenecks.
Since alternatives such as RAID 5 are available, level 4 is not widely used.
Level 5 RAID: It combines disk striping and parity, requiring at least three physical disks. Data is striped across all disks in the RAID set, along with the parity information needed to rebuild the data in the event of a disk failure.
It is the most common method because it achieves a good balance between performance and availability.
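To visualize how data and parity can share the same disks, the sketch below prints one possible block layout. It assumes that each stripe holds one parity block and that the parity position moves by one disk per stripe; real controllers may rotate parity differently, and `raid5_layout` is a name invented for this example.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return one row per stripe; 'P' marks parity, 'Dn' the n-th data block."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and moves left
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=3, num_stripes=3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Spreading the parity blocks this way means no single disk has to absorb every parity write, which is the bottleneck noted for a dedicated parity disk.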
Level 6 RAID: It increases reliability by combining striping with two parity schemes, which allows the RAID set to withstand up to two disk failures without data being lost.
It requires at least four disk drives and is often used with large-capacity drives, for example in mass storage or disk-based backup. A big advantage of RAID 6 is that it allows data recovery when two disks fail simultaneously, although rebuild times are relatively long.
Adaptive: This allows the RAID controller to decide how to store parity on the disks. It chooses between RAID 3 and RAID 5, depending on which type of RAID set best suits the type of data being written to the disks.
Level 7 RAID: It is a non-standard level, based on RAID 3 and RAID 4, which requires proprietary hardware. This level is owned by what was once called Storage Computer Corp.
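For the most commonly used levels, the trade-off between capacity and protection comes down to simple arithmetic. The sketch below assumes that all disks in the set have the same size and applies the usual textbook formulas; the function `usable_capacity` and its minimum-disk table are illustrative rather than taken from any vendor's tool.

```python
def usable_capacity(level: str, num_disks: int, disk_size_gb: float) -> float:
    """Usable capacity in GB for a set of equally sized disks."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return num_disks * disk_size_gb        # no redundancy at all
    if level == "1":
        return disk_size_gb                    # every disk holds the same copy
    if level == "5":
        return (num_disks - 1) * disk_size_gb  # one disk's worth of parity
    if level == "6":
        return (num_disks - 2) * disk_size_gb  # two disks' worth of parity
    if num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return num_disks // 2 * disk_size_gb       # half the disks are mirrors


for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity(level, 4, 1000):.0f} GB usable from 4 x 1000 GB")
```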
Minimum Drives and Rebuilds for RAID Levels: 3 Things to Know
RAID requires multiple disk drives, the number of which varies according to the chosen RAID level. A frequently asked question is whether, once the minimum requirement is met, there is an advantage in adding more disks.
Using more than the minimum number of drives gives you more available storage space and more actuators or spindles (the spindle being the motor that turns the disks) to serve requests from the operating system. However, that does not mean extra drives are always necessary.
Most RAID arrays allow a maximum of 16 drives within a RAID set, because beyond a certain number of drives the overhead grows and the performance gains diminish. A good rule of thumb is to use up to 8 drives for RAID 5 and RAID 10.
As a further rule of thumb, experts recommend keeping different types of data on separate RAID sets. You can use RAID level 10 everywhere to get the best performance, but most budgets dictate the use of RAID 5 for database data volumes, with RAID 1 or RAID 10 used for the database log volumes. Database volumes can see very random I/O, while logs tend to be sequential.
Rebuild times depend on the type of RAID chosen. If you are using software-based RAID, more spindles within the group result in longer rebuild times. If you are using hardware-based RAID, rebuild times are usually dictated by the size of the drives themselves, since the hardware usually swaps a spare drive into the set and rebuilds onto it. For example, a 146GB drive takes longer to rebuild than a 73GB drive.
This is why it is necessary to analyze the application context before choosing your reference level.
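To put the comparison between a 73GB and a 146GB drive into numbers, the sketch below estimates how long it takes to rewrite a whole drive at a constant rate. The 50 MB/s rate is a purely hypothetical figure chosen for illustration, and the function name `rebuild_hours` is made up; actual rebuild speed depends on the controller and on the load on the array.

```python
def rebuild_hours(drive_size_gb: float, rebuild_rate_mb_s: float) -> float:
    """Hours needed to rewrite a whole drive at a constant rate (1 GB = 1000 MB)."""
    seconds = drive_size_gb * 1000 / rebuild_rate_mb_s
    return seconds / 3600


RATE_MB_S = 50  # hypothetical sustained rebuild rate, not a measured value
for size_gb in (73, 146):
    print(f"{size_gb} GB drive: about {rebuild_hours(size_gb, RATE_MB_S):.2f} hours")
```

Under this simple model the rebuild time grows in direct proportion to drive size, so doubling the capacity doubles the time during which the set runs without its full protection.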
How it is Used Today
Observers from many quarters note that the need for RAID technology has diminished. Erasure coding and solid-state drives present themselves as reliable, albeit more expensive, alternatives.
Not to mention that as storage capacity increases, the chance of RAID array failure also increases. However, large storage vendors continue to support RAID levels in their storage arrays.
Focus on RAID 10: What is it?
RAID 10 is created by combining two RAID levels: level 1 (mirroring, i.e., duplication of data on multiple hard disks) and level 0 (striping, i.e., subdivision of data into blocks). The technique optimizes performance by allowing the computer to use multiple disks at the same time. It is a nested configuration, i.e., one with the characteristics of two different RAID levels, in this case RAID 0 and RAID 1.
Specifically, at least four hard drives are required, because two RAID 1 sets are joined together in a RAID 0. The total storage capacity is the sum of the capacities of the smallest disk in each of the two RAID 1 sets. This solution favours both performance and security, because it combines striping and mirroring and tolerates the failure of one hard drive per RAID 1 set.
There are many other types of RAID, but the ones described here are the most widely used, especially at home or in the office. RAID is of enormous importance in Network Attached Storage, or NAS. As a result, hard drive models designed for NAS have been produced that support RAID configurations at different levels.