RAID Made Easy
17 April, 2010 00:22
What is RAID, why do you need it, and what are all those mode numbers that are constantly bandied about? RAID stands for "redundant array of independent disks," and you may or may not need it depending on your data-storage requirements.
The biggest gain from using RAID is protection against drive failure--which, according to Google and other experts, happens a lot more often than hard-drive manufacturers like to admit. (Note that the word array is included in the acronym, so saying "RAID array," as a lot of people do, is redundant. Clearly, storage folks have a strange sense of humor.)
In the old days, when the fastest and largest hard drives carried a very heavy premium (faster drives still do, though not nearly to the same degree), RAID was created to combine multiple, less-expensive drives into a single, higher-capacity and/or faster volume. Redundancy, also known as fault tolerance or failover protection, was included so that the loss of one drive wouldn't render an entire array and its data useless.
As such, RAID has several levels, or methods by which the drives are ganged together, with data distributed across the drives. The RAID levels are commonly referred to by number. The three most common levels in the consumer and small-office markets are RAID 0, RAID 1, and RAID 5, which I'll cover first along with other common options such as JBOD ("just a bunch of disks"), Microsoft's RAID-like Drive Extender, and RAID-virtualization technologies such as those from Drobo, Netgear, Synology, and Seagate.
Most RAID modes don't require that you employ drives of equal size, but they'll use only the capacity on each drive that equals the capacity of the smallest drive in the array--that is, if you mix a 500GB drive with a 1TB drive, the setup will treat both as 500GB drives.
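To put numbers on that, here is a minimal Python sketch of the smallest-drive rule. The function names are made up for illustration, and the sizes are the ones from the example above.

```python
def effective_drive_sizes(sizes_gb):
    """Capacity a conventional RAID set uses on each drive: the smallest one."""
    smallest = min(sizes_gb)
    return [smallest] * len(sizes_gb)


def unused_capacity(sizes_gb):
    """Space left idle because the larger drives are treated as small ones."""
    return sum(sizes_gb) - sum(effective_drive_sizes(sizes_gb))


mixed = [500, 1000]  # a 500GB drive paired with a 1TB drive
print(effective_drive_sizes(mixed))  # [500, 500]
print(unused_capacity(mixed))        # 500
```

In other words, half of the 1TB drive simply sits idle.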
Keep in mind that RAID's data redundancy is a hedge only against data loss due to drive failure, and a way to keep you working until you can replace the bad drive. RAID offers no protection against data lost to malware, theft, or natural disaster, and it's certainly no substitute for proper backup practices.
Common RAID Modes
Picture the 0 in "RAID 0" as a race track, and you're on to its primary purpose: speed. RAID 0 distributes data across multiple drives (for example, block A goes to/from drive 1, block B goes to/from drive 2), which permits increased write and read speeds. This approach is referred to as striping, and other modes (as you'll see later) employ the technique as well.
Always remember, however, that RAID 0 offers zero protection against drive failure, as no duplicate or parity information is written. Hence, when a drive fails, you end up with a puzzle that's missing pieces. In such a situation, your data is lost, unless of course you want to spend a very large sum of money trying to recover it.
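Striping is easy to picture in code. The sketch below is a toy model, not how any particular controller works: the "drives" are plain Python lists, blocks are dealt out round-robin, and a failed drive is represented by None. The names stripe and read_back are hypothetical.

```python
def stripe(blocks, num_drives):
    """Deal blocks out to drives round-robin, the way RAID 0 stripes data."""
    drives = [[] for _ in range(num_drives)]
    for i, block in enumerate(blocks):
        drives[i % num_drives].append(block)
    return drives


def read_back(drives):
    """Reassemble the original block order; None means the data is gone."""
    if any(drive is None for drive in drives):
        return None  # no duplicate or parity data exists to fill the gaps
    n = len(drives)
    total = sum(len(drive) for drive in drives)
    return [drives[i % n][i // n] for i in range(total)]


drives = stripe(["A", "B", "C", "D", "E"], 2)
print(drives)             # [['A', 'C', 'E'], ['B', 'D']]
print(read_back(drives))  # ['A', 'B', 'C', 'D', 'E']

drives[1] = None          # simulate a failed drive
print(read_back(drives))  # None
```

Lose either list and every other block is missing, which is the puzzle with missing pieces described above.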
RAID 1 creates a mirror of your data across two drives: the array writes the very same data to both drives in a pair and can read it back from either. The drives are equal partners; should either fail, you can continue working with the good one until you can replace the bad one. RAID 1 is the simplest, easiest method to create a failover disk storage subsystem. It costs you a whopping 50 percent of your total available drive capacity, however; for example, two 1TB drives in a mirrored array net you 1TB of usable space, not 2TB.
You may have as many pairs of mirrored drives as your RAID controller allows. And in the unlikely event that said consumer-grade data traffic cop supports duplex reading, RAID 1 can provide an increase in read speeds by fetching blocks alternately from each drive.
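Here is a rough Python model of a single mirrored pair, including the alternating reads just mentioned. The Mirror class is purely illustrative (two dictionaries stand in for the drives), and real controllers decide which drive to read from in their own ways.

```python
class Mirror:
    """Toy RAID 1 pair: every write goes to both drives, reads alternate."""

    def __init__(self):
        self.drives = [{}, {}]  # block number -> data, one dict per drive
        self.next_read = 0      # which drive gets the next read request

    def write(self, block_no, data):
        for drive in self.drives:
            if drive is not None:
                drive[block_no] = data

    def fail(self, index):
        self.drives[index] = None  # simulate a dead drive

    def read(self, block_no):
        """Return (data, drive_index), skipping a failed drive if need be."""
        for _ in range(2):
            index = self.next_read
            self.next_read = (self.next_read + 1) % 2
            if self.drives[index] is not None:
                return self.drives[index][block_no], index
        raise RuntimeError("both drives in the mirror have failed")


mirror = Mirror()
mirror.write(0, b"first")
mirror.write(1, b"second")
print(mirror.read(0))  # (b'first', 0)
print(mirror.read(1))  # (b'second', 1)

mirror.fail(0)
print(mirror.read(0))  # (b'first', 1)
```

After drive 0 fails, every read is simply served by drive 1, so you keep working at single-drive speed.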
RAID 5 offers both speed and data redundancy for light to medium use (home offices or small to medium-size businesses). It writes data to and reads from multiple disks, and distributes parity data across all the disks in the array. Parity data is a smaller amount of data derived mathematically from a larger set that can accurately describe that larger amount of data, and thus can be used to restore it.
RAID 5 requires a minimum of three disks and devotes the equivalent of one disk's capacity to parity information: one-third of the total space in a three-disk array, and proportionally less as you add disks. Because it reads from multiple disks, it's pretty fast in consumer setups that might process one or two reads simultaneously, though performance can suffer greatly when it's processing many reads at once in a server situation. Also, since parity information is distributed across all the drives, any single drive can fail without causing the entire array to fail.
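The usual way to derive RAID 5 parity is a bitwise XOR of the data blocks in a stripe. The Python sketch below shows one stripe on a three-drive array; the helper name xor_blocks and the sample data are invented for the example, and it ignores how real arrays rotate the parity block from drive to drive.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-drive array: two data blocks plus one parity block.
block_a = b"HELLO"
block_b = b"WORLD"
parity = xor_blocks([block_a, block_b])

# Pretend the drive holding block_a died; rebuild it from the survivors.
rebuilt_a = xor_blocks([block_b, parity])
print(rebuilt_a)  # b'HELLO'
```

XOR-ing the surviving blocks with the parity block cancels out everything except the missing data, which is why any one drive can drop out without taking the array down.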
From one perspective, JBOD--or "just a bunch of disks"--might be considered an array. But JBOD offers no speed increase or redundancy. Its sole purpose is to concatenate a group of disks into a single volume. Data writes to the first drive until it's full, then to the second until it's full, and so on, until the last drive has no more room. Even though many network-attached storage devices offer this option, we don't recommend it unless it's the only thing available, you really need a single large volume, and you don't have the choice of using RAID 0 (an unlikely circumstance). It's hard to envision that particular need, given today's massively capacious, 2TB drives.
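Concatenation is simple enough to sketch in a few lines of Python. The function below is hypothetical and works in whole gigabytes for readability; it maps a position in the single large volume to the drive that actually holds it.

```python
def locate(offset_gb, drive_sizes_gb):
    """Map an offset in a JBOD volume to (drive index, offset on that drive)."""
    for index, size in enumerate(drive_sizes_gb):
        if offset_gb < size:
            return index, offset_gb
        offset_gb -= size  # skip past this full drive and keep looking
    raise ValueError("offset is beyond the end of the volume")


drives = [500, 1000]        # a 1,500GB volume built from two drives
print(locate(100, drives))  # (0, 100)
print(locate(600, drives))  # (1, 100)
```

Every request lands on exactly one drive, which is why JBOD delivers neither the speed of striping nor any redundancy.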
Drive Extender, a RAID alternative, is employed on NAS boxes running Microsoft Windows Home Server. Unlike RAID, which can be embedded in hardware controllers or implemented via software and works at the bit, byte, or block level, Drive Extender works at the file level of your OS.
This approach means that the CPU has to be involved, and on a workstation it can result in a sizable performance hit. On a NAS device with nothing else to do, the impact will be minimal. In fact, Windows Home Server NAS boxes are generally very good performers, and Drive Extender offers a number of advantages. You may mix internal PATA and SATA drives, external USB and FireWire storage, and the like. There's no need to match drive size, either, and you can add more disks without rebuilding an array. Finally, you may also remove drives if doing so doesn't reduce the total capacity below the quantity of the data stored on the system.
Drive Extender provides fault tolerance by storing files on separate disks, though this function is generally disabled by default. The system also allows you to configure which data will be replicated on a folder-by-folder basis, so you can omit noncritical data and minimize duplication's impact on total capacity.
Drobo has made a name for itself by simplifying the RAID configuration process, as well as by allowing users to employ the full capacity of different-size drives in a single multibay array. With a Drobo device, you can insert any drive you want, and the box automatically configures itself. RAID is still involved, but only in concept, and it's transparent to the user; in fact, the virtualization abstracts the data-redundancy process, and can use the equivalent of several RAID levels simultaneously.
Other drive makers, such as Netgear, Seagate, and Synology, have also begun to offer similar virtualized redundancy features, each marketed under a different name. If this trend continues, understanding the different RAID levels may be unnecessary in the future.
Other RAID Options
The RAID specifications include several other levels that are not commonly used anymore.
I should just skip RAID 2, as nobody uses it, but for the sake of completeness, I'll note that this RAID level distributes data across multiple drives at the bit level (the smallest unit of computer information, with a value of either 0 or 1) instead of at the block level. This setup writes Hamming ECC (error-correcting code) recovery information to dedicated parity disks at the byte level, and it's comparatively slow at doing so.
RAID 3 is another mode that got kicked off the consumer island because it doesn't use data blocks; it distributes data across multiple drives as bytes (8 bits), and stores parity information on a dedicated drive. RAID 4 fell into disuse because it distributes data across multiple drives as blocks and stores all parity information on dedicated parity drives; if a dedicated parity drive fails, the entire array is unprotected until it's replaced.
Three more RAID options can be useful but aren't often found on consumer network-attached storage devices (though some business-oriented NAS boxes may have these features). RAID 6 is very much like RAID 5; both distribute parity information, but RAID 6 distributes a second set of parity information across the drives, so the array can survive two drive failures at once, at an obvious cost in total capacity. Nevertheless, in situations where the highest level of fault tolerance is required, RAID 6 is highly recommended.
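The capacity trade-off between the two parity modes is simple arithmetic. This Python sketch assumes equal-size drives and the commonly cited minimums of three drives for RAID 5 and four for RAID 6; the function name is made up.

```python
def usable_tb(level, drive_count, drive_size_tb):
    """Usable space for RAID 5 or RAID 6 built from equal-size drives."""
    parity_drives = {5: 1, 6: 2}[level]  # drives' worth of space spent on parity
    minimum = {5: 3, 6: 4}[level]        # assumed minimum drive counts
    if drive_count < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return (drive_count - parity_drives) * drive_size_tb


print(usable_tb(5, 4, 1))  # 3
print(usable_tb(6, 4, 1))  # 2
```

With four 1TB drives, the second set of parity costs you a full terabyte compared with RAID 5.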
RAID 10, also referred to as RAID 1+0, stripes data (RAID 0) across mirrored pairs (RAID 1) of drives. With this arrangement, you get back some of the write speed that RAID 1 costs you; but you need at least four drives to implement this scheme, and 50 percent of the total drive capacity is devoted to redundancy.
Conversely, RAID 0+1 mirrors (RAID 1) striped pairs (RAID 0) of drives. As with RAID 10, you regain some of the write speed that RAID 1 costs you. Again, you need at least four drives, and you spend 50 percent of the total drive capacity on redundancy.
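Although both nested modes cost the same in capacity, they don't hold up equally well when two drives fail. The Python sketch below enumerates every two-drive failure on a four-drive array. It assumes drives 0 and 1 form one group and drives 2 and 3 the other (mirrored pairs for RAID 10, striped pairs for RAID 0+1), and that a broken stripe set is unusable as a whole; the function names are illustrative only.

```python
from itertools import combinations

GROUPS = [(0, 1), (2, 3)]  # assumed drive grouping for both layouts


def raid10_survives(failed):
    """RAID 10 lives as long as every mirrored pair keeps one good drive."""
    return all(not set(pair) <= failed for pair in GROUPS)


def raid01_survives(failed):
    """RAID 0+1 lives as long as one striped set is completely intact."""
    return any(not (set(stripe_set) & failed) for stripe_set in GROUPS)


failures = [set(combo) for combo in combinations(range(4), 2)]
print(sum(raid10_survives(f) for f in failures), "of", len(failures))  # 4 of 6
print(sum(raid01_survives(f) for f in failures), "of", len(failures))  # 2 of 6
```

Under these assumptions, RAID 10 rides out four of the six possible double failures, while RAID 0+1 survives only the two in which both dead drives sit in the same striped set.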
Choosing RAID: A Cheat Sheet
Here are a few tips to keep in mind when selecting the RAID setup that's right for you.
- Use hardware RAID by enabling it in your motherboard's BIOS or by buying a controller. Software RAID involves the CPU and can reduce overall workstation performance.
- Use RAID 0 when all you want is faster performance with large files.
- Use RAID 1 when you have only two drives and you want to protect against drive failure.
- Use RAID 5 when you have more than two drives and you want a hedge against drive failure.
- Windows Drive Extender is perfectly valid but is generally available only on NAS boxes. It will affect performance on a PC that's used for applications.
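For the three mainstream modes, the tips above boil down to a small decision function. This Python sketch is only a restatement of the cheat sheet, with a made-up function name; it ignores RAID 6, the nested modes, and Drive Extender.

```python
def suggest_raid_level(drive_count, want_protection):
    """Pick RAID 0, 1, or 5 according to the cheat sheet's rules of thumb."""
    if drive_count < 2:
        raise ValueError("an array needs at least two drives")
    if not want_protection:
        return "RAID 0"  # speed only, no hedge against drive failure
    if drive_count == 2:
        return "RAID 1"  # mirror the pair
    return "RAID 5"      # more than two drives, distributed parity


print(suggest_raid_level(2, want_protection=False))  # RAID 0
print(suggest_raid_level(2, want_protection=True))   # RAID 1
print(suggest_raid_level(4, want_protection=True))   # RAID 5
```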