Transcript
RAID Storage: Speed and Security
Most video editing configurations will choose RAID over single drives. Single drives are capable of decent throughput: around 60 MBytes/sec (480 Mbits/sec, just faster than FireWire 400 and around the speed of USB 2.0). But for comfortable multi-channel editing, a single drive does not have enough throughput to leave overhead or to carry multiple streams of video.
A RAID, a Redundant Array of Independent (or Inexpensive, depending on who you ask) Disks, is a method for providing faster throughput than single drives. It can also provide access to larger pools of storage than a single drive, because the RAID appears to the computer as one logical drive. The third advantage of using a RAID is that it can be set up with some degree of redundancy. Redundancy means that one of the drives can fail without losing any data. Redundancy is available in RAID 1, 3, 5 and 6.
There is a detailed explanation of RAID at Wikipedia for those who want to get further into the technicalities. Here I’m going to summarize the important issues and make some recommendations, without being too detailed or technical.
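For anyone who wants to check the single-drive throughput figures quoted above, here is a minimal sketch of the arithmetic in Python. The function name and constants are my own for illustration, and the interface speeds are nominal signalling rates (400 Mbits/sec for FireWire 400, 480 Mbits/sec for USB 2.0), not measured throughput.

```python
# Nominal interface signalling rates in megabits per second.
# Real-world throughput on these buses is lower.
FIREWIRE_400_MBITS = 400
USB_2_MBITS = 480


def mbytes_to_mbits(mbytes_per_sec):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_sec * 8


single_drive_mbits = mbytes_to_mbits(60)

print(single_drive_mbits)                       # 480
print(single_drive_mbits > FIREWIRE_400_MBITS)  # True: just faster than FireWire 400
print(single_drive_mbits == USB_2_MBITS)        # True: around the speed of USB 2.0
```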
Why is Redundancy Important
Redundancy gives us the ability to have a drive fail without losing valuable data. How valuable is this? Well, a client recently went through the loss of a 3TB RAID 0 on Mac OS X, and it cost over $8,000 to recover the data. As in a lot of modern, data-centric workflows, this footage was not available on tape, as it had been collected over many years and archived. With data-centric (or IT-centric) workflows, there is no “tape” to rely on in case of disaster; therefore we need to have multiple copies of our data, preferably in multiple locations. It is also preferable to keep any long-term storage on RAID 5 or RAID 6 systems, so that one drive (or, with RAID 6, two) can be lost without losing the data.
Useful Levels of RAID
Disks that are going to make up a RAID are usually housed in a common enclosure, or mounted internally in the host computer.7 In fact, without being configured (a.k.a. striped) as a RAID, these box-contained drives would go by the acronym JBOD: Just a Bunch Of Disks! In a JBOD, the drives share an enclosure with power supply and fans, and probably a common connection to the computer, but each drive shows up as its own volume on the desktop with no speed improvement or redundancy. They are indeed just a bunch of independent disks. There are also differences between software-based RAIDs and those reliant on a hardware controller. That discussion follows.
7 Apple’s 2008 series of Mac Pro computers will take three additional SATA drives that simply slide into the cradle and use the available power and data connectors. The three drives (excluding the boot drive) can be striped into a RAID 0 configuration with up to three 1TB drives. Some users have been known to add an additional drive in the spare optical drive bay for the system (a.k.a. boot) drive and use four drives in a RAID 0 stripe. With Apple’s optional RAID card for the Mac Pro, these three or four additional drives can be striped as RAID 1, RAID 3 or RAID 5 to provide some redundancy.
RAID 0
RAID 0 provides no redundancy but increases speed. Two or more drives are striped together to make one logical drive. The controller splits incoming data to each of the drives, so each drive has more time to write the file, increasing throughput. Think of it like a card dealer dealing to multiple players: first card to first player, second card to second player, third card to third player, fourth card to first player (assuming three players), and so on. The more drives there are in a RAID 0 configuration, the faster the results. There is no lost capacity in a RAID 0 configuration.
[Figure: RAID 0, no redundancy. Blocks A1 to A4 and B1 to B4 are striped across Drive 1 and Drive 2.]
IMPORTANT: If any drive in a RAID 0 configuration fails, all the data on all the drives will be lost. RAID 0 is supported at the system level in Mac OS X and Windows without additional hardware or software.
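The card-dealer picture of RAID 0 can be sketched in a few lines of Python. This is only an illustration of round-robin striping. The function name `stripe` and the list-of-lists model of the drives are my own, and a real controller works on fixed-size blocks rather than named cards.

```python
def stripe(blocks, n_drives):
    """Deal blocks out to drives in turn, like cards to players."""
    drives = [[] for _ in range(n_drives)]
    for i, block in enumerate(blocks):
        # Block 0 goes to drive 0, block 1 to drive 1, and so on,
        # wrapping back to drive 0 after the last drive.
        drives[i % n_drives].append(block)
    return drives


cards = ["card1", "card2", "card3", "card4", "card5", "card6", "card7"]
for number, held in enumerate(stripe(cards, 3), start=1):
    print(f"Player {number}: {held}")
# Player 1: ['card1', 'card4', 'card7']
# Player 2: ['card2', 'card5']
# Player 3: ['card3', 'card6']
```

Because every file ends up with pieces on every drive, taking away any one player's hand leaves no complete file behind, which is why a single failure loses everything.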
RAID 1
RAID 1 takes two drives and mirrors the contents of one to the other, making a duplicate (or backup) copy. One of the drives can fail and the other drive will still have all the content. When a failed drive is replaced, the contents of the remaining drive are automatically copied to the new drive; redundancy is restored only after the replacement drive has been rebuilt from the remaining drive. RAID 1 reduces capacity to half that of the native drives. Two 500 GB drives in a RAID 1 configuration will have slightly under 500 GB of storage available, because all data is being stored twice.
[Figure: RAID 1, Drive 1 is mirrored to Drive 2. Blocks A1 to A4 appear on both Drive 1 and Drive 2.]
RAID 1 has improved read speed over single drives but no speed increase on writing. RAID 1 is used predominantly when redundancy is required and only two drives are available for the configuration. RAID 1 arrays typically use a dedicated housing and are usually configured in hardware.
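To make the mirror-and-rebuild behavior of RAID 1 concrete, here is a toy Python model. The `Mirror` class and its method names are hypothetical, and each drive is just a dictionary, so this only sketches the idea rather than how any real controller works.

```python
class Mirror:
    """Toy two-drive mirror: each drive maps a block name to its data."""

    def __init__(self):
        self.drives = [{}, {}]

    def write(self, name, data):
        # Every write goes to both drives, so capacity is halved.
        for drive in self.drives:
            if drive is not None:
                drive[name] = data

    def fail(self, index):
        # Simulate a dead drive by removing it from the mirror.
        self.drives[index] = None

    def read(self, name):
        # Any surviving drive can serve the read.
        for drive in self.drives:
            if drive is not None:
                return drive[name]
        raise IOError("both drives have failed")

    def replace(self, index):
        # Rebuild the new drive by copying everything from the survivor.
        survivor = self.drives[1 - index]
        if survivor is None:
            raise IOError("no surviving drive to rebuild from")
        self.drives[index] = dict(survivor)

    def is_redundant(self):
        return all(drive is not None for drive in self.drives)


mirror = Mirror()
mirror.write("A1", b"footage")
mirror.fail(0)
print(mirror.read("A1"))      # b'footage', served by the surviving drive
print(mirror.is_redundant())  # False until the failed drive is replaced
mirror.replace(0)
print(mirror.is_redundant())  # True again after the rebuild
```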
RAID 3
There is a RAID 2 but it’s not used for video work. RAID 3 is configured with at least three drives: two drives carry the data and the third drive carries “parity” information. Parity information is a cross-check calculated from the data on the other drives. This built-in error checking information is used to detect errors and, if a drive fails, to rebuild the missing data. RAID 3 provides some speed increase, although not as fast as three drives in a single RAID 0 stripe, with the advantage that one drive can be lost and yet all data will be intact. The dedicated parity drive becomes a bottleneck during writing, making a RAID 3 configuration slower to write to than RAID 0 or 1.
[Figure: RAID 3, parity on separate disk. Each of Blocks 1 to 4 is split into parts A, B and C on Drives 1 to 3, with the parity for each block on Drive 4.]
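The text does not say how parity is calculated. The usual technique is a byte-by-byte XOR across the data drives, so the sketch below assumes that. The function names `xor_parity` and `rebuild` are my own, and all chunks are assumed to be the same length.

```python
def xor_parity(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def rebuild(surviving_chunks, parity):
    """Recover the one missing chunk from the survivors plus the parity."""
    # XOR-ing the parity with every surviving chunk cancels them out,
    # leaving only the chunk that was lost.
    return xor_parity(surviving_chunks + [parity])


# Three data drives (A, B, C) and one parity drive, as in the RAID 3 figure.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the drive holding part B has failed.
recovered = rebuild([data[0], data[2]], parity)
print(recovered)  # b'BBBB'
```

The same calculation covers only one missing chunk per block, which is why a parity-protected array of this kind tolerates one failed drive and not two.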
RAID 5
RAID 5 is the configuration used most in video post-production, where there needs to be a combination of speed and redundancy. Instead of a dedicated parity disk, which causes a bottleneck during disk writes, the parity information is distributed across all the drives in the RAID 5 configuration. Disk writes are slower than RAID 0 but not as slow as RAID 3. RAID 5 requires at least three disks, but is much more commonly used with five drives in the array. It does not store a full duplicate of the data, so it loses only the equivalent of one drive’s capacity (20% in a five-drive configuration). The data can still be rebuilt from the parity information, so a RAID 5 system will tolerate the failure of one drive without losing data.
RAID 5 configurations can be slow to rebuild after a failed drive is replaced (24 or more hours depending on the size of the array), during which time the system is vulnerable to another drive failure. A second failure before the replacement drive has been rebuilt will result in lost data. The chance of losing two drives within 30 or so hours is low, but those with the most valuable data will want to consider RAID 6. RAID 5 is slower to write than to read, so it is very suitable for multi-user systems with video data, which needs to be read many more times than it is written.
Note: RAID 5 requires that all drives be the same size. If drives of different capacities are used, the “common size” will be used. For example, if a RAID 5 array is built from four 500 GB drives and one 750 GB drive, the 750 GB drive will be treated as 500 GB and 250 GB of capacity will be unavailable.
[Figure: RAID 5, parity spread across all drives. Each of Blocks 1 to 4 is split into parts A, B and C across Drives 1 to 4, with the parity for each block on a different drive: Block 1 parity on Drive 4, Block 2 parity on Drive 3, Block 3 parity on Drive 2, Block 4 parity on Drive 1.]
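The rotating parity shown in the RAID 5 figure can be reproduced with a short Python sketch. The function name `raid5_layout` is hypothetical, and the rotation order (parity starting on the last drive and moving one drive to the left per block) is simply what the figure shows; real controllers may rotate parity differently.

```python
def raid5_layout(n_stripes, n_drives):
    """Return one row per block; each row lists what every drive holds."""
    rows = []
    for stripe in range(n_stripes):
        # Parity starts on the last drive and moves left by one each block.
        parity_drive = (n_drives - 1 - stripe) % n_drives
        # Data parts are lettered A, B, C... (so at most 26 data drives here).
        parts = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        row = []
        for drive in range(n_drives):
            if drive == parity_drive:
                row.append(f"{stripe + 1} Parity")
            else:
                row.append(f"Block {stripe + 1} {next(parts)}")
        rows.append(row)
    return rows


for row in raid5_layout(4, 4):
    print(" | ".join(row))
# Block 1 A | Block 1 B | Block 1 C | 1 Parity
# Block 2 A | Block 2 B | 2 Parity | Block 2 C
# Block 3 A | 3 Parity | Block 3 B | Block 3 C
# 4 Parity | Block 4 A | Block 4 B | Block 4 C
```

Because no single drive holds all the parity, parity writes are shared across the array instead of queuing on one disk as they do in RAID 3.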
RAID 6
RAID 6 is very similar to RAID 5 but requires at least four drives. Two sets of parity information are written across the drives, which allows two drives to fail at once without losing data. RAID 6 is recommended for the highest level of fault and drive-failure tolerance. Speeds are similar to RAID 5, but a five-drive RAID 6 gives up a further 20% of the raw capacity compared with the same drives configured as RAID 5 (three drives’ worth of usable space instead of four).
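The capacity trade-offs across the RAID levels discussed so far come down to simple arithmetic, sketched here in Python. The function name `usable_capacity_gb` is my own, and the sketch assumes every drive is treated as the size of the smallest one and ignores formatting overhead, so real figures will be slightly lower.

```python
def usable_capacity_gb(level, drive_sizes_gb):
    """Approximate usable space for a RAID level, before formatting overhead."""
    n = len(drive_sizes_gb)
    # Assumption: mixed drives are all treated as the smallest ("common") size.
    common = min(drive_sizes_gb)
    if level == 0:
        data_drives = n        # no capacity lost
    elif level == 1:
        data_drives = 1        # everything is stored twice
    elif level in (3, 5):
        data_drives = n - 1    # one drive's worth of parity
    elif level == 6:
        data_drives = n - 2    # two drives' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if data_drives < 1:
        raise ValueError("not enough drives for this RAID level")
    return common * data_drives


print(usable_capacity_gb(1, [500, 500]))                 # 500
print(usable_capacity_gb(5, [500, 500, 500, 500, 750]))  # 2000
print(usable_capacity_gb(5, [500] * 5))                  # 2000
print(usable_capacity_gb(6, [500] * 5))                  # 1500
```

With five 500 GB drives (2500 GB raw), the 500 GB gap between RAID 5 and RAID 6 is 20% of the raw capacity.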
RAID 50, 60
A RAID 50, also written RAID 5+0, takes two RAID 5 configurations and stripes them as a RAID 0 array for the speed increase. Within each RAID 5 array, one drive can be lost without losing data. If more than one drive is lost in either RAID 5 array, the entire combined array’s data will be lost. RAID 60, as I’m sure you’ve guessed, is two RAID 6 arrays, striped in a RAID 0 configuration to increase speed. RAID 6’s fault tolerance (two drives per array) continues into the RAID 60 configuration. RAID 50 or RAID 60 is necessary to get appropriate speed for working with uncompressed 10-bit HD.
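The failure rule for these nested arrays is easy to get wrong, so here is a minimal Python sketch of it. The function and constant names are hypothetical. The rule itself is the one stated above: the RAID 0 stripe on top survives only if every underlying array stays within its own tolerance.

```python
# Drives each underlying array can lose without losing data.
RAID5_TOLERANCE = 1
RAID6_TOLERANCE = 2


def nested_array_survives(failed_per_subarray, tolerance_per_subarray):
    """True if every sub-array has lost no more drives than it can tolerate."""
    # The top-level RAID 0 stripe has no redundancy of its own, so a single
    # sub-array exceeding its tolerance takes the whole combined array down.
    return all(failed <= tolerance_per_subarray for failed in failed_per_subarray)


# RAID 50: one failure in each RAID 5 array is fine...
print(nested_array_survives([1, 1], RAID5_TOLERANCE))  # True
# ...but two failures in the same RAID 5 array lose everything.
print(nested_array_survives([2, 0], RAID5_TOLERANCE))  # False
# RAID 60: each RAID 6 array can lose two drives.
print(nested_array_survives([2, 2], RAID6_TOLERANCE))  # True
```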
Summary
In summary, use RAID 0 for speed but no redundancy; RAID 5 for speed and “usual” levels of redundancy and protection against data loss; and RAID 6 if you are paranoid about data loss. RAID 3 is less common in video post-production. There are excellent units in all of these configurations, across multiple types of interfaces, from GMax, CalDigit, Ciprico, Dulce Systems, G-Tech and others.