Different types of RAID protection

What is RAID? The concept of RAID, or Redundant Array of Independent Disks, was originally discussed in a Berkeley paper by Patterson, Gibson and Katz. The idea is that instead of writing data block by block over a single disk, the data is spread over several spindles. This gives performance benefits, as data is read off several spindles, and availability benefits, as extra parity data can be generated and stored, so that the data will still be available if one or more disks are lost.

Parity is a means of adding extra data, so that if one of the bits of data is lost, it can be recreated from the parity. For example, suppose a piece of data consists of the bits 1011. The number of '1's is odd, so under an even-parity scheme we make the parity bit a 1, so that the total count of '1's, including the parity bit, is even. The data then becomes 10111. Suppose the third bit is lost; the data is then 10?11. We know from the parity bit that there should be an even number of '1's, and the number of recognisable '1's is odd, so the missing bit must be a '1'. This is a very simplistic explanation; in practice, disk parity is calculated on blocks of data using XOR hardware functions. The advantage of parity is that it makes it possible to recover data after errors. The disadvantage is that more storage space is required. In enterprise disk subsystems, standby disks called 'dynamic spares' are kept ready, so that when a disk is lost, a dynamic spare disk is automatically swapped in and the faulty disk's contents are rebuilt from the remaining data and the parity data.
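The XOR block parity described above can be sketched in a few lines of Python. The block contents below are illustrative; the point is that XORing the surviving blocks with the parity block recreates the lost block:

```python
# Minimal sketch of block parity: XOR the data blocks together to form
# the parity block, then rebuild a lost block from the survivors.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Three data blocks (contents are illustrative)
d0 = bytes([0b1011, 0x5A, 0xFF])
d1 = bytes([0b0110, 0x3C, 0x00])
d2 = bytes([0b1101, 0x99, 0x42])

parity = xor_blocks([d0, d1, d2])

# 'Lose' d1, then recreate it from the remaining blocks plus parity
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1
```

This works because XOR is its own inverse: XORing everything except the missing block back into the parity leaves exactly the missing block.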

There has been some speculation in recent years that RAID is no longer relevant. This is based on the fact that disks are now much bigger than they were when RAID was invented, and so it takes much longer to swap a dynamic spare in. Why is this important? Until the data is rebuilt there is no protection, so if another disk failed, all the data would be lost. With physical disk sizes of 16 TB or more, that means in a RAID5 6+P+S configuration, 96 TB of data could be lost. In that same configuration, the disk controller would have to read 96 TB of data to rebuild the missing 16 TB, and that could take days if the system is busy. Also, while the controller is performing this recovery, performance will be affected whenever data on the rest of the RAID group is accessed. Most enterprise vendors now insist on RAID6 or RAID1 for large disks.
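The rebuild figures above can be checked with a back-of-the-envelope calculation. The sustained rebuild rate used here is an illustrative assumption, not a vendor figure; a real rebuild is throttled by host activity:

```python
# Back-of-the-envelope rebuild figures for a RAID5 6+P+S rank of 16 TB
# drives. The sustained rebuild rate is an illustrative assumption.

drive_tb = 16
data_drives = 6                       # 6+P+S: six data drives, parity, spare
data_at_risk_tb = drive_tb * data_drives
read_to_rebuild_tb = data_at_risk_tb  # every surviving block is read once

rebuild_rate_mb_s = 100               # assumed sustained rate on a busy box
rebuild_hours = (drive_tb * 1_000_000) / rebuild_rate_mb_s / 3600

print(data_at_risk_tb)       # 96 TB exposed until the rebuild completes
print(round(rebuild_hours))  # roughly 44 hours at the assumed rate
```

Even at this optimistic constant rate the window of exposure is nearly two days, which is why large drives push vendors towards double-parity schemes.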
On the plus side, RAID performance is evolving. Access to the disks gets faster as disks get bigger, and the hardware functions that are used to rebuild the data are constantly being improved. This means that as each new generation of disks arrives, the rebuild performance of RAID controllers improves and keeps pace with the increase in drive capacity.
The conclusion is that RAID is not dead yet, nor is it likely to be for some time.



So which RAID configuration is best? RAID1 is simple to implement, performs well and is probably the best solution for small configurations and especially home PCs. RAID6 is usually preferred for enterprise subsystems, especially if they use large disks.
RAID1 can only tolerate one disk failure, but as the RAID protection can be restored by reading just one disk, the risk of data loss is low, especially if the disks are relatively small. The issue with RAID1 is that only half the installed capacity is usable.
RAID5 can also only tolerate one failure, and a rebuild can take some time for large disks, increasing the chance that a second disk fails during the rebuild and all the data is lost.
RAID6 can tolerate two disk failures, so when a disk fails, two more need to fail during the rebuild before data is lost. The RAID overhead depends on how many disks are in the RAID rank; it is 25% for an 8-disk array.
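The capacity trade-offs between the three schemes can be compared directly. The rank sizes below are illustrative:

```python
# Usable capacity fraction for common RAID levels, per rank of n disks.

def usable_fraction(n_disks, parity_disks):
    """Fraction of installed capacity left for data."""
    return (n_disks - parity_disks) / n_disks

raid1 = usable_fraction(2, 1)   # mirroring: half the capacity is usable
raid5 = usable_fraction(8, 1)   # one parity disk in an 8-disk rank
raid6 = usable_fraction(8, 2)   # two parity disks in an 8-disk rank

print(raid1)  # 0.5   -> 50% overhead
print(raid5)  # 0.875 -> 12.5% overhead
print(raid6)  # 0.75  -> 25% overhead for an 8-disk array
```

Note that the RAID5 and RAID6 overheads shrink as the rank gets wider, while RAID1 is always 50%, which is why mirroring is expensive in large subsystems.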

The various types of RAID are explained below. In the diagrams, the square box represents the controller and the cache. Blue and yellow blocks represent data and red blocks represent parity. For simplicity, the dynamic diagrams show each IO as a RAID block. In practice, RAID blocks are fixed size, and so IOs are split into RAID blocks as appropriate. The RAID striping and parity are usually generated by ASICs.

  • RAID0 is simply data striped over several disks. This gives a performance advantage, as it is possible to read parts of a file in parallel. However, not only is there no data protection, it is actually less reliable than a single disk, as all the data is lost if a single disk in the stripe fails.
    RAID0 principles
  • RAID1 is data mirroring. Two copies of the data are held on two physical disks, and the data is always identical. RAID1 has a performance advantage, as reads can come from either disk, and is simple to implement. However, it is expensive in large disk subsystems, as twice as many disks are needed to store the data.

    RAID1 principles

  • RAID2 is a theoretical entity. It stripes data at bit level across an array of disks, then writes check bytes to other disks in the array. The check bytes are calculated using a Hamming code. Theoretical performance is very high, but it would be so expensive to implement that no-one uses it.
  • RAID3 stripes bytes of data over an array of disks, then writes a parity byte to a dedicated parity disk. Successful implementations usually require that all the disks have synchronised rotation. RAID3 is not often used these days.
  • RAID4 data is striped in blocks onto the data disks, then parity is generated and written to a dedicated parity disk.
    RAID4 principles
    In the gif above, the right hand disk is dedicated parity, the other three disks are data disks.
  • RAID5 data is striped in blocks onto data disks, and parity is generated and rotated around the data disks. Good general performance, and reasonably cheap to implement. RAID5 was used extensively for general data.

    RAID5 principles

    If a block of data on a RAID5 disk is updated then, in the worst case, all the unchanged data blocks in the RAID stripe have to be read back from the disks and new parity calculated before the new data block and new parity block can be written out. This means that a single RAID5 write operation can require several extra disk reads. The performance impact is usually masked by a large subsystem cache.
    More efficient RAID5 implementations hang on to the original data and use it to generate the parity according to the formula new-parity = old-data XOR new-data XOR old-parity. If the old data block is retained in cache, and it often is, then this just requires one extra fetch, for the old parity.

  • RAID6 is growing in popularity as the double parity is seen as the best way to improve data resilience for very large disks. It was originally used in SUN V2X devices, where there are a lot of disks in a RAID array, and so a higher chance of multiple failures. RAID6 as implemented by SUN does not have a write overhead, as the data is always written out to a different block.

    The problem with RAID6 is that there is no standard method of implementation; every manufacturer has their own method. In fact there are two distinct architectures, RAID6 P+Q and RAID6 DP.

    DP, or Double Parity RAID, uses a mathematical method to generate two independent parity blocks for each stripe of data, and several different mathematical methods are in use.
    P+Q generates a horizontal P parity block, then combines those disks into a second RAID stripe and generates a Q parity, hence P+Q. The GIF below shows how RAID6 could be striped over 8 disks. Those 8 disks will only contain 6 disks' worth of data.

    P+Q architectures tend to perform better than DP architectures and are more flexible in the number of disks that can be in each RAID array. DP architectures usually insist that the number of disks is prime, something like 4+1, 6+1 or 10+1. This can be a problem as the physical disks usually come in units of eight, and so do not easily fit a prime number scheme.

    RAID6 principles

  • RAID7 is a registered trademark of Storage Computer Corporation, and is basically RAID3 with an embedded operating system in the controller to manage the data and cache to speed up the access.
  • RAID1+0 is a combination of RAID1 mirroring and data striping. This means it has very good performance and high reliability, so it's ideal for mission critical database applications. All that redundancy means that it is expensive.
  • RAID50 is implemented as a RAID5 array that is then striped in RAID0 fashion for fast access.
  • RAID53 applies this 'RAID then stripe' principle to RAID3. It should really be called RAID3+0. Both these RAID versions are expensive to implement in hardware terms.
  • RAID0+1 is implemented as a mirrored array whose segments are RAID0 arrays, which is not the same as RAID10. RAID0+1 has the same fault tolerance as RAID5: the data will survive the loss of a single disk, but at that point, all you have left is a striped RAID0 disk set. It provides high performance, but with lower resilience than RAID10.
  • RAID-S, or Parity RAID, is a specific implementation of RAID5 used by EMC. It uses hardware facilities within the disks to produce the parity information, and so does not have the RAID5 write overhead. It was originally called RAID-S, and is sometimes called 3+1 or 7+1 RAID.
  • RAIDZ is part of the SUN ZFS file system. It is a software based variant of RAID5 which does not use a fixed size RAID stripe, but writes out the current block of data as a varying size RAID stripe. With standard RAID, data is written and read in blocks, and several blocks are usually combined together to make up a RAID stripe. If you need to update one data block, you have to read back all the other data blocks in that stripe to calculate the new RAID parity. RAIDZ eliminates the RAID5 write penalty, as any read and write of existing data will just include the current block. In a failure, data is re-created by reading checksum bytes from the file system itself, not the hardware, so recovery is independent of hardware failures. The problem, of course, is that RAIDZ closely couples the operating system and the hardware. In other words, you have to buy them both from SUN.
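The RAID5 small-write shortcut described earlier, new-parity = old-data XOR new-data XOR old-parity, can be verified in a few lines of Python. The block values are illustrative:

```python
# RAID5 small-write update: recompute parity from the old data, the new
# data and the old parity, without touching the other blocks in the stripe.

def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A stripe of three data blocks plus parity (values illustrative)
d0, d1, d2 = bytes([0x11, 0x22]), bytes([0x33, 0x44]), bytes([0x55, 0x66])
old_parity = xor(xor(d0, d1), d2)

# Update d1 in place
new_d1 = bytes([0xAB, 0xCD])
new_parity = xor(xor(d1, new_d1), old_parity)  # old ^ new ^ old-parity

# The shortcut gives the same parity as recomputing over the full stripe
assert new_parity == xor(xor(d0, new_d1), d2)
```

This is why the efficient RAID5 write needs only the old data block and the old parity block, rather than every unchanged block in the stripe.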
