RAID Technology
Most of the text in this guide has been taken from copyrighted documents of
Adaptec, Inc. (www.adaptec.com) and Perceptive Solutions, Inc.
RAID stands for Redundant Array of Inexpensive (or sometimes "Independent") Disks.
RAID is a method of combining several hard disk drives into one logical unit (two or more disks
grouped together to appear as a single device to the host system). RAID technology was
developed to address the fault-tolerance and performance limitations of conventional disk
storage. It can offer fault tolerance and higher throughput levels than a single hard drive or group
of independent hard drives. While arrays were once considered complex and relatively
specialized storage solutions, today they are easy to use and essential for a broad spectrum of
client/server applications.
There are many applications, particularly in a business environment, where there are needs
beyond what can be fulfilled by a single hard disk, regardless of its size, performance or quality
level. Many businesses can't afford to have their systems go down for even an hour in the event
of a disk failure; they need large storage subsystems with capacities in the terabytes; and they
want to be able to insulate themselves from hardware failures to any extent possible. Some
people working with multimedia files need fast data transfer exceeding what current drives can
deliver, without spending a fortune on specialty drives. These situations require that the
traditional "one hard disk per system" model be set aside and a new system employed. This
technique is called Redundant Arrays of Inexpensive Disks or RAID. ("Inexpensive" is sometimes
replaced with "Independent", but the former term is the one that was used when the term "RAID"
was first coined by the researchers at the University of California at Berkeley, who first
investigated the use of multiple-drive arrays in 1987.)
The fundamental principle behind RAID is the use of multiple hard disk drives in an array that
behaves in most respects like a single large, fast one. There are a number of ways that this can be
done, depending on the needs of the application, but in every case the use of multiple drives
allows the resulting storage subsystem to exceed the capacity, data security, and performance of
the drives that make up the system, to one extent or another. The tradeoffs--remember, there's no
free lunch--are usually in cost and complexity.
Originally, RAID was almost exclusively the province of high-end business applications, due to
the high cost of the hardware required. This has changed in recent years, and as "power users" of
all sorts clamor for improved performance and better up-time, RAID is making its way from the
"upper echelons" down to the mainstream. The recent proliferation of inexpensive RAID
controllers that work with consumer-grade IDE/ATA drives--as opposed to expensive SCSI
units--has increased interest in RAID dramatically. This trend will probably continue. I predict
that more and more motherboard manufacturers will begin offering support for the feature on
their boards, and within a couple of years PC builders will start to offer systems with inexpensive
RAID setups as standard configurations. This interest, combined with my long-time interest in
this technology, is the reason for my recent expansion of the RAID coverage on this site from
one page to 80. :^)
Unfortunately, RAID in the computer context doesn't really kill bugs dead. It can, if properly
implemented, "kill down-time dead", which is still pretty good. :^)
History
RAID technology was first defined by a group of computer scientists at the University of
California at Berkeley in 1987. The scientists studied the possibility of using two or more disks
to appear as a single device to the host system.
Although the array's performance was better than that of large, single-disk storage systems,
reliability was unacceptably low. To address this, the scientists proposed redundant architectures
to provide ways of achieving storage fault tolerance. In addition to defining RAID levels 1
through 5, the scientists also studied data striping -- a non-redundant array configuration that
distributes files across multiple disks in an array. Often known as RAID 0, this configuration
actually provides no data protection. However, it does offer maximum throughput for some data-
intensive applications such as desktop digital video production.
A number of factors are responsible for the growing adoption of arrays for critical network
storage.
More and more organizations have created enterprise-wide networks to improve productivity and
streamline information flow. While the distributed data stored on network servers provides
substantial cost benefits, these savings can be quickly offset if information is frequently lost or
becomes inaccessible. As today's applications create larger files, network storage needs have
increased proportionately. In addition, accelerating CPU speeds have outstripped data transfer
rates to storage media, creating bottlenecks in today's systems.
RAID storage solutions overcome these challenges by providing a combination of outstanding
data availability, extraordinary and highly scalable performance, high capacity, and recovery
with no loss of data or interruption of user access.
By integrating multiple drives into a single array -- which is viewed by the network operating
system as a single disk drive -- organizations can create cost-effective, minicomputer-sized
solutions of up to a terabyte or more of storage.
Principles
RAID combines two or more physical hard disks into a single logical unit using special hardware
or software. Hardware solutions are often designed to present themselves to the attached system
as a single hard drive, so that the operating system would be unaware of the technical workings.
For example, if one were to configure a hardware-based RAID-5 volume using three 250GB
hard drives (two drives for data, and one for parity), the operating system would be presented
with a single 500GB volume. Software solutions are typically implemented in the operating
system and would present the RAID volume as a single drive to applications running within the
operating system.
There are three key concepts in RAID: mirroring, the writing of identical data to more than one
disk; striping, the splitting of data across more than one disk; and error correction, where
redundant ("parity") data is stored to allow problems to be detected and possibly fixed (known as
fault tolerance). Different RAID schemes use one or more of these techniques, depending on the
system requirements. The purpose of using RAID is to improve reliability and availability of
data, ensuring that important data is not harmed in case of hardware failure, and/or to increase
the speed of file input/output.
Mirroring
Mirroring is one of the two data redundancy techniques used in RAID (the other being parity). In
a RAID system using mirroring, all data in the system is written simultaneously to two hard disks
instead of one; thus the "mirror" concept. The principle behind mirroring is that this 100% data
redundancy provides full protection against the failure of either of the disks containing the
duplicated data. Mirroring setups always require an even number of drives for obvious reasons.
The chief advantage of mirroring is that it provides not only complete redundancy of data, but
also reasonably fast recovery from a disk failure. Since all the data is on the second drive, it is
ready to use if the first one fails. Mirroring also improves some forms of
read performance (though it actually hurts write performance.) The chief disadvantage of RAID
1 is expense: that data duplication means half the space in the RAID is "wasted" so you must buy
twice the capacity that you want to end up with in the array. Performance is also not as good as
some RAID levels.
Block diagram of a RAID mirroring configuration. The RAID controller
duplicates the same information onto each of two hard disks. Note that
the RAID controller is represented as a "logical black box" since its functions
can be implemented in software, or several different types of hardware
(integrated controller, bus-based add-in card, stand-alone RAID hardware.)
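To make the idea concrete, here is a minimal Python sketch of mirrored writes (nothing here is real controller code; the in-memory "disks" and function names are purely illustrative). Every write lands on both members of the mirror, so either copy alone can satisfy reads after a failure.

    # Minimal sketch of RAID 1 style mirroring; the dictionaries stand in for disks.
    disk_a = {}   # block number -> data
    disk_b = {}

    def mirrored_write(block, data):
        """Write the same data to both disks, so either one alone holds everything."""
        disk_a[block] = data
        disk_b[block] = data

    def read_after_failure(block, surviving_disk):
        """If one disk fails, the survivor still has every block."""
        return surviving_disk[block]

    mirrored_write(0, b"important data")
    print(read_after_failure(0, disk_b))   # still works even if disk_a is gone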
Duplexing
Duplexing is an extension of mirroring that is based on the same principle as that technique. Like
in mirroring, all data is duplicated onto two distinct physical hard drives. Duplexing goes one
step beyond mirroring, however, in that it also duplicates the hardware that controls the two hard
drives (or sets of hard drives). So if you were doing mirroring on two hard disks, they would
both be connected to a single host adapter or RAID controller. If you were doing duplexing, one
of the drives would be connected to one adapter and the other to a second adapter.
Block diagram of a RAID duplexing configuration. Two controllers are used
to send the same information to two different hard disks. The controllers are
often regular host adapters or disk controllers with the mirroring done by the
system. Contrast this diagram with the one for straight mirroring.
Since hardware RAID is typically set up under the assumption that the RAID controller will
handle all the drives in the array, duplexing is not supported as an option in most PC hardware
RAID solutions--even fairly high-end ones. Duplexing is more often found in software RAID
solutions managed by the operating system, where the operating system is running the RAID
implementation at a high level and can easily split the data between the host adapters. (There are
hardware RAID duplexing solutions but usually only on very expensive external RAID boxes.)
Striping
The main performance-limiting issues with disk storage relate to the slow mechanical
components that are used for positioning and transferring data. Since a RAID array has many
drives in it, an opportunity presents itself to improve performance by using the hardware in all
these drives in parallel. For example, if we need to read a large file, instead of pulling it all from
a single hard disk, it is much faster to chop it up into pieces, store some of the pieces on each of
the drives in an array, and then use all the disks to read back the file when needed. This
technique is called striping, after the pattern that might be visible if you could see these
"chopped up pieces" on the various drives with a different color used for each file. It is similar in
concept to the memory performance-enhancing technique called interleaving.
Striping can be done at the byte level, or in blocks. Byte-level striping means that the file is
broken into "byte-sized pieces" (hee hee, sorry about that, I just couldn't resist. ;^) ) The first
byte of the file is sent to the first drive, then the second to the second drive, and so on. (See the
discussion of RAID level 3 for more on byte-level striping.) Sometimes byte-level striping is
actually done using 512-byte sectors as the striping unit. Block-level striping means that each file is split into blocks of a
certain size and those are distributed to the various drives. The size of the blocks used is also
called the stripe size (or block size, or several other names), and can be selected from a variety of
choices when the array is set up; see here for more details.
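As a rough illustration of block-level striping, the Python sketch below (the 16 kiB stripe size, four-drive array, and function name are made up for the example) deals fixed-size stripe units out to the drives in round-robin order:

    # Illustrative sketch of block-level striping across a four-drive array.
    STRIPE_SIZE = 16 * 1024   # bytes per stripe unit
    NUM_DRIVES = 4

    def stripe_blocks(data, stripe_size=STRIPE_SIZE, num_drives=NUM_DRIVES):
        """Return one list of stripe units per drive, dealt round-robin."""
        drives = [[] for _ in range(num_drives)]
        for i in range(0, len(data), stripe_size):
            block_index = i // stripe_size
            drives[block_index % num_drives].append(data[i:i + stripe_size])
        return drives

    # A 100 kiB "file" becomes 7 stripe units: drives 0-2 get two each, drive 3 gets one.
    layout = stripe_blocks(b"x" * 100 * 1024)
    print([len(blocks) for blocks in layout])   # [2, 2, 2, 1]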
Some companies use the term "spanning" when they really mean striping. Spanning normally
refers to JBOD.
Parity
Mirroring is a data redundancy technique used by some RAID levels, in particular RAID level 1,
to provide data protection on a RAID array. While mirroring has some advantages and is well-
suited for certain RAID implementations, it also has some limitations. It has a high overhead
cost, because fully 50% of the drives in the array are reserved for duplicate data; and it doesn't
improve performance as much as data striping does for many applications. For this reason, a
different way of protecting data is provided as an alternate to mirroring. It involves the use
of parity information, which is redundancy information calculated from the actual data values.
You may have heard the term "parity" before, used in the context of system memory error
detection; in fact, the parity used in RAID is very similar in concept to parity RAM. The
principle behind parity is simple: take "N" pieces of data, and from them, compute an extra piece
of data. Take the "N+1" pieces of data and store them on "N+1" drives. If you lose any one of the
"N+1" pieces of data, you can recreate it from the "N" that remain, regardless of which piece is
lost. Parity protection is used with striping, and the "N" pieces of data are typically the blocks or
bytes distributed across the drives in the array. The parity information can either be stored on a
separate, dedicated drive, or be mixed with the data across all the drives in the array.
The parity computation used in RAID is based on the logical "exclusive OR" (XOR) operation. Here is its truth table, with the ordinary OR shown alongside for comparison:

Input #1   Input #2   OR   XOR
    0          0       0     0
    0          1       1     1
    1          0       1     1
    1          1       1     0
Uh huh. So what, right? Well, the interesting thing about "XOR" is that it is a logical operation
that if performed twice in a row, "undoes itself". If you calculate "A XOR B" and then take that
result and do another "XOR B" on it, you get back A, the value you started with. That is to say,
"A XOR B XOR B = A". This property is exploited for parity calculation under RAID. If we
have four data elements, D1, D2, D3 and D4, we can calculate the parity data, "DP" as "D1 XOR
D2 XOR D3 XOR D4". Then, if we know any four of D1, D2, D3, D4 and DP, we can XOR
those four together and it will yield the missing element.
Let's take an example to show how this works; you can do this yourself easily on a sheet of
paper. Suppose we have the following four bytes of data: D1=10100101, D2=11110000,
D3=00111100, and D4=10111001. We can "XOR" them together as follows, one step at a time:
So "11010000" becomes the parity byte, DP. Now let's say we store these five values on five
hard disks, and hard disk #3, containing value "00111100", goes el-muncho. We can retrieve the
missing byte simply by XOR'ing together the other three original data pieces, and the parity byte
we calculated earlier, as so:
Which is D3, the missing value. Pretty neat, huh? :^) This operation can be done on any number
of bits, incidentally; I just used eight bits for simplicity. It's also a very simple binary
calculation--which is a good thing, because it has to be done for every bit stored in a parity-
enabled RAID array.
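The same example takes only a few lines of Python; this sketch (variable names chosen just for the example) computes the parity byte from D1 through D4 and then rebuilds the "lost" D3 from the surviving pieces:

    # XOR parity: compute DP, then reconstruct D3 after "losing" it.
    D1, D2, D3, D4 = 0b10100101, 0b11110000, 0b00111100, 0b10111001

    DP = D1 ^ D2 ^ D3 ^ D4
    print(format(DP, "08b"))            # 11010000, the parity byte

    rebuilt_D3 = D1 ^ D2 ^ D4 ^ DP      # XOR the surviving pieces together
    print(format(rebuilt_D3, "08b"))    # 00111100, identical to the lost D3
    assert rebuilt_D3 == D3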
Compared to mirroring, parity (used with striping) has some advantages and disadvantages. The
most obvious advantage is that parity protects data against any single drive in the array failing
without requiring the 50% "waste" of mirroring; only one of the "N+1" drives contains
redundancy information. (The overhead of parity is equal to (100/N)% where N is the total
number of drives in the array.) Striping with parity also allows you to take advantage of
the performance advantages of striping. The chief disadvantages of striping with parity relate to
complexity: all those parity bytes have to be computed--millions of them per second!--and that
takes computing power. This means a hardware controller that performs these calculations is
required for high performance--if you do software RAID with striping and parity the
system CPU will be dragged down doing all these computations. Also, while you can recover
from a lost drive under parity, the missing data all has to be rebuilt, which has its own
complications; recovering from a lost mirrored drive is comparatively simple.
All of the RAID levels from RAID 3 to RAID 7 use parity; the most popular of these today
is RAID 5. RAID 2 uses a concept similar to parity but not exactly the same.
RAID Levels
There are several different RAID "levels" or redundancy schemes, each with inherent cost,
performance, and availability (fault-tolerance) characteristics designed to meet different storage
needs. No individual RAID level is inherently superior to any other. Each of the five array
architectures is well-suited for certain types of applications and computing environments. For
client/server applications, storage systems based on RAID levels 1, 0/1, and 5 have been the
most widely used. This is because popular NOSs such as Windows NT® Server and NetWare
manage data in ways similar to how these RAID architectures perform.
RAID 0 - RAID 1 - RAID 2 - RAID 3 - RAID 4 - RAID 5 - RAID 0/1 (or RAID 10)
RAID Level 0
Common Name(s): RAID 0. (Note that the term "RAID 0" is sometimes used to mean not only
the conventional striping technique described here but also other "non-redundant" ways of
setting up disk arrays. Sometimes it is (probably incorrectly) used just to describe a collection of
disks that doesn't use redundancy.)
Description: The simplest RAID level, RAID 0 should really be called "AID", since it involves
no redundancy. Files are broken into stripes of a size dictated by the user-defined stripe size of
the array, and stripes are sent to each disk in the array. Giving up redundancy allows this RAID
level the best overall performance characteristics of the single RAID levels, especially for its
cost. For this reason, it is becoming increasingly popular among performance-seekers, especially in
the lower end of the marketplace.
This illustration shows how files of different sizes are distributed between the drives on a
four-disk, 16 kiB stripe size RAID 0 array. The red file is 4 kiB in size; the blue is 20 kiB;
the green is 100 kiB; and the magenta is 500 kiB. They are shown drawn to scale to illustrate
how much space they take up in relative terms in the array--one vertical pixel represents 1 kiB.
(To see the impact that increasing or decreasing stripe size has on the way the data is stored
in the array, see the 4 kiB and 64 kiB stripe size versions of this illustration on the page
discussing stripe size issues.)
Controller Requirements: Supported by all hardware controllers, both SCSI and IDE/ATA, and
also most software RAID solutions.
Hard Disk Requirements: Minimum of two hard disks (some may support one drive, the point
of which escapes me); maximum set by controller. Any type may be used, but they should be of
identical type and size for best performance and to eliminate "waste".
Fault Tolerance: None. Failure of any drive results in loss of all data, short of specialized data
recovery.
Availability: Lowest of any RAID level. Lack of fault tolerance means no rapid recovery from
failures. Failure of any drive results in array being lost and immediate downtime until array can
be rebuilt and data restored from backup.
Random Read Performance: Very good; better if using larger stripe sizes if the controller
supports independent reads to different disks in the array.
Random Write Performance: Very good; again, best if using a larger stripe size and a
controller supporting independent writes.
Special Considerations: Using a RAID 0 array without backing up any changes made to its data
at least daily is a loud statement that that data is not important to you.
Recommended Uses: Non-critical data (or data that changes infrequently and is backed up
regularly) requiring high speed, particularly write speed, and low cost of implementation. Audio
and video streaming and editing; web servers; graphic design; high-end gaming or hobbyist
systems; temporary or "scratch" disks on larger machines.
RAID Level 1
Description: RAID 1 is usually implemented as mirroring; a drive has its data duplicated on two
different drives using either a hardware RAID controller or software (generally via
the operating system). If either drive fails, the other continues to function as a single drive until
the failed drive is replaced. Conceptually simple, RAID 1 is popular for those who require fault
tolerance and don't need top-notch read performance. A variant of RAID 1 is duplexing, which
duplicates the controller card as well as the drive, providing tolerance against failures of either a
drive or a controller. It is much less commonly seen than straight mirroring.
Illustration of a pair of mirrored hard disks, showing how the files are duplicated on both
drives. (The files are the same as those in the RAID 0 illustration, except that to save space
I have reduced the scale here so one vertical pixel represents 2 kiB.)
Controller Requirements: Supported by all hardware controllers, both SCSI and IDE/ATA, and
also most software RAID solutions.
Hard Disk Requirements: Exactly two hard disks. Any type may be used but they should
ideally be identical.
Storage Efficiency: 50% if drives of the same size are used, otherwise (Size of Smaller Drive /
(Size of Smaller Drive + Size of Larger Drive) )
Availability: Very good. Most RAID controllers, even low-end ones, will support hot sparing
and automatic rebuilding of RAID 1 arrays.
Random Read Performance: Good. Better than a single drive but worse than many other RAID
levels.
Random Write Performance: Good. Worse than a single drive, but better than many other
RAID levels. :^)
Sequential Write Performance: Good; again, better than many other RAID levels.
Cost: Relatively high due to redundant drives; lowest storage efficiency of the single RAID
levels. Duplexing is still more expensive due to redundant controllers. On the other hand, no
expensive controller is required, and large consumer-grade drives are rather inexpensive these
days, making RAID 1 a viable choice for an individual system.
Special Considerations: RAID 1 arrays are limited to the size of the drives used in the array.
Multiple RAID 1 arrays can be set up if additional storage is required, but RAID 1+0 begins to
look more attractive in that circumstance. Performance may be reduced if implemented using
software instead of a hardware controller; duplexing may require software RAID and thus may
show lower performance than mirroring.
Recommended Uses: Applications requiring high fault tolerance at a low cost, without heavy
emphasis on large amounts of storage capacity or top performance. Especially useful in
situations where the perception is that having a duplicated set of data is more secure than using
parity. For this reason, RAID 1 is popular for accounting and other financial data. It is also
commonly used for small database systems, enterprise servers, and for individual users requiring
fault tolerance with a minimum of hassle and cost (since redundancy using parity generally
requires more expensive hardware.)
RAID Level 2
Common Name(s): RAID 2.
Description: Level 2 is the "black sheep" of the RAID family, because it is the only RAID level
that does not use one or more of the "standard" techniques of mirroring, striping and/or parity.
RAID 2 uses something similar to striping with parity, but not the same as what is used by RAID
levels 3 to 7. It is implemented by splitting data at the bit level and spreading it over a number of
data disks and a number of redundancy disks. The redundant bits are calculated using Hamming
codes, a form of error correcting code (ECC). Each time something is to be written to the array
these codes are calculated and written alongside the data to dedicated ECC disks; when the data
is read back these ECC codes are read as well to confirm that no errors have occurred since the
data was written. If a single-bit error occurs, it can be corrected "on the fly". If this sounds
similar to the way that ECC is used within hard disks today, that's for a good reason: it's pretty
much exactly the same. It's also the same concept used for ECC protection of system memory.
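To give a feel for how such a code corrects a single-bit error, here is a small Python sketch of the classic Hamming(7,4) code (the function names are hypothetical, and real RAID 2 implementations spread much wider codes across whole drives, but the principle is the same): the syndrome recomputed from the parity checks points directly at the flipped bit.

    # Hamming(7,4): positions 1..7, parity bits at 1, 2, 4, data bits at 3, 5, 6, 7.
    def hamming74_encode(d3, d5, d6, d7):
        """Encode 4 data bits into a 7-bit codeword (list of 0/1)."""
        p1 = d3 ^ d5 ^ d7          # covers positions 1, 3, 5, 7
        p2 = d3 ^ d6 ^ d7          # covers positions 2, 3, 6, 7
        p4 = d5 ^ d6 ^ d7          # covers positions 4, 5, 6, 7
        return [p1, p2, d3, p4, d5, d6, d7]

    def hamming74_correct(code):
        """Recompute the checks; the syndrome is the 1-based position of a flipped bit."""
        c = list(code)
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
        syndrome = s1 * 1 + s2 * 2 + s4 * 4
        if syndrome:
            c[syndrome - 1] ^= 1   # flip the offending bit back
        return c

    sent = hamming74_encode(1, 0, 1, 1)
    damaged = list(sent)
    damaged[4] ^= 1                    # simulate a single-bit error "on the fly"
    assert hamming74_correct(damaged) == sent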
Level 2 is the only RAID level of the ones defined by the original Berkeley document that is not
used today, for a variety of reasons. It is expensive and often requires many drives--see below for
some surprisingly large numbers. The controller required was complex, specialized and
expensive. The performance of RAID 2 is also rather substandard in transactional environments
due to the bit-level striping. But most of all, level 2 was obviated by the use of ECC within a
hard disk; essentially, much of what RAID 2 provides you now get for "free" within each hard
disk, with other RAID levels providing protection above and beyond ECC.
Due to its cost and complexity, level 2 never really "caught on". Therefore, much of the
information below is based upon theoretical analysis, not empirical evidence.
Storage Efficiency: Depends on the number of data and ECC disks; for the 10+4 configuration,
about 71%; for the 32+7 setup, about 82%.
Fault Tolerance: Only fair; for all the redundant drives included, you don't get much tolerance:
only one drive can fail in this setup and be recoverable "on the fly".
Degradation and Rebuilding: In theory, there would be little degradation due to failure of a
single drive.
Random Write Performance: Poor, due to bit-level striping and ECC calculation overhead.
Cost: Very expensive.
RAID Level 3
Common Name(s): RAID 3. (Watch out for some companies that say their products implement
RAID 3 when they are really RAID 4.)
Description: Under RAID 3, data is striped across multiple disks at a byte level; the exact
number of bytes sent in each stripe varies but is typically under 1024. The parity information is
sent to a dedicated parity disk, but the failure of any disk in the array can be tolerated (i.e., the
dedicated parity disk doesn't represent a single point of failure in the array.) The dedicated parity
disk does generally serve as a performance bottleneck, especially for random writes, because it
must be accessed any time anything is sent to the array; this is contrasted to distributed-parity
levels such as RAID 5 which improve write performance by using distributed parity (though they
still suffer from large overheads on writes, as described here). RAID 3 differs from RAID 4 only
in the size of the stripes sent to the various disks.
This illustration shows how files of different sizes are distributed between the drives on a
four-disk, byte-striped RAID 3 array. As with the RAID 0 illustration, the red file is 4 kiB in
size; the blue is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB, with each vertical
pixel representing 1 kiB of space. Notice that the files are evenly spread between three drives,
with the fourth containing parity information (shown in dark gray). Since the blocks are so tiny
in RAID 3, the individual boundaries between stripes can't be seen. You may want to compare
this illustration to the one for RAID 4.
Hard Disk Requirements: Minimum of three standard hard disks; maximum set by controller.
Should be of identical size and type.
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 1) / Number of
Drives).
Availability: Very good. Hot sparing and automatic rebuild are usually supported by controllers
that implement RAID 3.
Random Write Performance: Poor, due to byte-level striping, parity calculation overhead, and
the bottleneck of the dedicated parity drive.
Recommended Uses: Applications working with large files that require high transfer
performance with redundancy, especially serving or editing large files: multimedia, publishing
and so on. RAID 3 is often used for the same sorts of applications that would typically see the
use of RAID 0, where the lack of fault tolerance of RAID 0 makes it unacceptable.
RAID Level 4
Description: RAID 4 improves performance by striping data across many disks in blocks, and
provides fault tolerance through a dedicated parity disk. This makes it in some ways the "middle
sibling" in a family of close relatives, RAID levels 3, 4 and 5. It is like RAID 3 except that it
uses blocks instead of bytes for striping, and like RAID 5 except that it uses dedicated parity
instead of distributed parity. Going from byte to block striping improves random access
performance compared to RAID 3, but the dedicated parity disk remains a bottleneck, especially
for random write performance. Fault tolerance, format efficiency and many other attributes are
the same as for RAID 3 and RAID 5.
This illustration shows how files of different sizes are distributed between the drives on a
four-disk RAID 4 array using a 16 kiB stripe size. As with the RAID 0 illustration, the red file
is 4 kiB in size; the blue is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB, with each
vertical pixel representing 1 kiB of space. Notice that as with RAID 3, the files are evenly
spread between three drives, with the fourth containing parity information (shown in gray). You
may want to contrast this illustration to the one for RAID 3 (which is very similar except that
the blocks are so tiny you can't see them) and the one for RAID 5 (which distributes the parity
blocks across all four drives.)
Hard Disk Requirements: Minimum of three standard hard disks; maximum set by controller.
Should be of identical size and type.
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 1) / Number of
Drives).
Availability: Very good. Hot sparing and automatic rebuild are usually supported.
Special Considerations: Performance will depend to some extent upon the stripe size chosen.
Recommended Uses: Jack of all trades and master of none, RAID 4 is not as commonly used as
RAID 3 and RAID 5, because it is in some ways a "compromise" between them that doesn't have
a target market as well defined as either of those two levels. It is sometimes used
by applications commonly seen using RAID 3 or RAID 5, running the gamut from databases and
enterprise planning systems to serving large multimedia files.
RAID Level 5
Common Name(s): RAID 5.
Description: One of the most popular RAID levels, RAID 5 stripes both data and parity
information across three or more drives. It is similar to RAID 4 except that it exchanges the
dedicated parity drive for a distributed parity algorithm, writing data and parity blocks across all
the drives in the array. This removes the "bottleneck" that the dedicated parity drive represents,
improving write performance slightly and allowing somewhat better parallelism in a multiple-
transaction environment, though the overhead necessary in dealing with the parity continues to
bog down writes. Fault tolerance is maintained by ensuring that the parity information for any
given block of data is placed on a drive separate from those used to store the data itself. The
performance of a RAID 5 array can be "adjusted" by trying different stripe sizes until one is
found that is well-matched to the application being used.
This illustration shows how files of different sizes are distributed between the drives on a
four-disk RAID 5 array using a 16 kiB stripe size. As with the RAID 0 illustration, the red file
is 4 kiB in size; the blue is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB, with each
vertical pixel representing 1 kiB of space. Contrast this diagram to the one for RAID 4, which
is identical except that the data is only on three drives and the parity (shown in gray) is
exclusively on the fourth drive.
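As a simple sketch of what "distributed parity" means in practice, the Python snippet below (one possible rotation pattern; real controllers may lay parity out differently) shows the parity block moving from drive to drive on successive stripes, so no single disk carries all of the parity traffic:

    # Illustrative RAID 5 parity rotation across a four-drive array.
    NUM_DRIVES = 4

    def raid5_parity_drive(stripe_index, num_drives=NUM_DRIVES):
        """Which drive holds the parity block for a given stripe (simple rotation)."""
        return (num_drives - 1 - stripe_index) % num_drives

    for stripe in range(6):
        p = raid5_parity_drive(stripe)
        data_drives = [d for d in range(NUM_DRIVES) if d != p]
        print(f"stripe {stripe}: parity on drive {p}, data on drives {data_drives}")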
Hard Disk Requirements: Minimum of three standard hard disks; maximum set by controller.
Should be of identical size and type.
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 1) / Number of
Drives).
Availability: Good to very good. Hot sparing and automatic rebuild are usually featured on
hardware RAID controllers supporting RAID 5 (software RAID 5 will require down-time).
Random Write Performance: Only fair, due to parity overhead; this is improved over RAID 3
and RAID 4 due to eliminating the dedicated parity drive, but the overhead is still substantial.
Sequential Read Performance: Good to very good; generally better for smaller stripe sizes.
Cost: Moderate, but often less than that of RAID 3 or RAID 4 due to its greater popularity, and
especially if software RAID is used.
RAID Level 6
Common Name(s): RAID 6. Some companies use the term "RAID 6" to refer to proprietary
extensions of RAID 5; these are not discussed here.
Description: RAID 6 can be thought of as "RAID 5, but more". It stripes blocks of data and
parity across an array of drives like RAID 5, except that it calculates two sets of parity
information for each parcel of data. The goal of this duplication is solely to improve fault
tolerance; RAID 6 can handle the failure of any two drives in the array while other single RAID
levels can handle at most one fault. Performance-wise, RAID 6 is generally slightly worse than
RAID 5 in terms of writes due to the added overhead of more parity calculations, but may be
slightly faster in random reads due to spreading of data over one more disk. As with RAID levels
4 and 5, performance can be adjusted by experimenting with different stripe sizes.
This illustration shows how files of different sizes are distributed between the drives on a
four-disk RAID 6 array using a 16 kiB stripe size. As with the RAID 0 illustration, the red file
is 4 kiB in size; the blue is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB, with each
vertical pixel representing 1 kiB of space. This diagram is the same as the RAID 5 one, except
that you'll notice that there is now twice as much gray parity information, and as a result,
more space is taken up on the four drives to contain the same data than in the other levels
that use striping.
Hard Disk Requirements: Minimum of four hard disks; maximum set by controller. Should be
of identical size and type.
Storage Efficiency: If all drives are the same size, ( (Number of Drives - 2) / Number of
Drives).
Fault Tolerance: Very good to excellent. Can tolerate the simultaneous loss of any two drives in
the array.
Availability: Excellent.
Degradation and Rebuilding: Due to the complexity of dual distributed parity, degradation can
be substantial after a failure and during rebuilding. Dual redundancy may allow rebuilding to be
delayed to avoid performance hit.
Random Read Performance: Very good to excellent; generally better for larger stripe sizes.
Sequential Read Performance: Good to very good; generally better for smaller stripe sizes.
Cost: High.
Recommended Uses: In theory, RAID 6 is ideally suited to the same sorts of applications as
RAID 5, but in situations where additional fault tolerance is required. In practice, RAID 6 has
never really caught on because few companies are willing to pay for the extra cost to insure
against a relatively rare event--it's unusual for two drives to fail simultaneously (unless
something happens that takes out the entire array, in which case RAID 6 won't help anyway). On
the lower end of the RAID 5 market, the rise of hot swapping and automatic rebuild features for
RAID 5 have made RAID 6 even less desirable, since with these advanced features a RAID 5
array can recover from a single drive failure in a matter of hours (where without them, RAID 5
would require downtime for rebuilding, giving RAID 6 a substantial advantage.) On the higher
end of the RAID 5 market, RAID 6 usually loses out to multiple RAID solutions such as RAID
10 that provide some degree of multiple-drive fault tolerance while offering improved
performance as well.
RAID Level 7
Common Name(s): RAID 7.
Description: Unlike the other RAID levels, RAID 7 isn't an open industry standard; it is really a
trademarked marketing term of Storage Computer Corporation, used to describe their proprietary
RAID design. (I debated giving it a page alongside the other RAID levels, but since it is used in
the market, it deserves to be explained; that said, information about it appears to be limited.)
RAID 7 is based on concepts used in RAID levels 3 and 4, but greatly enhanced to address some
of the limitations of those levels. Of particular note is the inclusion of a great deal
of cache arranged into multiple levels, and a specialized real-time processor for managing the
array asynchronously. This hardware support--especially the cache--allows the array to handle
many simultaneous operations, greatly improving performance of all sorts while maintaining
fault tolerance. In particular, RAID 7 offers much improved random read and write performance
over RAID 3 or RAID 4 because the dependence on the dedicated parity disk is greatly reduced
through the added hardware. The increased performance of RAID 7 of course comes at a cost.
This is an expensive solution, made and supported by only one company.
Degradation and Rebuilding: Better than many RAID levels due to hardware support for parity
calculation operations and multiple cache levels.
Random Read Performance: Very good to excellent. The extra cache can often supply the
results of the read without needing to access the array drives.
Random Write Performance: Very good; substantially better than other single RAID levels
doing striping with parity.
Cost: Very high.
Multiple RAID Levels
Not all combinations of RAID levels exist (which is good, because I'd get really bored of
describing them all! :^) ) Typically, the most popular multiple RAID levels are those that
combine single RAID levels that complement each other with different strengths and
weaknesses. Making a multiple RAID array marrying RAID 4 to RAID 5 wouldn't be the best
idea, since they are so similar to begin with.
In this section I take a look at some of the more common multiple RAID levels. Note that some
of the multiple RAID levels discussed here are frequently used, but others are rarely
implemented. In particular, for completeness I describe both the "X+Y" and "Y+X"
configurations of each multiple level, when in some cases only one or the other is commonly
made into product. For example, I know that RAID 50 (5+0) is an option in
commercial RAID controllers, but I am not sure if anyone makes a RAID 05 solution. There may
also be other combinations of RAID levels that I am not aware of.
Before looking at the specific multiple RAID levels, I have to explain a few things about the way
multiple RAID levels are constructed. A multiple RAID level is generally created by taking a
number of disks and dividing them into sets. Within each set a single RAID level is applied to
form a number of arrays. Then, the second RAID level is applied to the arrays to create a higher-
level array. This is why these are sometimes called nested arrays.
Since there are two levels, there are two ways they can be combined. The choice of which level
is applied first and which second has an impact on some important array characteristics. Let's
take as an example multiple RAID employing RAID 0 and RAID 1 to create an array of ten
disks. Much as we can define 10 to be 2*5 or 5*2, we can create our multiple RAID array two
ways:
RAID 0, then RAID 1: Divide the ten disks into two sets of five. Turn each set into a
RAID 0 array containing five disks, then mirror the two arrays. (Sometimes called a
"mirror of stripes".)
RAID 1, then RAID 0: Divide the ten disks into five sets of two. Turn each set into a
RAID 1 array, then stripe across the five mirrored sets. (A "stripe of mirrors").
Naming conventions for multiple RAID levels are just horrible. The standard that most of the
industry seems to use is that if RAID level X is applied first and then RAID level Y is applied
over top of it, that is RAID "X+Y", also sometimes seen as "RAID XY" or "RAID X/Y". This
would mean that alternative number 1 above would be called "RAID 0+1" or "RAID 01", and
that's in fact the terminology that most companies use. Unfortunately, other companies reverse
the terms! They might call the RAID 0 and then RAID 1 technique "RAID 1/0" or "RAID 10"
(perhaps out of fear that people would think "RAID 01" and "RAID 1" were the same thing).
Some designers use the terms "RAID 01" and "RAID 10" interchangeably. The result of all this
confusion is that you must investigate to determine what exactly a company is implementing
when you look at multiple RAID. Don't trust the label.
Of course, I haven't even explained why you should care about the distinctions, so I suppose I
should do that. :^) After all, if you have ten marbles, why would you care if they are arranged in
five columns and two rows, or two columns and five rows? Same here: aren't ten disks in an
array ten disks in an array? Clearly I am setting you up, so the answer is obviously "no". :^)
In many respects, there is no difference between them: there is no impact on drive requirements,
capacity, storage efficiency, and importantly, not much impact on performance. The big
difference comes into play when we look at fault tolerance. Most controllers implement multiple
level RAID by forming a "super array" comprised of "sub-arrays" underneath it. In many cases
the arrays that comprise the "super array"--often called sets--are considered to be logical "single
units", which means that the controller only considers one of these "single units" to either be
"up" or "down" as a whole. It will make use of redundancy features within a sub-array, but
not between sub-arrays, even if the higher-level array means that drives in different sub-arrays
will have the same data.
That makes this sound much more complicated than it really is; it's much easier to explain with
an example. Let's look at 10 drives and RAID 0+1 vs. RAID 1+0 again:
RAID 0+1: We stripe together drives 1, 2, 3, 4 and 5 into RAID 0 stripe set "A", and
drives 6, 7, 8, 9 and 10 into RAID 0 stripe set "B". We then mirror A and B using RAID
1. If one drive fails, say drive #2, then the entire stripe set "A" is lost, because RAID 0
has no redundancy; the RAID 0+1 array continues to chug along because the entire stripe
set "B" is still functioning. However, at this point you are reduced to running what is in
essence a straight RAID 0 array until drive #2 can be fixed. If in the meantime drive #9
goes down, you lose the entire array.
RAID 1+0: We mirror drives 1 and 2 to form RAID 1 mirror set "A"; 3 and 4 become
"B"; 5 and 6 become "C"; 7 and 8 become "D"; and 9 and 10 become "E". We then do a
RAID 0 stripe across sets A through E. If drive #2 fails now, only mirror set "A" is
affected; it still has drive #1 so it is fine, and the RAID 1+0 array continues functioning.
If while drive #2 is being replaced drive #9 fails, the array is fine, because drive #9 is in a
different mirror pair from #2. Only two failures in the same mirror set will cause the
array to fail, so in theory, five drives can fail--as long as they are all in different sets--and
the array would still be fine.
Clearly, RAID 1+0 is more robust than RAID 0+1. Now, if the controller running RAID 0+1
were smart, when drive #2 failed it would continue striping to the other four drives in stripe set
"A", and if drive #9 later failed it would "realize" that it could use drive #4 in its stead, since it
should have the same data. This functionality would theoretically make RAID 0+1 just as fault-
tolerant as RAID 1+0. Unfortunately, most controllers aren't that smart. It pays to ask specific
questions about how a multiple RAID array implementation handles multiple drive failures, but
in general, a controller won't swap drives between component sub-arrays unless the manufacturer
of the controller specifically says it will.
The same impact on fault tolerance applies to rebuilding. Consider again the example above. In
RAID 0+1, if drive #2 fails, the data on five hard disks will need to be rebuilt, because the whole
stripe set "A" will be wiped out. In RAID 1+0, only drive #2 has to be rebuilt. Again here, the
advantage is to RAID 1+0.
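The difference is easy to check with a tiny simulation. This Python sketch (the drive numbering and grouping follow the ten-drive example above, and the survival logic models the simplified "whole sub-array is up or down" controller described in the text) confirms that losing drives #2 and #9 takes down the RAID 0+1 array but not the RAID 1+0 array:

    # Simplified failure model for the ten-drive RAID 0+1 vs. RAID 1+0 example.
    def raid01_survives(failed):
        """RAID 0+1: two five-drive stripe sets (1-5 and 6-10), mirrored.
        The array survives as long as at least one stripe set has no failures."""
        stripe_sets = [set(range(1, 6)), set(range(6, 11))]
        return any(not (s & failed) for s in stripe_sets)

    def raid10_survives(failed):
        """RAID 1+0: five mirrored pairs, striped.
        The array survives as long as no pair has lost both of its drives."""
        mirror_pairs = [{1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}]
        return all(len(p & failed) < 2 for p in mirror_pairs)

    failures = {2, 9}
    print("RAID 0+1 survives:", raid01_survives(failures))   # False
    print("RAID 1+0 survives:", raid10_survives(failures))   # True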
Tip: For a diagram showing graphically the difference between RAID 0+1 and RAID 1+0,
see the page discussing those levels.
Some controllers offer the choice of "what order" you use to set up a multiple RAID array; they
will let you do either RAID X+Y or RAID Y+X, as you choose. Others will force you to use
only one or the other configuration. Again, ask for details when considering a solution, and be
specific with your questions so you can figure out what exactly the controller you are
investigating does.
RAID Levels 0+1 (01) and 1+0 (10)
Common Name(s): RAID 0+1, 01, 0/1, "mirrored stripes", "mirror of stripes"; RAID 1+0, 10,
1/0, "striped mirrors", "stripe of mirrors". Labels are often used incorrectly; verify the details of
the implementation if the distinction between 0+1 and 1+0 is important to you.
Description: The most popular of the multiple RAID levels, RAID 01 and 10 combine the best
features of striping and mirroring to yield large arrays with high performance in most uses and
superior fault tolerance. RAID 01 is a mirrored configuration of two striped sets; RAID 10 is a
stripe across a number of mirrored sets. RAID 10 and 01 have been increasing dramatically in
popularity as hard disks become cheaper and the four-drive minimum is legitimately seen as
much less of an obstacle. RAID 10 provides better fault tolerance and rebuild performance than
RAID 01. Both array types provide very good to excellent overall performance by combining the
speed of RAID 0 with the redundancy of RAID 1 without requiring parity calculations.
This illustration shows how files of different sizes are distributed between the drives on an
eight-disk RAID 0+1 array using a 16 kiB stripe size for the RAID 0 portion. As with the RAID 0
illustration, the red file is 4 kiB in size; the blue is 20 kiB; the green is 100 kiB; and the
magenta is 500 kiB, with each vertical pixel representing 1 kiB of space. The large, patterned
rectangles represent the two RAID 0 "sub arrays", which are mirrored using RAID 1 to create
RAID 0+1. The contents of the striped sets are thus identical. The diagram for RAID 1+0 would
be the same except for the groupings: instead of two large boxes dividing the drives
horizontally, there would be four large boxes dividing the drives vertically into mirrored
pairs. These pairs would then be striped together to form level 1+0. Contrast this diagram to
the ones for RAID 0 and RAID 1.
Fault Tolerance: Very good for RAID 01; excellent for RAID 10.
Degradation and Rebuilding: Relatively little for RAID 10; can be more substantial for RAID
01.
Cost: Relatively high due to large number of drives required and low storage efficiency (50%).
RAID Levels 0+3 (03 or 53) and 3+0 (30)
Common Name(s): The most confusing naming of any of the RAID levels. :^) In an ideal
world, this level would be named RAID 0+3 (or 03) or RAID 3+0 (30). Instead, the number 53 is
often used in place of 03 for reasons I have never been able to determine, and worse, 53 is often
actually implemented as 30, not 03. As always, verify the details of the implementation to be
sure of what you have.
Technique(s) Used: Byte striping with dedicated parity combined with block striping.
Description: RAID 03 and 30 (though often called 53 for a reason that utterly escapes me)
combine byte striping, parity and block striping to create large arrays that are conceptually
difficult to understand. :^) RAID 03 is formed by putting into a RAID 3 array a number of
striped RAID 0 arrays; RAID 30 is more common and is formed by striping across a number of
RAID 3 sub-arrays. The combination of parity, small-block striping and large-block striping
makes analyzing the theoretical performance of this level difficult. In general, it provides
performance better than RAID 3 due to the addition of RAID 0 striping, but closer to RAID 3
than RAID 0 in overall speed, especially on writes. RAID 30 provides better fault tolerance and
rebuild performance than RAID 03, but both depend on the "width" of the RAID 3 dimension of
the array relative to the RAID 0 dimension: the more parity drives, the lower the capacity and
storage efficiency, but the greater the fault tolerance. See the examples below for more
explanation of this.
Most of the characteristics of RAID 0+3 and 3+0 are similar to those of RAID 0+5 and 5+0.
RAID 30 and 03 tend to be better for large files than RAID 50 and 05.
Hard Disk Requirements: Number of drives must be able to be factored into two integers, one
of which must be 2 or higher and the other 3 or higher (you can make a RAID 30 array from 10
drives but not 11). Minimum number of drives is six, with the maximum set by the controller.
Array Capacity: For RAID 03: (Size of Smallest Drive) * (Number of Drives In Each RAID 0
Set) * (Number of RAID 0 Sets - 1). For RAID 30: (Size of Smallest Drive) * (Number of Drives
In Each RAID 3 Set - 1) * (Number of RAID 3 Sets).
For example, the capacity of a RAID 03 array made of 15 18 GB drives arranged as three five-
drive RAID 0 sets would be 18 GB * 5 * (3-1) = 180 GB. The capacity of a RAID 30 array made
of 21 18 GB drives arranged as three seven-drive RAID 3 sets would be 18 GB * (7-1) * 3 = 324
GB. The same 21 drives arranged as seven three-drive RAID 3 sets would have a capacity of 18
GB * (3-1) * 7 = "only" 252 GB.
Storage Efficiency: For RAID 03: ( (Number of RAID 0 Sets - 1) / Number of RAID 0 Sets).
For RAID 30: ( (Number of Drives In Each RAID 3 Set - 1) / Number of Drives In Each RAID 3
Set).
Taking the same examples as above, the 15-drive RAID 03 array would have a storage efficiency
of (3-1)/3 = 67%. The first RAID 30 array, configured as three seven-drive RAID 3 sets, would
have a storage efficiency of (7-1)/7 = 86%, while the other RAID 30 array would have a storage
efficiency of, again, (3-1)/3 = 67%.
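For convenience, the arithmetic in these examples can be wrapped in a couple of small helpers; this Python sketch (function names are purely illustrative, drive sizes in GB) reproduces the RAID 30 capacity and storage efficiency figures above:

    # Reproduces the RAID 30 examples: capacity and efficiency from the set geometry.
    def raid30_capacity(drive_size_gb, drives_per_raid3_set, num_raid3_sets):
        """Smallest drive * (drives per RAID 3 set - 1) * number of RAID 3 sets."""
        return drive_size_gb * (drives_per_raid3_set - 1) * num_raid3_sets

    def raid30_efficiency(drives_per_raid3_set):
        """Efficiency depends only on the width of each RAID 3 set."""
        return (drives_per_raid3_set - 1) / drives_per_raid3_set

    print(raid30_capacity(18, 7, 3))        # 324 GB: three seven-drive RAID 3 sets
    print(raid30_capacity(18, 3, 7))        # 252 GB: seven three-drive RAID 3 sets
    print(round(raid30_efficiency(7), 2))   # 0.86
    print(round(raid30_efficiency(3), 2))   # 0.67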
Fault Tolerance: Good to very good, depending on whether it is RAID 03 or 30, and the
number of parity drives relative to the total number. RAID 30 will provide better fault tolerance
than RAID 03.
Consider the two different 21-drive RAID 30 arrays mentioned above: the first one (three seven-
drive RAID 3 sets) has higher capacity and storage efficiency, but can only tolerate three
maximum potential drive failures; the one with lower capacity and storage efficiency (seven
three-drive RAID 3 sets) can handle as many as seven, if they are in different RAID 3 sets. Of
course few applications really require tolerance for seven independent drive failures! And of
course, if those 21 drives were in a RAID 03 array instead, failure of a second drive after one had
failed and taken down one of the RAID 0 sub-arrays would crash the entire array.
Degradation and Rebuilding: Relatively little for RAID 30 (though more than RAID 10); can
be more substantial for RAID 03.
Random Read Performance: Very good, assuming RAID 0 stripe size is reasonably large.
Cost: Relatively high due to requirements for a hardware controller and a large number of
drives; storage efficiency is better than RAID 10 however and no worse than any other RAID
levels that include redundancy.
Recommended Uses: Not as widely used as many other RAID levels. Applications include data
that requires the speed of RAID 0 with fault tolerance and high capacity, such as critical
multimedia data and large database or file servers. Sometimes used instead of RAID 3 to
increase capacity as well as performance.
RAID Levels 0+5 (05) and 5+0 (50)
Common Name(s): RAID 0+5 or 05; RAID 5+0 or 50. As with the other multiple RAID levels,
verify the exact implementation instead of relying on the label.
Technique(s) Used: Block striping with distributed parity combined with block striping.
Description: RAID 05 and 50 form large arrays by combining the block striping and parity of
RAID 5 with the straight block striping of RAID 0. RAID 05 is a RAID 5 array comprised of a
number of striped RAID 0 arrays; it is less commonly seen than RAID 50, which is a RAID 0
array striped across RAID 5 elements. RAID 50 and 05 improve upon the performance of RAID
5 through the addition of RAID 0, particularly during writes. They also provide better fault
tolerance than the single RAID level does, especially if configured as RAID 50.
Most of the characteristics of RAID 05 and 50 are similar to those of RAID 03 and 30. RAID 50
and 05 tend to be preferable for transactional environments with smaller files than 03 and 30.
This illustration shows how files of different sizes are distributed between the drives on an
eight-disk RAID 5+0 array using a 16 kiB stripe size. As with the RAID 0 illustration, the red
file is 4 kiB in size; the blue is 20 kiB; the green is 100 kiB; and the magenta is 500 kiB,
with each vertical pixel representing 1 kiB of space. Each of the large, patterned rectangles
represents a four-drive RAID 5 array. The data is evenly striped between these two RAID 5
arrays using RAID 0. Then within each RAID 5 array, the data is stored using striping with
parity. So the first small file, and 12 kiB of the second file, were sent to the top RAID 5
array; the remaining 8 kiB of the second file and the first 8 kiB of the 100 kiB file went to
the bottom RAID 5 array; then the next 16 kiB of the 100 kiB file went to the top array, and so
on. Within each RAID 5 array the data is striped and parity calculated just like a regular
RAID 5 array; each array just does this with half the number of blocks it normally would.
Contrast this diagram to the ones for RAID 0 and RAID 5.
Hard Disk Requirements: Number of drives must be able to be factored into two integers, one
of which must be 2 or higher and the other 3 or higher (you can make a RAID 50 array from 6
drives but not 7). Minimum number of drives is six, with the maximum set by the controller.
Array Capacity: Same as RAID 03 and 30. For RAID 05: (Size of Smallest Drive) * (Number
of Drives In Each RAID 0 Set) * (Number of RAID 0 Sets - 1). For RAID 50: (Size of Smallest
Drive) * (Number of Drives In Each RAID 5 Set - 1) * (Number of RAID 5 Sets).
For example, the capacity of a RAID 05 array made of 15 18 GB drives arranged as three five-
drive RAID 0 sets would be 18 GB * 5 * (3-1) = 180 GB. The capacity of a RAID 50 array made
of 21 18 GB drives arranged as three seven-drive RAID 5 sets would be 18 GB * (7-1) * 3 = 324
GB. The same 21 drives arranged as seven three-drive RAID 5 sets would have a capacity of 18
GB * (3-1) * 7 = 252 GB.
Storage Efficiency: Same as for RAID 03 and 30. For RAID 05: ( (Number of RAID 0 Sets -
1) / Number of RAID 0 Sets). For RAID 50: ( (Number of Drives In Each RAID 5 Set - 1) /
Number of Drives In Each RAID 5 Set).
Taking the same examples as above, the 15-drive RAID 05 array would have a storage efficiency
of (3-1)/3 = 67%. The first RAID 50 array, configured as three seven-drive RAID 5 sets, would
have a storage efficiency of (7-1)/7 = 86%, while the other RAID 50 array would have a storage
efficiency of (3-1)/3 = 67%.
Fault Tolerance: Same as for RAID 03 and 30. Good to very good, depending on whether it is
RAID 05 or 50, and the number of parity drives relative to the total number. RAID 50 will
provide better fault tolerance than RAID 05.
Consider the two different 21-drive RAID 50 arrays mentioned above: the first one (three seven-
drive RAID 5 sets) has higher capacity and storage efficiency, but can only tolerate three
maximum potential drive failures; the one with lower capacity and storage efficiency (seven
three-drive RAID 5 sets) can handle as many as seven, if they are in different RAID 5 sets. Of
course few applications really require tolerance for seven independent drive failures! And of
course, if those 21 drives were in a RAID 05 array instead, failure of a second drive after one had
failed and taken down one of the RAID 0 sub-arrays would crash the entire array.
Degradation and Rebuilding: Moderate for RAID 50; worse for RAID 05.
Cost: Relatively high due to requirements for a hardware controller and a large number of
drives; storage efficiency is better than RAID 10 however and no worse than any other RAID
levels that include redundancy.
Special Considerations: Complex and expensive to implement.
Recommended Uses: Applications that require high fault tolerance, capacity and random
positioning performance. Not as widely used as many other RAID levels. Sometimes used
instead of RAID 5 to increase capacity. Sometimes used for large databases.
RAID Levels 1+5 (15) and 5+1 (51)
Common Name(s): RAID 1+5 or 15; RAID 5+1 or 51. "Common" is a bit of a stretch with this
level, as it is less common than probably any other, so it's important to verify the details of each
implementation.
Technique(s) Used: Mirroring (or duplexing) combined with block striping with distributed
parity.
Description: RAID 1+5 and 5+1 might be sarcastically called "the RAID levels for the truly
paranoid". :^) The only configurations that use both redundancy methods, mirroring and parity,
this "belt and suspenders" technique is designed to maximize fault tolerance and availability, at
the expense of just about everything else. A RAID 15 array is formed by creating a striped set
with parity using multiple mirrored pairs as components; it is similar in concept to RAID 10
except that the striping is done with parity. Similarly, RAID 51 is created by mirroring entire
RAID 5 arrays and is similar to RAID 01 except again that the sets are RAID 5 instead of RAID
0 and hence include parity protection. Performance for these arrays is good but not very high for
the cost involved, nor relative to that of other multiple RAID levels.
The fault tolerance of these RAID levels is truly amazing; an eight-drive RAID 15 array can
tolerate the failure of any three drives simultaneously; an eight-drive RAID 51 array can also
handle three and even as many as five, as long as at least one of the mirrored RAID 5 sets has no
more than one failure! The price paid for this resiliency is complexity and cost of
implementation, and very low storage efficiency.
The RAID 1 component of this nested level may in fact use duplexing instead of mirroring to
add even more fault tolerance.
Hard Disk Requirements: An even number of hard disks with a minimum of six; maximum
dependent on controller. All drives should be identical.
Array Capacity: (Size of Smallest Drive) * ( (Number of Drives / 2) - 1). So an array with ten
18 GB drives would have a capacity of 18 GB * ( (10/2) - 1 ) = just 72 GB.
Storage Efficiency: Assuming all drives are the same size, ( (Number of Drives / 2) - 1 ) /
(Number of Drives). In the example above, efficiency is 40%. This is the worst storage
efficiency of any RAID level; a six-drive RAID 15 or 51 array would have a storage efficiency
of just 33%!
Availability: Excellent.
Cost: Very high. An uncommon solution requiring a lot of storage devices for relatively low
capacity, and possibly additional hardware or software.
Just A Bunch Of Disks (JBOD)
If you have some disks in a system that you decide not to configure into a RAID array, what do
you do with them? Traditionally, they are left to act as independent drive volumes within the
system, and that's how many people in fact use two, three or more drives in a PC. In some
applications, however, it is desirable to be able to use all these disks as if they were one single
volume. The proper term for this is spanning; the pseudo-cutesy term for it, clearly chosen to
contrast against "redundant array of inexpensive disks", is Just A Bunch Of Disks or JBOD. How
frightfully clever.
JBOD isn't really RAID at all, but I discuss it here since it is sort of a "third cousin" of RAID...
JBOD can be thought of as the opposite of partitioning: while partitioning chops single drives up
into smaller logical volumes, JBOD combines drives into larger logical volumes. It provides no
fault tolerance, nor does it provide any improvements in performance compared to the
independent use of its constituent drives. (In fact, it arguably hurts performance, by making it
more difficult to use the underlying drives concurrently, or to optimize different drives for
different uses.)
When you look at it, JBOD doesn't really have a lot to recommend it. It still requires a controller
card or software driver, which means that almost any system that can do JBOD can also
do RAID 0, and RAID 0 has significant performance advantages over JBOD. Neither provides
fault tolerance, so that's a wash. There are only two possible advantages of JBOD over RAID 0:
Avoiding Drive Waste: If you have a number of odd-sized drives, JBOD will let you
combine them into a single unit without loss of any capacity; a 10 GB drive and a 30 GB drive
would combine to make a 40 GB JBOD volume but only a 20 GB RAID 0 array (see the capacity sketch below). This
may be an issue for those expanding an existing system, though with drives so cheap
these days it's a relatively small advantage.
Easier Disaster Recovery: If a disk in a RAID 0 volume dies, the data on every disk in
the array is essentially destroyed because all the files are striped; if a drive in a JBOD set
dies then it may be easier to recover the files on the other drives (but then again, it might
not, depending on how the operating system manages the disks.) Considering that you
should be doing regular backups regardless, and that even under JBOD recovery can be
difficult, this too is a minor advantage.
Note: Some companies use the term "spanning" when they really mean striping, so watch
out for that!
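Here is the capacity comparison from the "Avoiding Drive Waste" point above as a tiny Python sketch (drive sizes in GB, taken from the example):

    # JBOD uses every byte; RAID 0 is limited by the smallest member drive.
    drives = [10, 30]

    jbod_capacity = sum(drives)                 # 40 GB
    raid0_capacity = len(drives) * min(drives)  # 20 GB

    print(jbod_capacity, raid0_capacity)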
Summary Comparison of RAID Levels
Below you will find a table that summarizes the key quantitative attributes of the various RAID
levels for easy comparison. For the full details on any RAID level, see its own page, accessible
here. For a description of the different characteristics, see the discussion of factors differentiating
RAID levels. Also be sure to read the notes that follow the table:
For the number of disks, the first few valid sizes are shown; you can figure out the rest
from the examples given in most cases. Minimum size is the first number shown;
maximum size is normally dictated by the controller. RAID 01/10 must have an even number of
drives, minimum 4; RAID 15/51 must also have an even number of drives, minimum 6. RAID 03/30 and 05/50 can only have sizes
that are a product of integers, minimum 6.
For capacity and storage efficiency, "S" is the size of the smallest drive in the array, and
"N" is the number of drives in the array. For the RAID 03 and 30, "N0" is the width of
the RAID 0 dimension of the array, and "N3" is the width of the RAID 3 dimension. So a
12-disk RAID 30 array made by creating three 4-disk RAID 3 arrays and then striping
them would have N3=4 and N0=3. The same applies for "N5" in the RAID 05/50 row.
Storage efficiency assumes all drives are of identical size. If this is not the case, the
universal computation (array capacity divided by the sum of all drive sizes) must be used.
Performance rankings are approximations and to some extent, reflect my personal
opinions. Please don't over-emphasize a "half-star" difference between two scores!
Cost is relative and approximate, of course. In the real world it will depend on many
factors; the dollar signs are just intended to provide some perspective.