Wednesday, October 8, 2008

MORE ABOUT RAID

Implementations
The distribution of data across multiple drives can be managed either by dedicated hardware or by software. When done in software the software may be part of the operating system or it may be part of the firmware and drivers supplied with the card.
Operating system based ("software RAID")
Software implementations are now provided by many operating systems. A software layer sits above the (generally block-based) disk device drivers and provides an abstraction layer between the logical drives (RAIDs) and the physical drives. The most common levels are RAID 0 (striping across multiple drives for increased space and performance) and RAID 1 (mirroring of two drives), followed by RAID 1+0, RAID 0+1, and RAID 5 (data striping with parity).
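As a very rough illustration of that abstraction layer, the following Python sketch shows how a logical block number might be mapped onto physical drives for RAID 0 and RAID 1. It is a toy model under stated assumptions, not code from any actual driver, and the function names are made up for the example.

# A minimal sketch of the logical-to-physical block mapping a software
# RAID layer performs. Illustrative only: real implementations work at
# the block layer and also handle chunk sizes, caching and error paths.

def raid0_map(logical_block, num_drives):
    """RAID 0: logical blocks are striped round-robin across the drives."""
    drive = logical_block % num_drives        # which physical drive
    offset = logical_block // num_drives      # block offset on that drive
    return drive, offset

def raid1_map(logical_block, num_drives):
    """RAID 1: each logical block goes to the same offset on every mirror."""
    return [(drive, logical_block) for drive in range(num_drives)]

print(raid0_map(10, 4))   # (2, 2): logical block 10 lives at block 2 of drive 2
print(raid1_map(10, 2))   # [(0, 10), (1, 10)]: written to both mirrors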
Microsoft's server operating systems support three RAID levels: RAID 0, RAID 1, and RAID 5. Some Microsoft desktop operating systems also support RAID; for example, Windows XP Professional supports RAID 0, in addition to spanning multiple disks, but only when using dynamic disks and volumes.
Apple's Mac OS X Server supports RAID 0, RAID 1, and RAID 1+0.[6]
FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5.
NetBSD supports RAID 0, RAID 1, RAID 4 and RAID 5 (and any nested combination of those like 1+0) via its software implementation, named raidframe.
OpenSolaris and Solaris 10 support RAID 0, RAID 1, RAID 5, and RAID 6 (and any nested combination of those, such as 1+0) via ZFS, with limited support on the system hard drive (RAID 1 only). Through SVM, Solaris 10 and earlier versions support RAID 0, RAID 1, and RAID 5 on both system and data drives.
The software must run on a host server attached to the storage, and the server's processor must dedicate processing time to run the RAID software. This overhead is negligible for RAID 0 and RAID 1, but may be significant for more complex parity-based schemes. Furthermore, all the buses between the processor and the disk controller must carry the extra data required by RAID, which may cause congestion.
Another concern with operating system-based RAID is the boot process: it can be difficult or impossible to set up the boot process so that it can fail over to another drive if the usual boot drive fails, so such systems may require manual intervention to make the machine bootable again after a failure. Finally, operating system-based RAID usually uses formats specific to the operating system in question, so it cannot generally be used for partitions that are shared between operating systems as part of a multi-boot setup.
Most operating system-based implementations allow RAIDs to be created from partitions rather than entire physical drives. For instance, an administrator could divide an odd number of disks into two partitions per disk, mirror partitions across disks, and stripe a volume across the mirrored partitions to emulate a RAID 1E configuration.[citation needed] Using partitions in this way also allows mixing reliability levels on the same set of disks. For example, one could have a very robust RAID 1 partition for important files and a less robust RAID 5 or RAID 0 partition for less important data. (Some controllers offer similar features, e.g. Intel Matrix RAID.) Using two partitions on the same drive in the same RAID is, however, dangerous. If, for example, a RAID 5 array is composed of four drives of 250 + 250 + 250 + 500 GB, with the 500 GB drive split into two 250 GB partitions, a failure of this drive removes two partitions from the array at once, causing all of the data held on the array to be lost, as the sketch below illustrates.
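The danger of placing two members of the same array on one physical drive can be made concrete with a short sketch. The partition-to-drive mapping below is a hypothetical arrangement matching the 250 + 250 + 250 + 500 GB scenario above, not output from any real tool.

# Sketch: count how many members of a RAID 5 set disappear when one
# physical drive fails. RAID 5 tolerates the loss of exactly one member.

members_to_drive = {
    "part_a": "disk1",   # 250 GB drive
    "part_b": "disk2",   # 250 GB drive
    "part_c": "disk3",   # 250 GB drive
    "part_d": "disk4",   # first 250 GB partition of the 500 GB drive
    "part_e": "disk4",   # second 250 GB partition of the same drive
}

def members_lost(failed_drive):
    return [m for m, d in members_to_drive.items() if d == failed_drive]

for drive in sorted(set(members_to_drive.values())):
    lost = members_lost(drive)
    verdict = "array survives" if len(lost) <= 1 else "ARRAY LOST"
    print("%s fails -> %d member(s) gone: %s" % (drive, len(lost), verdict))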
Hardware-based
Hardware RAID controllers use different, proprietary disk layouts, so it is not usually possible to move an array between controllers from different manufacturers. They do not require processor resources, the BIOS can boot from them, and tighter integration with the device driver may offer better error handling.
A hardware implementation of RAID requires at least a special-purpose RAID controller. On a desktop system this may be a PCI or PCI-e expansion card, or it may be built into the motherboard. Controllers supporting most types of drive may be used: IDE/ATA, SATA, SCSI, SSA, Fibre Channel, and sometimes even a combination. The controller and disks may be housed in a stand-alone disk enclosure rather than inside a computer, and the enclosure may be directly attached to a computer or connected via a SAN. The controller hardware handles the management of the drives and performs any parity calculations required by the chosen RAID level.
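The parity calculation for RAID 5 is a byte-wise XOR across the data blocks of a stripe. The Python sketch below is purely illustrative (real controllers do this in hardware or firmware, typically with dedicated XOR engines): it computes a parity block and then rebuilds a "lost" data block from the survivors.

# Sketch of RAID 5 parity: the parity block is the byte-wise XOR of the
# data blocks in a stripe, so any single missing block can be recovered
# by XOR-ing the remaining blocks with the parity.

def xor_blocks(blocks):
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
parity = xor_blocks(data)            # parity block, written to a fourth drive

# Simulate losing the second data block and rebuilding it:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt block: %r" % rebuilt)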
Most hardware implementations provide a read/write cache, which, depending on the I/O workload, will improve performance. In most systems the write cache is non-volatile (i.e. battery-protected), so pending writes are not lost on a power failure.
Hardware implementations provide guaranteed performance, add no overhead to the local CPU complex and can support many operating systems, as the controller simply presents a logical disk to the operating system.
Hardware implementations also typically support hot swapping, allowing failed drives to be replaced while the system is running.
Firmware/driver based RAID
Operating system-based RAID cannot easily be used to protect the boot process and is generally impractical on desktop versions of Windows (as described above). Hardware RAID controllers are expensive. To fill this gap, cheap "RAID controllers" were introduced that do not contain a RAID controller chip, but simply a standard disk controller chip with special firmware and drivers. During the early stages of bootup the RAID is implemented by the firmware; once a protected-mode operating system kernel such as Linux or a modern version of Microsoft Windows is loaded, the drivers take over.
These controllers are described by their manufacturers as RAID controllers, and it is rarely made clear to purchasers that the burden of RAID processing is borne by the host computer's central processing unit, not the RAID controller itself, thus introducing the aforementioned CPU overhead. Before their introduction, a "RAID controller" implied that the controller did the processing, and the new type has become known in technically knowledgeable circles as "fake RAID" even though the RAID itself is implemented correctly.
Network-attached storage
While not directly associated with RAID, Network-attached storage (NAS) is an enclosure containing disk drives and the equipment necessary to make them available over a computer network, usually Ethernet. The enclosure is basically a dedicated computer in its own right, designed to operate over the network without screen or keyboard. It contains one or more disk drives; multiple drives may be configured as a RAID.
Hot spares
Both hardware and software RAIDs with redundancy may support the use of hot spare drives: a drive physically installed in the array that remains inactive until an active drive fails, at which point the system automatically replaces the failed drive with the spare and rebuilds the array onto it. This reduces the mean time to repair (MTTR), though it does not eliminate it completely. A second drive failure in the same RAID redundancy group before the array is fully rebuilt will result in loss of data; rebuilding can take several hours, especially on busy systems.
Rapid replacement of failed drives is important, as the drives of an array will all have had the same amount of use and may tend to fail at about the same time rather than randomly. RAID 6 without a spare uses the same number of drives as RAID 5 with a hot spare and protects data against the simultaneous failure of up to two drives, but requires a more advanced RAID controller.
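To put a rough number on that risk, the back-of-the-envelope sketch below estimates the chance of a second failure during the rebuild window. The MTBF, rebuild time and array size are assumed figures chosen for illustration, and the model assumes independent failures, which the "Correlated failures" section below argues is optimistic.

import math

# Rough estimate of the chance that a surviving drive fails while a
# RAID 5 array is rebuilding onto its hot spare. Assumes independent,
# exponentially distributed drive lifetimes.

mtbf_hours = 500000.0    # vendor-quoted MTBF per drive (assumed figure)
rebuild_hours = 12.0     # time to rebuild onto the hot spare (assumed figure)
survivors = 7            # e.g. an 8-drive RAID 5 after one drive has failed

p_one = 1 - math.exp(-rebuild_hours / mtbf_hours)  # one given drive fails in the window
p_any = 1 - (1 - p_one) ** survivors               # at least one survivor fails
print("P(second failure during rebuild) ~= %.4f%%" % (100 * p_any))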
Reliability terms
Failure rate
The mean time to failure (MTTF) or mean time between failures (MTBF) of a given RAID is the same as that of its constituent hard drives, regardless of what type of RAID is employed.
Mean time to data loss (MTTDL)
In this context, the average time before a loss of data in a given array.[7] The mean time to data loss of a given RAID may be higher or lower than that of its constituent hard drives, depending upon what type of RAID is employed; a rough numerical sketch is given after this list of terms.
Mean time to recovery (MTTR)
In arrays that include redundancy for reliability, this is the time following a failure to restore an array to its normal failure-tolerant mode of operation. This includes time to replace a failed disk mechanism as well as time to re-build the array (i.e. to replicate data for redundancy).
Unrecoverable bit error rate (UBE)
This is the rate at which a disk drive will be unable to recover data after application of cyclic redundancy check (CRC) codes and multiple retries.
Write cache reliability
Some RAID systems use RAM write cache to increase performance. A power failure can result in data loss unless this sort of disk buffer is supplemented with a battery to ensure that the buffer has enough time to write from RAM back to disk.
Atomic write failure
Also known by various terms such as torn writes, torn pages, incomplete writes, interrupted writes, and non-transactional writes.
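For the MTTDL figure mentioned above, a commonly quoted first-order approximation can be sketched as follows. It assumes independent failures, exponentially distributed lifetimes and a fixed repair time; the drive MTTF and repair time plugged in are assumptions for illustration, not measurements.

# First-order MTTDL approximations (hours). A sketch only: assumes
# independent failures, exponential lifetimes and a constant repair time,
# which the "Correlated failures" section below shows is optimistic.

def mttdl_raid1(mttf, mttr):
    # Mirrored pair: data is lost only if the second drive fails
    # while the first is being repaired.
    return mttf ** 2 / (2 * mttr)

def mttdl_raid5(mttf, mttr, n):
    # n-drive single-parity group: any of n drives fails first, then any
    # of the remaining n-1 fails within the repair window.
    return mttf ** 2 / (n * (n - 1) * mttr)

mttf = 500000.0   # assumed per-drive MTTF in hours
mttr = 24.0       # assumed repair/rebuild time in hours

print("RAID 1 pair:    %.2e hours" % mttdl_raid1(mttf, mttr))
print("8-drive RAID 5: %.2e hours" % mttdl_raid5(mttf, mttr, 8))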
Problems with RAID
Correlated failures
The theory behind the error correction in RAID assumes that drive failures are independent. Given this assumption, it is possible to calculate how often they can fail and to arrange the array to make data loss arbitrarily improbable.
In practice, the drives are often the same age, with similar wear. Since many drive failures are due to mechanical issues, which are more likely on older drives, this violates that assumption and failures are in fact statistically correlated. In practice, then, the chance of a second failure before the first has been recovered is not nearly as unlikely as might be supposed, and data loss can occur at significant rates.[8]
Atomicity
This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple"[9] during the early days of relational database commercialization. However, this warning largely went unheeded and fell by the wayside upon the advent of RAID, which many software engineers mistook as solving all data storage integrity and reliability problems. Many software programs update a storage object "in-place"; that is, they write a new version of the object onto the same disk addresses as the old version. While the software may also log some delta information elsewhere, it expects the storage to present "atomic write semantics," meaning that the write of the data either occurs in its entirety or does not occur at all.
However, very few storage systems provide support for atomic writes, and even fewer specify their rate of failure in providing this semantic. Note that during the act of writing an object, a RAID storage device will usually be writing all redundant copies of the object in parallel, although overlapped or staggered writes are more common when a single RAID processor is responsible for multiple drives. Hence an error that occurs during the process of writing may leave the redundant copies in different states, and may even leave the copies in neither the old nor the new state. The little-known failure mode is that delta logging relies on the original data being in either the old or the new state so that the logical change can be backed out, yet few storage systems provide atomic write semantics on a RAID disk.
While a battery-backed write cache may partially solve the problem, it is applicable only to the power-failure scenario.
Since transactional support is not universally present in hardware RAID, many operating systems include transactional support to protect against data loss during an interrupted write. Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft introduced transaction tracking via the journaling feature in NTFS. NetApp's WAFL file system solves the problem by never updating data in place, as does ZFS.
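The "never update in place" idea can be sketched at the file level. The snippet below is only an illustration of the general principle (write a complete new copy, flush it, then switch over atomically), not a description of how WAFL, ZFS or NTFS actually work, and the file name is made up.

import os
import tempfile

# Sketch of a crash-safe update that avoids writing in place: write a
# complete new copy, force it to stable storage, then atomically replace
# the old file, so a crash leaves either the old or the new version
# intact, never a torn mixture.

def atomic_update(path, new_contents):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, new_contents)
        os.fsync(fd)               # force the new copy to stable storage
    finally:
        os.close(fd)
    os.rename(tmp_path, path)      # atomic switch-over on POSIX filesystems

atomic_update("config.dat", b"new version of the object")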
Unrecoverable data
This can present as a sector read failure. Some RAID implementations protect against this failure mode by remapping the bad sector, using the redundant data to retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement sector. The UBE (unrecoverable bit error) rate is typically specified as 1 bit in 10^15 for enterprise-class disk drives (SCSI, FC, SAS), and 1 bit in 10^14 for desktop-class disk drives (IDE/ATA/PATA, SATA). Increasing disk capacities and large RAID 5 redundancy groups have led to an increasing inability to successfully rebuild a RAID group after a disk failure, because an unrecoverable sector is found on the remaining drives. Double-protection schemes such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
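The effect of the UBE rate on a rebuild can be estimated with a short calculation. The sketch below assumes independent bit errors and a hypothetical array of 1 TB drives, so the results are order-of-magnitude figures only.

# Rough probability of hitting at least one unrecoverable read error while
# reading every surviving drive during a RAID 5 rebuild. Assumes independent
# bit errors; treat the result as an order-of-magnitude estimate.

def rebuild_failure_probability(drive_bytes, surviving_drives, ube_per_bit):
    bits_to_read = drive_bytes * 8 * surviving_drives
    return 1 - (1 - ube_per_bit) ** bits_to_read

terabyte = 10 ** 12
# Six surviving 1 TB desktop-class drives, UBE of 1 bit in 10^14:
print("%.1f%%" % (100 * rebuild_failure_probability(terabyte, 6, 1e-14)))
# The same scenario at an enterprise-class UBE of 1 bit in 10^15:
print("%.1f%%" % (100 * rebuild_failure_probability(terabyte, 6, 1e-15)))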
Write cache reliability
The disk system can acknowledge the write operation as soon as the data is in the cache, without waiting for it to be physically written. However, a power outage can then mean significant loss of any data queued in such a cache.
Often the write cache is protected by a battery, which mostly solves the problem: if a write is interrupted by a power failure, the controller can complete the pending writes as soon as it is restarted. This solution still has potential failure cases: the battery may have worn out, the power may be off for too long, the disks could be moved to another controller, or the controller itself could fail. Some disk systems can test the battery periodically; however, this leaves the system without a fully charged battery for several hours.
An additional concern about write cache reliability is that many of these caches are write-back caches: a caching scheme that reports the data as written as soon as it reaches the cache, as opposed to the non-volatile medium [10]. The safer technique is write-through, which reports transactions as written only when they reach the non-volatile medium.
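The difference between the two policies can be sketched at the application level, using the operating system's page cache as a stand-in for the controller cache. This is an analogy only, not controller firmware, and the file name is arbitrary.

import os

# Analogy for the two cache policies, with the OS page cache standing in
# for a controller cache. Illustrative only.

def write_back(fd, data):
    os.write(fd, data)        # data sits in a volatile cache ...
    return "acknowledged"     # ... yet is already reported as written

def write_through(fd, data):
    os.write(fd, data)
    os.fsync(fd)              # flush toward stable storage before acknowledging
    return "acknowledged"

fd = os.open("journal.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND)
print(write_back(fd, b"fast, but lost on power failure\n"))
print(write_through(fd, b"slower, but durable\n"))
os.close(fd)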
Equipment compatibility
The disk formats used by different RAID controllers are not necessarily compatible, so it may not be possible to read a RAID array on different hardware. Consequently, a non-disk hardware failure may require using identical hardware, or a data backup, to recover the data. Software RAID, however, such as that implemented in the Linux kernel, alleviates this concern, as the setup is not hardware-dependent but runs on ordinary disk controllers. Additionally, software RAID 1 disks can be read like normal disks, so no RAID system is required to retrieve the data.
History
Norman Ken Ouchi at IBM was awarded a 1978 U.S. patent 4,092,732[11] titled "System for recovering data stored in failed memory unit." The claims for this patent describe what would later be termed RAID 5 with full stripe writes. This 1978 patent also mentions that disk mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated parity (that would later be termed RAID 4) were prior art at that time.
The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley in 1987. They studied the possibility of using two or more drives to appear as a single device to the host system and published a paper: "A Case for Redundant Arrays of Inexpensive Disks (RAID)" in June 1988 at the SIGMOD conference.[12]
This specification suggested a number of prototype RAID levels, or combinations of drives. Each had theoretical advantages and disadvantages. Over the years, different implementations of the RAID concept have appeared. Most differ substantially from the original idealized RAID levels, but the numbered names have remained. This can be confusing, since one implementation of RAID 5, for example, can differ substantially from another. RAID 3 and RAID 4 are often confused and even used interchangeably.
