Picking apart RAID

Why use parity at all?
Parity may seem to complicate a RAID array, so why not just stick with something like RAID 0 or RAID 1 and leave parity out of the equation? For starters, RAID 0 provides no fault tolerance, so it isn't suitable for high-availability environments. RAID 1 doesn't use parity, but it is very inefficient in its use of disk space: because the data is simply mirrored, a full 50 percent of the available storage goes to the copy. Using parity with RAID 3, 4, or 5, you can create a highly available disk array that can tolerate the loss of any one of its disks. The data can be rebuilt using the parity information stored in the array, and these RAID levels make much more efficient use of the available disk space.
What happens when parity goes bad?
With a single drive failure under any of RAID levels 1, 3, 4, 5, or 6, the failed drive can be replaced. The RAID array controller will automatically regenerate the data on the new drive from the surviving drives (using the parity information under RAID 3, 4, 5, or 6, or the mirrored copy under RAID 1) and restore fault tolerance to the entire array.
Although RAID provides an extra level of protection in the event of drive failure, parity errors can still crop up. A parity error indicates that there is bad data on the drive. If the data cannot be corrected, you may need to load the data off to a backup tape. You know that the data cannot be corrected if you try to open a file or run an application that reads that particular portion of the disk, and the file will not open, or the application crashes or doesn't run at all. In many instances, an error message will notify you that there was a problem reading from the disk. Often, the problem becomes evident during the system backup, when all of the data on the disk is read in one sweep. In a RAID array, when a parity error is detected, the source data is reread to try to get it right.
With or without RAID, parity errors can be generated by a number of factors other than a failed disk. For example, parity errors may occur if the drive cables aren't properly connected or shielded, or if the wrong type of cable is being used to connect the disks to the controller. If you notice a significant number of parity errors, try swapping the cables and testing the controller card to make sure it hasn't gone bad. Also, check the SCSI terminators to see if one has come loose. Most RAID controllers come with diagnostic programs that do some of the troubleshooting, so be sure to make good use of these packages, too.
You should also investigate the physical connections to your SCSI devices to determine whether they are the source of the parity problems. First, make sure you're using the right SCSI cable.
Ram Electronics has pictures of many common SCSI connectors and the SCSI Trade Association (STA)-endorsed terms and specifications for each type of connector. Most internal SCSI cables are of the ribbon variety, with any number of individual wires running through the ribbon. If even one of those wires is exposed, shorting out, cut, or not fully attached to the connector on the end, it may create data transfer problems. Make sure the SCSI cable is properly connected to both the controller card and the drive, and that the pins on the devices line up with the pins on the SCSI connector.
Testing a controller card is a little more difficult. The easiest way is to use the diagnostic program that comes with many SCSI and RAID adapters. During system installation for certain servers, such as those from Dell and Compaq, utilities are written to a small partition on a disk array. Among these utilities are programs that test the array controller. You can run these programs at system boot time by pressing a key combination on the keyboard, which interrupts the boot process and instead runs the system utilities. Newer systems also include Windows-based array utilities that perform many of the same functions. Dell, for example, includes its Array Manager product for servers shipping with an array controller, which you can install with the rest of the system management suite.
A second controller testing method involves moving the controller to another machine and testing it with different hardware. This isn't the preferable method, because it could result in more downtime and assumes that you have spare hardware lying about that you can use to test the controller.
How does the parity become corrupted?
A number of issues could cause the corruption of parity on a disk, including:
  • System crashes: When a system crashes, any data not written to the disk is lost. In the event that data was being written to a RAID array, it is possible that either the data or the parity was written to disk, but not both. In a situation such as this, you can't rely on the parity to reconstruct the data on the disk. Reducing the number of system crashes by making use of UPS units and redundant power supplies will help protect against this type of parity corruption.
  • Uncorrectable bit errors: A hard disk in an array is nothing more than a bunch of magnetic bits that gradually lose the ability to hold data over time. Eventually, bit errors are detected when an attempt is made to read data back from the drive. Many RAID arrays use embedded software that monitors the individual disks and informs an operator when a disk may be about to fail. When I am informed of an impending disk failure, I generally run a diagnostic on the RAID array to make sure the controller is working properly and verify that the error message was correct. If the verification comes back with a problem, I either replace the RAID card -- which rarely happens -- or replace any drives that the diagnostics identified as bad.
  • A disk failure: Like a system crash, a disk failure can have a negative impact on parity. Disks can fail for a variety of reasons: age, overuse, excessive powering up and down, or power surges. When a disk in an array fails, replace it immediately and run a diagnostic on the array. A single disk failure may indicate more failures to come.
  • Other possible causes: If the array checks out okay and the cables have been tested, the power supply in the system may be delivering out-of-specification power to a disk in the array, causing parity problems. You can test for such an issue with a voltmeter -- but be careful, because electric shock is always a possibility when working on live circuits. First, disconnect the system from the power source and insert the probes of the voltmeter into the wall socket. Next, verify the output against the local standard (110 to 120 volts in North America). Once you plug the system back in to the wall, you can disconnect the drive array from the power supply and use the voltmeter to test the individual power leads in the same way. Exact power specifications for the leads can be found in the system guide or on the manufacturer's Web site.
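To make the parity mechanics above more concrete, here is a minimal Python sketch of XOR parity, the scheme commonly used by single-parity levels such as RAID 5. It is an illustration only, not how any particular controller is implemented: the block contents, the three-data-disk layout, and the helper name xor_blocks are all assumptions made for the example. It shows a normal rebuild after a disk loss, and then the system-crash case from the list above, where new data reached the disk but the matching parity did not.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: one block on each of three data disks.
data = [b"RAID", b"uses", b"XOR!"]
parity = xor_blocks(data)  # stored on the parity disk

# Disk 1 fails: XOR the surviving data blocks with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])

# Crash during a write: disk 0 received new data, but parity was never updated.
data_after_crash = [b"Raid", data[1], data[2]]
stale_parity = parity

# If disk 1 fails later, the rebuild from stale parity is silently wrong.
bad_rebuild = xor_blocks([data_after_crash[0], data_after_crash[2], stale_parity])

print(rebuilt == data[1])      # the normal rebuild recovers the lost block
print(bad_rebuild == data[1])  # the rebuild from stale parity does not
```

The second case is why a crash in mid-write is dangerous: nothing looks wrong until the parity is actually needed to reconstruct a missing block.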
Additional reading
Luckily, most of today's RAID and SCSI controllers are very good about making sure parity errors are not introduced onto the disk. However, if this does happen, follow the suggestions above to minimise the risk of data corruption and failure. If you aren't using a parity-enabled RAID scheme on a mission-critical system, do a cost/benefit analysis and get RAID installed. It will be worth much more than the cost of a disk failure. An excellent discussion of RAID advantages and disadvantages can be found at Advanced Computer & Network Corporation's RAID.edu Web site.
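As a rough starting point for that cost/benefit analysis, the space overhead of each RAID level discussed here can be estimated with a few lines of Python. This is a sketch under simplifying assumptions: all disks are the same size, RAID 1 is treated as every disk holding a full copy of the data, and the function name usable_fraction is hypothetical.

```python
def usable_fraction(level, disks):
    """Fraction of raw capacity left for data, assuming equal-size disks."""
    if level == 0:
        return 1.0                   # striping only, no redundancy
    if level == 1:
        return 1 / disks             # assumes every disk is a full mirror copy
    if level in (3, 4, 5):
        if disks < 3:
            raise ValueError("single-parity levels need at least 3 disks")
        return (disks - 1) / disks   # one disk's worth of parity
    if level == 6:
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) / disks   # two disks' worth of parity
    raise ValueError("unsupported RAID level")


for level, disks in [(1, 2), (5, 4), (6, 6)]:
    print(f"RAID {level} with {disks} disks: {usable_fraction(level, disks):.0%} usable")
```

Under these assumptions a two-disk mirror leaves half of the raw space for data, while a four-disk RAID 5 array leaves three quarters.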