Picking apart RAID

Why use parity at all?
Parity may seem to complicate a RAID array, so why not just stick with something like RAID 0 or RAID 1 and leave parity out of the equation? For starters, RAID 0 gives no fault tolerance, so it's not suitable for high-availability environments. RAID 1 doesn't use parity, but it is very inefficient with disk space: because the data is simply mirrored, a full 50 percent of the raw storage goes to the copy. Using parity with RAID 3, 4, or 5, you can create a highly available disk array that can tolerate the loss of any one of its disks. The data can be rebuilt using the parity information stored in the array, and these RAID levels make much more efficient use of the available disk space, because parity consumes only one disk's worth of capacity (20 percent of a five-disk array, for example).
What happens when parity goes bad?
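To make the single-disk rebuild concrete, here is a minimal sketch of how XOR parity can recover one lost block in a stripe. It assumes the simple XOR scheme that single-parity levels such as RAID 5 are generally described as using; the block contents and the function names `xor_blocks` and `rebuild_missing` are made up for illustration and do not come from any controller's firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block in a stripe from all the others.

    This works whether the missing block held data or parity, because
    the XOR of every block in a healthy stripe is all zeros.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"RAID", b"uses", b"XOR!"]
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block from the rest.
survivors = [data[0], data[2], parity]
recovered = rebuild_missing(survivors)
print(recovered == data[1])
```

The same routine covers a lost parity block, which is why a single-parity array survives the failure of any one disk but not two.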
With a single drive failure under any of RAID levels 1, 3, 4, 5, or 6, the failed drive can be replaced. The RAID array controller will automatically regenerate the data on the new drive, using the parity information from the other drives (or, under RAID 1, the surviving mirror copy), and restore fault tolerance to the entire array.
Although RAID provides an extra level of protection in the event of drive failure, parity errors can still crop up. A parity error indicates that there is bad data on the drive. If the data cannot be corrected, you may need to offload the data to a backup tape. You know that the data cannot be corrected if a file will not open, or an application crashes or doesn't run at all, when it attempts to read that particular portion of the disk. In many instances, an error message will notify you that there was a problem reading from the disk. Often, the problem becomes evident during the system backup, when all of the data on the disk is read in one sweep. In a RAID array, when a parity error is detected, the source data is reread in an attempt to get a good copy.
With or without RAID, parity errors can be generated by a number of factors other than a failed disk. For example, parity errors may occur if the drive cables aren't properly connected or shielded, or if the wrong type of cable is being used to connect the disks to the controller. If you notice a significant number of parity errors, try swapping the cables and testing the controller card to make sure it hasn't gone bad. Also, check the SCSI terminators to see if one has come loose. Most RAID controllers come with diagnostic programs that do some of the troubleshooting, so be sure to make good use of these packages, too.
You should also investigate the physical connections to your SCSI devices to determine whether they are the source of the parity problems. First, make sure you're using the right SCSI cable. Ram Electronics has pictures of many common SCSI connectors and the SCSI Trade Association (STA)-endorsed terms and specifications for each type of connector. Most internal SCSI cables are of the ribbon variety, with many individual wires running through the ribbon. If even one of those wires is exposed, shorting out, cut, or not fully attached to the connector on the end, it may create data transfer problems. Make sure the SCSI cable is properly connected to both the controller card and the drive, and that the pins on the devices line up with the pins on the SCSI connector.
Testing a controller card is a little more difficult. The easiest way is to use the diagnostic program that comes with many SCSI and RAID adapters. During system installation for certain servers, such as those from Dell and Compaq, utilities are written to a small partition on a disk array. Among these utilities are programs that test the array controller. You can run these programs at system boot time by pressing a key combination on the keyboard, which interrupts the boot process and runs the system utilities instead. Newer systems also include Windows-based array utilities that perform many of the same functions. Dell, for example, includes its Array Manager product for servers shipping with an array controller, which you can install with the rest of the system management suite. A second controller testing method involves moving the controller to another machine and testing it with different hardware. This isn't the preferred method, because it could result in more downtime and assumes that you have spare hardware lying about to test with.
How does the parity become corrupted?
A number of issues could cause the corruption of parity on a disk, including:
  • System crashes: When a system crashes, any data not written to the disk is lost. In the event that data was being written to a RAID array, it is possible that either the data or the parity was written to disk, but not both. In a situation such as this, you can't rely on the parity to reconstruct the data on the disk. Reducing the number of system crashes by making use of UPS units and redundant power supplies will help protect against this type of parity corruption.
  • Uncorrectable bit errors: A hard disk in an array is nothing more than a bunch of magnetic bits that gradually lose the ability to hold data over time. Eventually, bit errors are detected when an attempt is made to read data back from the drive. Many RAID arrays use embedded software that monitors the individual disks and informs an operator when a disk may be about to fail. When I am informed of an impending disk failure, I generally run a diagnostic on the RAID array to make sure the controller is working properly and verify that the error message was correct. If the verification comes back with a problem, I either replace the RAID card -- which rarely happens -- or replace any drives that the diagnostics identified as bad.
  • A disk failure: Like a system crash, a disk failure can have a negative impact on parity. Disks can fail for a variety of reasons: age, overuse, excessive powering up and down, or power surges. When a disk in an array fails, replace it immediately and run a diagnostic on the array. A single disk failure may indicate more failures to come.
  • Other possible causes: If the array checks out okay and the cables have been tested, the power supply in the system may be delivering the wrong voltage to a disk in the array, causing parity problems. You can test for such an issue with a voltmeter, but be careful, because electrocution is always a possibility when measuring live circuits. First, disconnect the system from the power source and insert the probes of the voltmeter into the wall socket. Next, verify the reading against the local standard (110 to 120 volts in North America). Once you plug the system back in to the wall, you can disconnect the drive array from the power supply and use the voltmeter to test the individual power leads in the same way, remembering that these leads carry low-voltage DC rather than mains AC. Exact power specifications for the leads can be found in the system guide or on the manufacturer's Web site.
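The system-crash case in the list above can be illustrated with a short sketch. It assumes simple XOR parity and made-up block contents; `stripe_is_consistent` is a hypothetical stand-in for the kind of consistency check an array diagnostic might perform, not a real controller API.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def stripe_is_consistent(data_blocks, parity_block):
    """Return True if the stored parity matches the stored data."""
    return xor_blocks(data_blocks) == parity_block


# Healthy stripe: three data blocks and their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Interrupted write: new data for the second block reaches the disk,
# but the system crashes before the matching parity update is written.
data_after_crash = [data[0], b"bbbb", data[2]]
stale_parity = parity

consistent_before = stripe_is_consistent(data, parity)
consistent_after = stripe_is_consistent(data_after_crash, stale_parity)

# If the first disk now fails, rebuilding its block from the stale
# parity silently produces something other than the original contents.
rebuilt_first_block = xor_blocks(
    [data_after_crash[1], data_after_crash[2], stale_parity]
)
print(consistent_before, consistent_after, rebuilt_first_block == data[0])
```

The mismatch is invisible until the parity is actually needed, which is one reason to run the array diagnostics after any unclean shutdown.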
Additional reading
Luckily, most of today's RAID and SCSI controllers are very good at making sure parity errors are not introduced onto the disk. However, if this does happen, follow the suggestions above to minimise the risk of data corruption and failure. If you aren't using a parity-enabled RAID scheme on a mission-critical system, do a cost/benefit analysis and get RAID installed. It will cost far less than a disk failure would. An excellent discussion of RAID advantages and disadvantages can be found at Advanced Computer & Network Corporation's RAID.edu Web site.