In this feature, I want to share with you an incident that took place in my office. While this incident did not become a significant problem, it did point out the need to periodically verify the configuration of our iSCSI-based storage network.
Last March, I implemented an EqualLogic PS200E iSCSI storage array in my data centre. Connecting the various servers to the storage array is a pair of HP ProCurve 2848 switches. These switches sport 48 10/100/1000 Ethernet ports and were cabled exactly to EqualLogic's specifications.
When we initially installed the array with our switches, we implemented flow control and jumbo frames wherever possible. While jumbo frames don't provide the actual iSCSI performance boost that flow control does, the use of jumbo frames does ease some burden on each supported server, since there are fewer iSCSI packets to package up. Flow control provides the real performance gain, with jumbo frames making up the last mile. For the past 18 months, things have hummed along, with only minor problems here and there.
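To put a rough number on "fewer iSCSI packets to package up", here is a minimal sketch of the arithmetic. It assumes a standard MTU of 1,500 bytes, a jumbo MTU of 9,000 bytes (a common value, but not one taken from our configuration) and 40 bytes of IPv4 and TCP headers per frame, and it ignores iSCSI's own headers. The function name is purely illustrative.

```python
import math

# Assumption: 20 bytes of IPv4 header + 20 bytes of TCP header, no options.
# iSCSI PDU headers are ignored to keep the arithmetic simple.
IP_TCP_HEADERS = 40


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames required to carry payload_bytes over TCP at the given MTU."""
    payload_per_frame = mtu - IP_TCP_HEADERS
    return math.ceil(payload_bytes / payload_per_frame)


# Moving 1,000,000 bytes of data:
print(frames_needed(1_000_000, 1500))  # 685 frames at the standard MTU
print(frames_needed(1_000_000, 9000))  # 112 frames with jumbo frames
```

Under these assumptions the server handles roughly one sixth as many frames for the same data, which is where the reduced per-server burden comes from.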
Alarm bells
Last week, a combination of firmware revision levels and hard drive types in our SAN led to a drop in SAN performance that was noticeable across the organisation. The help desk went nuts while IT staff began troubleshooting. The problem ended up being a bad hard drive that was not showing as bad due to a bug in the firmware code, but the situation and the resulting call with tech support highlighted the importance of revisiting the storage configuration from time to time.
During the call with tech support, it became apparent that, since our initial implementation, EqualLogic had learned a lot about various network devices and had refined its recommendations. For the particular model of switch that we are using, EqualLogic recommended that we disable the jumbo frames feature and use only flow control. The reason is that the HP 28xx series of switches doesn't have much buffer memory, and trying to use both flow control and jumbo frames on these switches can lead to communications problems.
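A periodic review of this kind can be as simple as comparing what is configured against what is currently recommended. The sketch below is illustrative only: the setting names and the `find_drift` helper are hypothetical, and the values would have to be read from the switches by whatever means you already use.

```python
# Hypothetical setting names; the values reflect the advice described above
# for our switch model (flow control on, jumbo frames off).
RECOMMENDED = {"flow_control": True, "jumbo_frames": False}


def find_drift(current: dict, recommended: dict = RECOMMENDED) -> dict:
    """Return {setting: (current_value, recommended_value)} for each mismatch."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in recommended.items()
        if current.get(name) != wanted
    }


# Our original setup enabled both features.
original_setup = {"flow_control": True, "jumbo_frames": True}
print(find_drift(original_setup))  # {'jumbo_frames': (True, False)}
```

An empty result means the configuration matches the recommendation as you have recorded it; the recorded recommendation itself still needs refreshing from the vendor from time to time.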
At the time of our installation, EqualLogic also recommended, for full redundancy, two separate switches, and provided a cabling diagram for getting the best results in the event of hardware failure somewhere in the chain. EqualLogic also recommended connecting the two switches together with an uplink cable, so communication would keep flowing, regardless of what hardware failed.
In our call this week, EqualLogic amended this recommendation by indicating that we should bond two channels together on the switches so that, in the event of a failure, communication could be maintained at full speed. The PS series of arrays have three Gigabit Ethernet ports on each of two controllers, for a total of six ports. From the first controller, two of the three ports are cabled to one switch and the third to the second switch. The second controller is cabled the same way, but with its two connections going to the opposite switch from the first controller. What this means is that there could be 2Gbps worth of traffic trying to get through that uplink between the switches, so it's important to make sure enough bandwidth is there.
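The uplink sizing can be sketched as a small calculation. This assumes each array port and each inter-switch link runs at 1Gbps, and that in the worst case all of the traffic arriving on one controller's ports at a single switch has to cross to the other switch. The controller and switch labels and the function names are illustrative, not anything supplied by EqualLogic.

```python
PORT_SPEED_GBPS = 1  # assumption: every port and uplink is Gigabit Ethernet

# Ports per controller on each switch, following the cabling described above.
CABLING = {
    "controller_a": {"switch_1": 2, "switch_2": 1},
    "controller_b": {"switch_1": 1, "switch_2": 2},
}


def worst_case_uplink_gbps(cabling: dict, port_speed_gbps: int = PORT_SPEED_GBPS) -> int:
    """Most traffic one controller can deliver to a single switch, in Gbps."""
    return max(
        ports * port_speed_gbps
        for per_switch in cabling.values()
        for ports in per_switch.values()
    )


def uplink_is_sufficient(bonded_links: int, cabling: dict = CABLING) -> bool:
    """True if the bonded inter-switch link can carry the worst-case load."""
    return bonded_links * PORT_SPEED_GBPS >= worst_case_uplink_gbps(cabling)


print(worst_case_uplink_gbps(CABLING))  # 2
print(uplink_is_sufficient(1))          # False: a single uplink cable
print(uplink_is_sufficient(2))          # True: two bonded channels
```

With a single cable between the switches, the uplink has half the capacity of the worst-case load, which is why two bonded channels were recommended.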
Summary
Some of you may read this article and wonder why we didn't check these things before. This was truly a case of "set it and forget it". Since the storage network was working fine and we had a ton of other projects, we were only doing regular firmware updates, but hadn't followed up on possible changes in configuration recommendations. Now that we've updated the configurations, however, we have a situation that is more stable and more resilient in the event of a failure.
The lesson for us: schedule time to review our storage infrastructure and make sure we're running with the latest recommendations for best performance and stability. Of course, this is applicable to all services!
Story URL: http://news.zdnet.co.uk/hardware/0,1000000091,39283709,00.htm
Copyright © 1995-2008 CNET Networks, Inc. All rights reserved