Anatomy of a server-room meltdown

CASE STUDY
The following story is a cautionary tale for anyone who runs a server room.

Back in June, the UK experienced its first hot weekend of the year. One IT manager, who asked to remain anonymous in return for sharing the litany of horrors that followed that weekend - but we'll call him Bob - spent Saturday and Sunday, like most people, enjoying the sunshine. Like most IT managers Bob carries a phone, to which his monitoring systems send text messages should anything go wrong in the server room. On this particular weekend, like most others, there were no text messages warning of any problems, and Bob spent a relaxing couple of days in the sun, safe in the knowledge that the servers back at work were humming quietly away.

Bob's weekend was only spoilt slightly on Sunday evening when he tried to log onto his corporate email account but couldn't connect for some reason. Never mind, he thought, a switch must have failed. It will just need a quick reboot in the morning.

How wrong he was.

"I turned up to work on Monday morning," says Bob, "to find the whole comms room had gone down. When I opened the door the temperature was about 45 degrees (Celsius)."

When the temperature in a comms room reaches that level, there is only one explanation: the aircon has failed. "We had two units, which we thought provided redundant air conditioning," says Bob. "But when one seized, the second one was unable to cope with the load and so that one shut down too."
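Bob's experience shows the gap between having two units and having redundancy: a pair is only redundant if either unit can carry the whole heat load alone. That check can be sketched roughly in Python. The function name and the kilowatt figures in the usage note are hypothetical illustrations, not details of Bob's installation.

```python
def survives_single_failure(unit_capacities_kw, heat_load_kw):
    """Return True if the remaining units can carry the heat load
    after any one cooling unit fails (hypothetical helper)."""
    if len(unit_capacities_kw) < 2:
        # A single unit has no backup at all.
        return False
    # The worst case is losing the largest unit.
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= heat_load_kw
```

With two assumed 10 kW units and an assumed 15 kW heat load, `survives_single_failure([10.0, 10.0], 15.0)` returns `False`: both units are needed just to keep up, so losing one takes the room down.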

As if that wasn't bad enough, in the building where Bob's company is located, the main air conditioning is shut down at weekends to save money. Even in the winter, the offices can be pretty warm first thing on a Monday morning; in the summer they're stifling. So just imagine what it's like in a nicely insulated room with several dozen email, Web and application servers churning out many hundreds of watts. As Bob put it, "The trouble with comms rooms is that when you switch the aircon off, they stop being a cool room and turn into an oven."

Obviously one of Bob's first jobs on Monday was to bring the temperature back down. The other, less obvious job (to anyone who has never had an aircon unit fail) was to start mopping up. "When the aircon switched off," says Bob, "moisture condensed in the pipes that lead to the units on the roof." As this moisture condensed, there was only one place for it to go: down the pipes, through the vents and onto the server-room floor. As for the temperature, says Bob: "On the Monday morning, we restarted the one working air conditioner and that began to have an effect. Then we looked for the cause of the equipment shutting down - it turned out that the UPS had reached its critical temperature and powered down to protect itself."

There were actually two UPSes - one main one and a second, smaller one, for the monitoring system. The smaller of the two was supposed to survive for at least 20 minutes after any power failure, long enough to send out text messages to support staff. This did not happen. The smaller UPS had no thermal shutdown; instead, it just fried.
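The silent weekend points to a weakness of monitoring that only alerts on failure: once the monitor itself has died, no message looks exactly like no problem. One common mitigation is a heartbeat check run from outside the server room, which raises an alarm when the expected check-ins stop, combined with a temperature warning set well below the point at which equipment shuts itself down. The Python below is a minimal sketch of that idea. The thresholds and names are hypothetical, and nothing here describes Bob's actual monitoring system.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen for illustration only.
HEARTBEAT_TIMEOUT = timedelta(minutes=10)
WARN_TEMP_C = 30.0

def evaluate(last_heartbeat, last_temp_c, now):
    """Return a list of alert messages for an off-site watchdog."""
    alerts = []
    if now - last_heartbeat > HEARTBEAT_TIMEOUT:
        # Silence is treated as a fault; the last reading is stale.
        alerts.append("no heartbeat from server-room monitor")
    elif last_temp_c >= WARN_TEMP_C:
        alerts.append(f"server-room temperature at {last_temp_c:.1f} C")
    return alerts
```

The point of the design is that the watchdog runs somewhere that does not share the server room's power or cooling, so a fried monitoring UPS produces an alert instead of silence.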

By the end of the day the single aircon unit had brought the temperature back down, and the IT team thought they were OK and could survive on the single unit in the short term. After all, all they needed to do was call the aircon engineer and everything would be hunky-dory.

But life, as most of us know, is rarely that simple.

Read on to find out what went wrong next.

Talkback

Advocates of business grid computing should learn something from this story.

via Facebook 3 August, 2004 19:42