How outage triggered mayhem for hospital datacentre

ANALYSIS

On 20 May, a brief electricity failure struck a datacentre run by Queensland Health in Australia, starting a chain of incidents that resulted in serious outages of over 20 health applications.

The datacentre, located on the campus of Herston hospital, is believed to be one of three datacentres operated by Queensland Health. It lost power for only a fraction of a second, when two flooded Energex transformers failed at around 5pm on that day, according to a source close to the incident. Uninterruptible power supplies kicked in to keep servers up.

However, the power cut tripped the chilled water system, cutting chilled water to the hospital campus. As the system was not monitored, the datacentre support team did not notice the loss of chilled water. A datacentre employee came on site to check that everything was running but, satisfied that nothing was wrong, he left.

Only two of the 10 air-conditioning units within the datacentre were able to use refrigerant gas if chilled water was unavailable, meaning that although the rest of the units were operating, they were not cooling. The temperature in the datacentre began to rise.

No messages
Although people were called in to investigate the temperature rise, the chilled water problem was not found. Because of a DNS change made the day before the problems began, no messages were being sent to tell staff of server problems. Four hours after the power cut, services began to suffer. On-call hospital staff were affected and complained. Soon after, a server shut down.

The whereabouts of the air-conditioning specialist who had been called in was unknown to many staff members, and he did not answer his phone. It had taken the engineer three hours to arrive on site. Five hours after the systems failed, the fact that the chilled water pumps had not been operating was discovered as more servers shut down with temperatures over 50°C. The problem was believed to be fixed.

Because the remote-access system was not working, staff had to wait until they arrived at the datacentre before they could begin shutting down servers. When they arrived, they started to move systems over to an alternative datacentre, which in some cases caused brief user inconvenience. Some systems, however, could not be moved, since their servers had no ability to fail over and Queensland Health's architecture for virtual machines did not allow them to be moved to a second datacentre.

The hospital's Cerner electronic medical record (patient administration) system was shut down by the hospital staff.

Six hours after the power cut, the air conditioning was still not working. Although staff believed they had found the problem, more systems shut down, until 75 percent of applications were down and the datacentre reached 45°C.

Eight hours after the power cut, chilled water was finally brought back up. Nine hours after, the datacentre was back to normal and the services could be restored. By 9am the morning after the power cut, all services were restored.

Over the course of the problems, outages of 12 applications had a significant impact, and outages of another 12 had a minor impact. Three years ago the datacentre was forced to shut down for the same reasons. Afterwards, the team had been told it could not happen again.

When queried on the incident, Queensland Health acting chief information officer Ray Brown did not respond to a question on which facilities around Queensland the applications provided services to. However, it is believed that Queensland Health's three datacentres provide services around the state to multiple locations.

Brown denied there had been more than one incident over the past three years at the datacentre.

'Lessons learned'
According to Brown, since several applications were relocated to the other datacentre, there was "minimal disruption" to services. "The majority of services impacted were available by 2:30am and all Queensland Health systems categorised as critical remained operational during this incident," he said.

"In the face of a severe weather event, the IT staff involved were outstanding in their response to minimise the impact of this incident. The ability of staff to physically attend the site was severely hampered by flooding in the area."

Lessons had been learned, according to Brown. Queensland Health was exploring options to remove reliance on chilled water. It also intended to replace the remote-access system by the third quarter of this year. It is undertaking a review of management tools and is examining the crisis-management plan.

Queensland Health has lost several chief information officers over the past several years. Long-time chief information officer Paul Summergreene had his contract terminated by the department in July 2008. Dr Richard Ashbury filled his shoes for a short time, before leaving the chair vacant, with Brown currently leading the department's IT function in an acting capacity.

The news also comes as the Queensland government flagged in the last state budget its intent to spend hundreds of millions of dollars on health IT systems to support its e-health capability.
