The top 10 IT disasters of all time

Following the loss of the personal records of some 25 million child benefit recipients by Her Majesty's Revenue & Customs this month, the UK government will be acutely aware of how quickly mismanagement of technology can lead to serious problems.

While technology wasn't to blame per se in the HMRC data loss, there are plenty of recorded examples where faulty hardware and software have cost the organisations concerned dearly, both financially and in terms of reputation — and resulted in some near misses for the public.

Here's our considered list of some of the worst IT-related disasters and failures. The order is subjective — with number one being the worst — so feel free to comment using the Talkback facility below if you disagree or have suggestions for disasters we may have missed.

1. Faulty Soviet early warning system nearly causes WWIII (1983)
The threat of computers purposefully starting World War III is still the stuff of science fiction, but accidental software glitches have brought us worryingly close in the past. Although there are numerous alleged events of this ilk, the secrecy around military systems makes it hard to sort the urban myths from the real incidents.

However, one example that is well recorded happened back in 1983, and was the direct result of a software bug in the Soviet early warning system. The system reported that the US had launched five ballistic missiles. However, the duty officer for the system, one Lt Col Stanislav Petrov, claims he had a "...funny feeling in my gut", and reasoned that if the US was really attacking it would launch more than five missiles.

The trigger for the near apocalyptic disaster was traced to a fault in software that was supposed to filter out false missile detections caused by satellites picking up sunlight reflections off cloud-tops.

2. The AT&T network collapse (1990)
In 1990, 75 million phone calls across the US went unanswered after a single switch at one of AT&T's 114 switching centres suffered a minor mechanical problem, which shut down the centre. When the centre came back up soon afterwards, it sent a message to other centres, which in turn caused them to trip, shut down and reset.

The culprit turned out to be an error in a single line of code — not hackers, as some claimed at the time — that had been added during a highly complex software upgrade. American Airlines alone estimated this small error cost it 200,000 reservations.

3. The explosion of the Ariane 5 (1996)
In 1996, Europe's newest unmanned satellite-launching rocket, the Ariane 5, was intentionally blown up just seconds after taking off on its maiden flight from Kourou, French Guiana. The European Space Agency estimated that total development of Ariane 5 cost more than $8bn (£4bn). On board Ariane 5 was a $500m (£240m) set of four scientific satellites created to study how the Earth's magnetic field interacts with the solar wind.

According to a piece in the New York Times Magazine, the self-destruction was triggered by software trying to stuff "a 64-bit number into a 16-bit space".

"This shutdown occurred 36.7 seconds after launch, when the guidance system's own computer tried to convert one piece of data — the sideways velocity of the rocket — from a 64-bit format to a 16-bit format. The number was too big, and an overflow error resulted. When the guidance system shut down, it passed control to an identical, redundant unit, which was there to provide backup in case of just such a failure. But the second unit had failed in the identical manner a few milliseconds before. And why not? It was running the same software," the article stated.
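The conversion failure described in the quote can be sketched in a few lines of Python. This is an illustration only, not the rocket's flight software: the function names and the sample velocity value are assumptions made up for the example. It contrasts a conversion that raises an error when the value does not fit in 16 bits with a defensive one that clamps the value to the nearest limit.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a signed 16-bit integer


def to_int16_checked(value: float) -> int:
    """Convert a 64-bit float to a 16-bit integer, raising if it does not fit."""
    n = int(value)  # truncate toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return n


def to_int16_saturating(value: float) -> int:
    """Convert defensively: clamp out-of-range values to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


sideways_velocity = 40000.0  # hypothetical reading, larger than 32767

try:
    to_int16_checked(sideways_velocity)
    overflowed = False
except OverflowError:
    overflowed = True  # left unhandled, an error like this stops the unit

clamped = to_int16_saturating(sideways_velocity)
```

Whether clamping would have been the right fix is a separate design question. The sketch only shows why a backup unit running the same code on the same input fails in the same way.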

4. Airbus A380 suffers from incompatible software issues (2006)
The Airbus issue of 2006 highlighted a problem many companies can have with software: what happens when one program doesn't talk to another. In this case, the problem was caused by two versions of the same program, the CATIA software used to design and assemble one of the world's largest aircraft, the Airbus A380.

This was a major European undertaking and, according to Business Week, the problem arose with communications between two organisations in the group: the French Dassault Aviation and a Hamburg factory.

Put simply, the German system used an out-of-date version of CATIA and the French system used the latest version. So when Airbus was bringing together two halves of the aircraft, the different software meant that the wiring on one did not match the wiring in the other. The cables could not meet up without being changed.

The problem was eventually fixed, but only at a cost that nobody seems to want to put an absolute figure on. But all agreed it cost a lot, and put the project back a year or more.

5. Mars Climate Orbiter metric problem (1998)
Two spacecraft, the Mars Climate Orbiter and the Mars Polar Lander, were part of a space programme that, in 1998, was supposed to study the Martian weather, climate, and water and carbon dioxide content of the atmosphere. But a problem occurred when a navigation error caused the orbiter to fly too low in the atmosphere and it was destroyed.

What caused the error? A sub-contractor on the Nasa programme had used imperial units (as used in the US), rather than the Nasa-specified metric units (as used in Europe).
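A mix-up of this kind is easy to reproduce when numbers are passed around without their units. The Python sketch below assumes, for illustration, that the value in question is a thruster impulse handed over in pound-force seconds and read as newton-seconds. The `Impulse` class and the sample reading are hypothetical; the conversion factor (one pound-force is about 4.448 newtons) is the standard one.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse, always stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * LBF_TO_NEWTON)


raw_reading = 100.0  # hypothetical number handed over with no unit attached

misread = Impulse.from_newton_seconds(raw_reading)       # wrongly assumed metric
correct = Impulse.from_pound_force_seconds(raw_reading)  # actually imperial

# How far off the misread value is from the true one
error_factor = correct.newton_seconds / misread.newton_seconds
```

Forcing every caller to state the unit when constructing the value, as the two named constructors do here, is one common way to stop a bare number from being misinterpreted.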

6. EDS and the Child Support Agency (2004)
Business services giant EDS waded in with this spectacular disaster, which assisted in the destruction of the Child Support Agency (CSA) and cost the taxpayer over a billion pounds.

EDS's CS2 computer system somehow managed to overpay 1.9 million people and underpay around 700,000, partly because the Department for Work and Pensions (DWP) decided to reform the CSA at the same time as bringing in CS2.

Edward Leigh, chairman of the Public Accounts Committee, was outraged when the National Audit Office subsequently picked through the wreckage: "Ignoring ample warnings, the DWP, the CSA and IT contractor EDS introduced a large, complex IT system at the same time as restructuring the agency. The new system was brought in and, as night follows day, stumbled and now has enormous operational difficulties."

7. The two-digit year-2000 problem (1999/2000)
A lot of IT vendors and contractors did very well out of the billions spent to avoid what many feared would be the disaster related to the Millennium Bug. Rumours of astronomical contract rates and retainers abounded.

And the sound of clocks striking midnight in time zones around the world was followed by... not panic, not crashing computer systems, in fact nothing more than new year celebrations.

So why include it here? That the predictions of doom came to naught is irrelevant, as we're not talking about the disaster that was averted, but the original disastrous decision to use two digits for the year in the date fields of computer programs, and to keep doing so for longer than was either necessary or prudent. A report by the House of Commons Library pegged the cost of fixing the bug at £400bn. And that is why the Millennium Bug deserves a place in the top 10.
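The underlying flaw is simple to demonstrate. In the Python sketch below, the function names are made up for the example, and the "windowing" repair with a pivot of 50 is only one of several approaches that were used; it is shown here as an assumption, not as the fix any particular system adopted.

```python
def elapsed_two_digit(start_yy: int, end_yy: int) -> int:
    """Years elapsed when only the last two digits of each year are stored."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Windowing: treat two-digit years at or above the pivot as 19xx, others as 20xx."""
    return 1900 + yy if yy >= pivot else 2000 + yy


def elapsed_windowed(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Years elapsed after expanding both two-digit years to four digits."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)


# From 1999 ("99") to 2000 ("00")
broken = elapsed_two_digit(99, 0)  # the rollover makes time run backwards
fixed = elapsed_windowed(99, 0)    # one year, as expected
```

Windowing only postpones the problem to the edge of the chosen window; storing four-digit years removes it.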

8. When the laptops exploded (2006)
It all began simply, but certainly not quietly, when a laptop manufactured by Dell burst into flames at a trade show in Japan. There had been rumours of laptops catching fire, but the difference here was that the Dell laptop managed to do it in the full glare of publicity and video captured it in full colour.

(Unfortunately, the video capturing the incident appears to have vanished from the web. If you happen to own a copy, please send it to us as it should make interesting viewing again.)

"We have captured the notebook and have begun investigating the event," Dell spokeswoman Anne Camden reported at the time, and investigate Dell did. At the end of these investigations the problem was traced to an issue with the battery/power supply on the individual laptop that had overheated and caught fire.

It was an expensive issue for Dell to sort out. As a result of its investigation Dell decided that it would be prudent to recall and replace 4.1m laptop batteries.

Company chief executive Michael Dell eventually laid the blame for the faulty batteries on the manufacturer of the battery cells: Sony. But that wasn't the end of it. Apple reported issues for iPods and MacBooks and many PC suppliers reported the same. Matsushita alone has had to recall around 54 million devices. Sony estimated at the time that the overall cost of supporting the recall programmes of Apple and Dell would amount to between ¥20bn (£90m) and ¥30bn.

9. Siemens and the passport system (1999)
It was the summer of 1999, and half a million British citizens were less than happy to discover that their new passports couldn't be issued on time because the Passport Agency had brought in a new Siemens computer system without sufficiently testing it and training staff first.

Hundreds of people missed their holidays and the Home Office had to pay millions in compensation, staff overtime and umbrellas for the poor people queuing in the rain for passports. But why such an unexpectedly huge demand for passports? The law had recently changed to demand, for the first time, that all children under 16 had to get one if they were travelling abroad.

Tory MP Ann Widdecombe summed it up well while berating the then home secretary, Jack Straw, over the fiasco: "Common sense should have told him that to change the law on child passports at the same time as introducing a new computer system into the agency was storing up trouble for the future."

10. LA Airport flights grounded (2007)
Some 17,000 passengers were stranded on grounded planes at Los Angeles International Airport earlier this year because of a computer problem. The problem that hit systems at the United States Customs and Border Protection (USCBP) agency was a simple one, caused by a piece of lowly, inexpensive equipment.

The device in question was a network card that, instead of shutting down as perhaps it should have done, persisted in sending incorrect data out across the network. The data then cascaded out until it hit the entire network at the USCBP and brought it to a standstill. Nobody could be authorised to leave or enter the US through the airport for eight hours. Passengers were not impressed.

(Note: We have purposely omitted incidents that resulted in loss of life.)

Talkback

Covered in engineering ethics classes, radiation therapy machine kills patients because of software and hardware flaws.

http://en.wikipedia.org/wiki/Therac-25

grumpyoldgeek 22 November, 2007 20:38

It just really grates that the government and other big organisations put our security at risk! Why don't they invest more into security plus their software development and stop giving us all grief. How is the government ever going to win back our trust? I am not feeling very patriotic right now! I think this is the right time to name and shame and I think the top ten list should be extended to include some other blunders (I am keeping it clean) that I found here, check this out... http://www.origsoft.com/Reading_Room/nightmares.htm
This is a short list reminding us of huge companies failing to invest in their software development!

lovvella 23 November, 2007 09:12

I would point out the footnote to this piece that makes it clear we would not include incidents and issues that involved loss of life.
This point was in the last paragraph so it is understandable that it was not picked up.
Thanks for the feedback though and as I am unfamiliar with the incident, it is useful for future reference.
Colin Barker

Colin Barker 23 November, 2007 09:12

This post has been removed by a moderator.

"The Internet? We are not interested in it"
~ Bill Gates, 1993

"Anybody who thinks a little 9,000-line program
that's distributed free and can be cloned by anyone
is going to affect anything we do at Microsoft has
his head screwed on wrong."
~ Bill Gates In response to Java

Shannon McPherson 23 November, 2007 16:13

[Cited from Wikipedia]

The airport's computerized baggage system, which was supposed to reduce flight delays, shorten waiting times at luggage carousels, and save airlines in labor costs, turned into an unmitigated failure, and is widely given as a textbook example of a software engineering disaster. An opening originally scheduled for October 31, 1993 with a single system for all three concourses turned into a February 28, 1995 opening with separate systems for each concourse, with varying degrees of automation.

The system's $186 million in original construction costs grew by $1 million per day during months of modifications and repairs. Incoming flights never made use of the system, and only United, DIA's dominant airline, used it for outgoing flights. The 40-year-old company responsible for the design of the automated system, BAE Automated Systems of Carrollton, Texas, at one time responsible for 90% of the baggage systems in the U.S., was acquired in 2002 by G&T Conveyor Company, Inc.

The automated baggage system never worked well, and in August 2005, it became public knowledge that United would abandon the system, a decision that would save them $1 million in monthly maintenance costs.

1000189150 24 November, 2007 08:26

The story about the Mars probe reminded me of a similar story: Phobos 1, a Russian probe launched towards Mars in July 1988. The following is lifted from Wikipedia.

---------

"Phobos 1 operated nominally until an expected communications session on 2 September 1988 failed to occur. The failure of controllers to regain contact with the spacecraft was traced to an error in the software uploaded on 29 August/30 August, which had deactivated the attitude thrusters. By losing its lock on the Sun, the spacecraft could no longer properly orient its solar arrays, thus depleting its batteries.
"A natural question is "Why would a spacecraft have instructions that turn off the attitude control, normally a fatal operation?" In this case, these instructions were part of a routine used when testing the spacecraft on the ground. Normally this routine would be removed before launch. However, the software was coded in PROMs, and so removing the test code required removing and replacing the entire computer. Because of time pressure from the impending launch, engineers decided to leave the command sequence in, though it never should be used. However, a single character error in constructing an upload sequence resulted in the command executing, with subsequent loss of the spacecraft."

----------

While I was looking this up, I also came across this story. The Genesis probe was a craft sent into space to collect minute particles of space dust and bring them back to earth. Sadly on return, the parachute failed to deploy and the capsule slammed into the Utah desert in a spectacular way. It turns out the accelerometers designed to detect the capsule entering the atmosphere were wired backwards, thus rendering them incapable of doing their job. The mission was doomed from the moment it left the ground.

julian 25 November, 2007 12:53

“Bill Gates follows someone’s taillights for a while and then zooms past. Soon there will be no taillights left.”
Andy Grove, 1994

Colin Barker 25 November, 2007 14:07

An important topic, a huge consortium.
More than one year delay.

I Teich 26 November, 2007 12:42

When you try to do a top ten you know you will not fit everything in but thanks for all the suggestions. I kick myself for missing out the Denver baggage mess. That is one computer disaster that can still make me smile years later - it helps that I did not travel through Denver much at the time. It does conjure up a wonderful picture in the mind of all those clever robotics and computers sending baggage to all the wrong places. A bit like British Airways at the moment, and they do it without the help of robots.
We will be thinking of using this useful data in a future piece. Could I Teich please send me more info on the Street Payment System. Thanks.
Colin Barker

Colin Barker 28 November, 2007 11:36

Really good lunchtime read Colin, thanks a lot. I can see you authoring a successful book of the top hundred, I don't mean one of those cheap bookclub jobs but one containing your in-depth analysis from your techie, industry insider point of view.

Actually there could be a whole series of 'em (sucky Military IT projects, Healthcare IT disasters, the list goes on !!!). What an industry we work in !!!!

Gren

gren 10 December, 2007 13:27

I remember that one - they invited an assembly of journalists to the airport for a demo of the 'finally worked-out system' and chaos ensued - suitcases coming off the tracks, clothing being sprayed into the air etc etc. Poor sods, a hard way to learn stuff

gren 10 December, 2007 13:34
