Amazon blames outage on complicated systems

Amazon.com appears to be blaming its complicated infrastructure for the outage that left it inaccessible to many US visitors for more than an hour and a half on Friday.

Amazon declared itself clear of the problem on Friday afternoon. "The Amazon retail site was down for approximately two hours earlier today beginning around 10.25am. The site [is] back up," the company said in a statement following the outage. "Amazon's systems are very complex and, on rare occasions, despite our best efforts, they may experience problems. We work to minimise any disruption and to get the site back as quickly as possible." Amazon declined to comment further.

The site, which is held up as an exponent of cloud computing due to the large number and complexity of web services used by partner sites, went offline completely by 10.21am PDT on Friday. Efforts to restore it appeared to be taking effect at about noon, said Keynote Systems, which monitors website responsiveness. As of 12.45pm, the site was working intermittently, with many product pages functioning but others still broken.

"At noon PDT, we started to see the site getting better," said Shawn White, director of external operations for Keynote. "We [were] seeing about 70 percent availability."

Sustained outages can be a serious problem. EBay suffered outages in 1999 that outraged users and sent the stock down; even a backup system didn't ward off more problems in 2002.

For major commerce sites, the problem can have a ripple effect. Both Amazon and eBay provide a commercial foundation used by many partners and entrepreneurs.

Based on last quarter's revenue of $4.13bn (£2.09bn) globally, a full-scale global outage would cost Amazon more than $31,000 per minute, on average. For North America, it would be more than $16,000 per minute. Those figures do not include revenue from other sources, such as search or contextual advertisements or Amazon Web Services.
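The global per-minute figure can be checked with simple arithmetic. The sketch below assumes a 91-day quarter and revenue spread evenly across every minute; both are simplifying assumptions for illustration, not details Amazon has given. The North American figure is not reproduced because the regional revenue is not stated here.

```python
# Rough check of the per-minute revenue estimate.
# QUARTERLY_REVENUE_USD comes from the article; DAYS_IN_QUARTER is an assumption.
QUARTERLY_REVENUE_USD = 4.13e9
DAYS_IN_QUARTER = 91          # assumed length of the quarter
MINUTES_PER_DAY = 24 * 60


def revenue_per_minute(quarterly_revenue, days=DAYS_IN_QUARTER):
    """Average revenue per minute, assuming sales are spread evenly over the quarter."""
    return quarterly_revenue / (days * MINUTES_PER_DAY)


per_minute = revenue_per_minute(QUARTERLY_REVENUE_USD)
print(f"${per_minute:,.0f} per minute")  # prints "$31,517 per minute"
```

Because shopping traffic is not uniform through the day, the real cost of a midday outage could be higher or lower than this average suggests.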

It appeared that Amazon Web Services such as the S3 storage and EC2 computing services continued to function at least for some customers, though the Amazon Web Services page at Amazon.com wasn't working.

"S3 and EC2 continue to function for us as normal," said Don MacAskill, chief executive of photo-sharing site Smugmug. Mashery.com chief executive Oren Michels, who uses Amazon Web Services for several functions and who has several customers who use Amazon Web Services, reported no problems on Friday.

As for the cause of the outage, the company hinted only that its complicated computing infrastructure was to blame.

In the estimation of Keynote's White, the most likely culprit was simple human error.

"Some engineer might have made a particular change, not knowing it could cause a trickle-down effect [that eventually brought down the site]," said White.

For example, he said, somebody in charge of maintenance might have been directing internet traffic to a particular group of servers, but selected the wrong group.

"What I find still so surprising is that it happened in the middle of the day. Typically, you do that in off-peak hours," White said. "[Amazon] ranks on the top with performance and availability, consistently, time and time again."

Another possible explanation is an attack such as the distributed denial-of-service (DDoS) attack that struck Amazon and other high-profile sites in 2000. White said he thinks it unlikely, though, that a crushing load of network traffic brought Amazon down.

"These guys are experts at dealing with flash floods of users", including those that routinely arrive during peak shopping days, said White. "Usually, when you see a site going under because of traffic issues or a denial-of-service attack, you see a gradual slowdown in performance and drop in availability. Here, we saw at 10.16am that it completely dropped off — 100 percent."

Soups Ranjan, a senior member of the technical staff of network protection and management company Narus, hasn't yet found any attack evidence.

"It doesn't seem to be the result of a network-initiated attack, at least from my preliminary analysis from our probes," Ranjan said.

Human error may not sound as gripping a tale as a network attack, but there's plenty of drama for the people responsible. And it's the career-limiting variety of drama, said Illuminata analyst Gordon Haff, who hazarded a guess that Amazon's problem involved its front-end web servers.

The security group of WebSense, a website and communications protection company, also saw no evidence that Amazon's problem was security related.

CNET News.com's Robert Vamosi contributed to this report.
