Amazon S3 outage rains on cloud-computing parade

COMMENT

Amazon.com's Simple Storage Service, or S3, struck a pothole on the road to the glorious cloud-computing future on Sunday when an outage took the storage system offline for several hours.

The outage may not come as a surprise, as the computing industry is making up what is known as 'cloud computing' as it goes along, often with a server and networking architecture that's one part improvisation to two parts proven best practice.

Computing practices tend to adopt one of two stances. One is tight control, higher prices and high reliability. The other is openness, lower cost, but some degree of unreliability. High-end mainframes and Unix servers can handle transaction loads that would crush most machines using Intel or AMD x86 processors, but they cost more and are less adaptable. Most of the cutting-edge, large-scale action on the internet, including various cloud-computing efforts, is happening with the more free-wheeling technology.

One company operating at colossal scale, Google, has concluded it is better to buy cheap x86 servers and write software that automatically paves over hardware failures. The bigger problem comes when a large system composed of many interacting components loses track of its self-conception, and rebooting a single system or swapping out a hard drive isn't sufficient.

Essentially, Amazon had to reboot S3. Here's how the company described its S3 problem in a statement: "As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to.

"We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again. These are sophisticated systems and it generally takes a while to get to root cause in such a situation," Amazon said. "We will be providing our customers with more information when we've fully investigated the incident."
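To make the routing idea in Amazon's statement concrete, here is a minimal sketch of a router that only sends requests to replicas whose state it has heard about recently. It is an illustration under stated assumptions, not a description of how S3 works internally: the class `ReplicaRouter`, its methods and the `max_age_seconds` threshold are all hypothetical names invented for this example.

```python
import time


class ReplicaRouter:
    """Toy router (hypothetical): picks a replica from recent health reports."""

    def __init__(self, replicas, max_age_seconds=5.0):
        # Assumed freshness window; a real system would tune this carefully.
        self.max_age_seconds = max_age_seconds
        # Time of the last health report per replica (None = never heard from).
        self.last_seen = {name: None for name in replicas}

    def report_healthy(self, name, now=None):
        """Record that a replica has just announced itself as healthy."""
        self.last_seen[name] = time.monotonic() if now is None else now

    def pick(self, now=None):
        """Return a replica with fresh state, or fail if none is known good."""
        now = time.monotonic() if now is None else now
        fresh = [
            name
            for name, seen in self.last_seen.items()
            if seen is not None and now - seen <= self.max_age_seconds
        ]
        if not fresh:
            # Internal communication has broken down: no safe routing choice.
            raise RuntimeError("no replica with known-good state")
        # Deterministic choice to keep the example simple.
        return sorted(fresh)[0]
```

If health reports stop arriving, `pick` has nothing to choose from even though the storage servers themselves may be fine, which is roughly the kind of failure the statement describes: the components are running but can no longer interact properly.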

Om Malik, an analyst with GigaOM, described cloud computing as frail: "The S3 outage points to a bigger [and a larger] issue: the cloud has many points of failure — routers crashing, cable getting accidentally cut, load balancers getting misconfigured, or simply bad code."

Yet there are three things that shouldn't be overlooked before writing cloud computing off as a failure:

  • First, you should compare the problems of cloud computing to the alternatives, including running computing services in-house. Corporate datacentres also have crashing routers, bad code and misconfigured load balancers.
  • Second, you can expect reliability to increase as the companies providing cloud infrastructure and services explore the terra incognita.
  • Third, don't confuse Web 2.0 with the foundational elements of cloud computing. A website that uses an online application at another site to mash up data from some other sites then present it using a service from yet another site is indeed susceptible to numerous points of failure. But a single-purpose infrastructure such as Amazon S3 is, at least in theory, a more tightly controlled, single-purpose utility that can offer higher reliability.

That's not to excuse Amazon's outage or gloss over the effect it had on business partners reliant on it. But S3 is the sole part of Amazon Web Services that comes with a service level agreement to promise customers reliability.

A little silver lining to this particular cloud problem is that Amazon is setting expectations at the right level. It said in a statement: "Any downtime is unacceptable, and we won't be satisfied until it is perfect."

Talkback

Despite what many pundits have to say, reliability issues will not be the downfall of cloud computing. Using cloud computing does not mean neglecting to architect solutions that meet business requirements, including reliability requirements.

I wrote more about this idea here:

Cloud Computing and Reliability
http://faseidl.com/public/item/212584

faseidl 10 September, 2008 17:35