UPDATED 16:36 EDT / FEBRUARY 28 2017


Nearly 5-hour Amazon Web Services outage highlights dependence on one cloud giant

Updated with Amazon status report and additional analysis.

A data storage service run by Amazon Web Services, the largest cloud computing company, suffered a widespread outage today, making hundreds of websites partly or entirely unavailable and disrupting services used by millions of people and companies.

AWS, which provides cloud storage, computing and other services to companies looking to avoid the cost and time involved with running their own data centers, reported a number of issues on its service health dashboard, resulting in outages starting at about 12:30 p.m. Eastern today.

Update: AWS said in a message on its status site at 5:08 p.m. Eastern that things were back to normal: “As of 1:49 p.m. Pacific, we are fully recovered for operations for adding new objects in S3, which was our last operation showing a high error rate. The Amazon S3 service is operating normally.” (A version of this story for a time mistakenly referred to an 11-hour outage. It was nearly five hours.)

Update 2, March 2: In a post-mortem post, AWS essentially blamed human error for the outage.

Before the repair, Amazon’s S3 storage service in a “region” or set of data centers in the Eastern United States was seeing “increased error rates.” Network monitoring service ThousandEyes said the problem appeared to be an issue with network “layer 3,” which handles data packet forwarding and routing. The firm said all the packet loss was in the Ashburn, Virginia area. AWS said it was working on fixing the problem, but it continued well into the afternoon Tuesday.

Among the many sites affected were Netflix, Airbnb, Spotify, Business Insider, Flipboard, Expedia, Quora, Slack, Twilio and Amazon’s Twitch gaming video service. Even AWS’s own service status indicators, which not surprisingly depend on S3 for graphics, showed green because apparently whatever turns them yellow was affected by the outage. Apple Inc. also said it was having issues with its App Store, Apple Music, iCloud services, iTunes and other services, though it wasn’t entirely clear that AWS was the culprit.

It’s a rare outage for a service that has become nearly ubiquitous both for websites and web services and for companies that increasingly are replacing or supplementing their data centers with cloud services. It’s also a potent sign of how much businesses and their customers have come to depend on cloud computing.

AWS has been a cash cow for Amazon.com Inc., a nearly $13 billion annual revenue business that has supplied virtually all of Amazon’s profits in recent quarters. In the fourth quarter, AWS accounted for 8 percent of Amazon’s sales, along with $926 million in operating profits.

Because companies customarily don’t store all their digital assets and applications on one cloud storage service, only parts of the web services were affected in many cases. The problem has manifested itself in issues such as broken links or images that take a long time to load.

Although AWS outages are rare, they’re not unprecedented. In 2015, one outage lasted five hours.

As of a little before 3 p.m. Eastern, Amazon reported some progress. “We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue,” AWS said on its status page.

“Cloud outages like we’re seeing today with Amazon Web Services further underscore the complex and widespread interdependencies of the modern web,” said Scott Klein, founder of StatusPage, a provider of services for companies to create their own status pages that’s owned by the collaboration software firm Atlassian Pty Ltd.

Klein said his own service has seen a 40-fold increase in users viewing status pages since the AWS outage began. “As we’ve seen before, an incident with one service can have cascading effects on thousands of other services and millions of end users,” he said.

The incident underscores the need for companies to invest more in their system architectures to spread the risk among more cloud services, Dave Bartoletti, a principal analyst at Forrester Research Inc., told SiliconANGLE. “I expect many will re-evaluate their current data storage architecture – can they effectively switch over to another data source if this happens again?”

Amazon’s outage today “demonstrates how running in the cloud does not absolve companies from the need to ensure high availability in their operations,” Michelle McLean, vice president of marketing for ScaleArc, which provides database performance and load balancing software, said in emailed comments to SiliconANGLE. “All the companies whose services are impacted now were vulnerable because all their operations ran out of a single Amazon region.”

McLean said companies should set up their systems to share operations across regions to avoid issues like this, though she acknowledged that doing so is a challenging job. That’s all the more so because many apps and services increasingly depend on each other to function. Moreover, the people spinning up new apps and services often aren’t information technology staff, so they’re not as knowledgeable about potential problems.
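To make the idea of switching to another data source concrete, here is a minimal Python sketch of region failover on reads. It is an illustration only, not how any of the affected companies or AWS implement it. The function name `fetch_with_failover`, the region labels and the stand-in fetchers are all hypothetical, and the sketch assumes the data has already been replicated to the second region, which is the hard part McLean describes.

```python
from typing import Callable, List, Sequence, Tuple

# A fetcher takes an object key and returns its bytes, or raises on failure.
Fetcher = Callable[[str], bytes]


class AllRegionsFailed(Exception):
    """Raised when no configured region could serve the request."""


def fetch_with_failover(
    key: str, regions: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Try each region in order and return (region_name, data) from the
    first one that succeeds. All names here are hypothetical."""
    errors: List[str] = []
    for name, fetch in regions:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure means: try the next region
            errors.append(f"{name}: {exc}")
    raise AllRegionsFailed("; ".join(errors))


# Stand-in fetchers simulating one failing region and one healthy region.
def failing_region(key: str) -> bytes:
    raise ConnectionError("increased error rates")


def healthy_region(key: str) -> bytes:
    return b"contents of " + key.encode()


if __name__ == "__main__":
    region, data = fetch_with_failover(
        "logo.png",
        [("primary", failing_region), ("secondary", healthy_region)],
    )
    print(region, data)
```

In a real system each fetcher would wrap a storage client pointed at a different region or provider, and the fallback would typically also need timeouts and health checks so that a slow primary does not stall every request.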

“The move to the cloud has made people assume the cloud is going to protect the data,” said Don Foster, senior director of solutions marketing and technical alliances at Commvault, a provider of data backup, storage and protection. Now, he said, “enterprises will put a lot more rigor into how they set up these services.”

Amazon’s stock didn’t appear to be affected much. Shares fell less than a half-percent on the day, a little less than the overall NASDAQ market, likely because it takes more than a single outage, even one this long, for companies to change their cloud providers.

Photo: Robert Hof
