UPDATED 16:17 EDT / DECEMBER 07 2021


AWS outage causes disruptions across popular online services


An outage in one of Amazon Web Services Inc.’s data center clusters is causing disruptions across a large number of popular online services, including financial apps, food delivery platforms and others.

The outage is the result of a malfunction in some of the network devices that AWS uses to power its public cloud. Because AWS is the industry's leading cloud provider, the outage has caused disruptions across a large number of online services that rely on the Amazon.com Inc. unit's platform to support their business operations. Netflix, Robinhood, DoorDash, Spotify and Coinbase are among the services reportedly affected.

At 9:37 a.m. PST, AWS engineers posted a memo on the cloud giant's Service Health Dashboard web page in which they reported "impact to multiple AWS APIs in the US-EAST-1 Region." The US-EAST-1 Region is one of several data center clusters on which the cloud platform runs. The initial memo provided little information about what led to the malfunction, specifying only that AWS had identified the root cause and was working to recover affected services.

“This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates,” the company added.

About a half-hour after the initial memo, at 10:12 a.m. PST, AWS engineers posted an update stating that some affected services had begun returning to normal operations. The cloud giant was "starting to see some signs of recovery. We do not have an ETA for full recovery at this time," the update specified.

AWS provided more details about the root cause and scope of the outage in a third update posted at 11:26 a.m. PST. The cloud giant disclosed that the malfunction was caused by issues with "several" network devices in the US-EAST-1 Region data center cluster. "We are pursuing multiple mitigation paths in parallel, and have seen some signs of recovery, but we do not have an ETA for full recovery at this time," the company added.

As for the scope of the incident, AWS said that the issue appeared to affect more than a half-dozen services hosted in the US-EAST-1 Region. The services listed by the cloud giant included EC2, Connect, DynamoDB, Glue, Athena and Chime, as well as others. 

The outage is also impacting customers’ ability to log into the AWS Management Console, a dashboard used to manage cloud environments. However, “customers can login to consoles other than US-EAST-1 by using an IAM role for authentication,” AWS stated in the update posted at 10:12 a.m.

AWS parent Amazon.com is reportedly also experiencing some technical issues because of the outage.

According to The Verge, there were reports of consumers encountering errors when trying to access Alexa, Kindle ebooks and certain smart home products from the company. Some Amazon warehouse and delivery workers were reportedly unable to access two internal applications. Additionally, multiple third-party merchants who sell merchandise via Amazon's e-commerce marketplace indicated that they couldn't log into Seller Central, an internal website for managing customer orders.

Amazon posted its most recent update about the outage at 12:34 p.m. PST. “We continue to experience increased API error rates for multiple AWS Services in the US-EAST-1 Region. The root cause of this issue is an impairment of several network devices. We continue to work toward mitigation, and are actively working on a number of different mitigation and resolution actions. While we have observed some early signs of recovery, we do not have an ETA for full recovery,” the company stated.

“We will continue to provide updates here as we have more information to share,” AWS added, referring to its Service Health Dashboard page.

Update: AWS made gradual progress through the afternoon.

At 2:43 p.m. PST, it said it had "mitigated the underlying issue that caused some network devices in the US-EAST-1 Region to be impaired. We are seeing improvement in availability across most AWS services. All services are now independently working through service-by-service recovery. We continue to work toward full recovery for all impacted AWS Services and API operations."

AWS added at 3:30 p.m. PST that "many services have already recovered, however we are working towards full recovery across services. Services like SSO, Connect, API Gateway, ECS/Fargate, and EventBridge are still experiencing impact. Engineers are actively working on resolving impact to these services." And at 4:35 p.m. PST, it said that "with the network device issues resolved, we are now working towards recovery of any impaired services."

As of Wednesday morning, Dec. 8, AWS said the issues were resolved.

Image: AWS
