UPDATED 19:03 EDT / FEBRUARY 24 2013

What Happened to Microsoft Azure on Friday: Death by Security Certificate

An expired SSL (Secure Sockets Layer) certificate proved lethal for Microsoft’s Azure Cloud Storage services on Friday afternoon, when the expiration took the cloud service down. The software giant fixed the problem in less than a day, and on Saturday Microsoft announced that the service had been fully restored.

“Beginning Friday, February 22 at 12:44 PM PST, Storage experienced a worldwide outage impacting HTTPS operations (SSL traffic) due to an expired certificate,” Microsoft revealed on the Windows Azure service dashboard during the outage.
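Why does an expired certificate break HTTPS outright rather than degrade it? A client that verifies certificates checks that the current time falls inside the certificate's notBefore/notAfter validity window and refuses the connection if it does not. The minimal Python sketch below shows only that one check, assuming the certificate dict shape returned by `ssl.SSLSocket.getpeercert()`. The helper name `cert_is_valid_at` and the dates are hypothetical illustrations, not Azure's actual certificate.

```python
import calendar
import ssl
import time
from typing import Optional


def cert_is_valid_at(cert: dict, now: Optional[float] = None) -> bool:
    """Return True if `now` (seconds since the epoch, UTC) falls inside the
    certificate's validity window.

    Assumes `cert` follows the dict shape of ssl.SSLSocket.getpeercert(),
    with date strings such as 'Feb 22 20:00:00 2013 GMT'. This sketch covers
    only the expiry check; real clients also verify the issuer chain and
    the hostname.
    """
    if now is None:
        now = time.time()
    # cert_time_to_seconds parses the 'Mon DD HH:MM:SS YYYY GMT' format.
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_before <= now <= not_after


if __name__ == "__main__":
    # Hypothetical certificate, for illustration only.
    cert = {"notBefore": "Feb 22 00:00:00 2011 GMT",
            "notAfter": "Feb 22 20:00:00 2013 GMT"}
    before_expiry = calendar.timegm((2013, 2, 22, 19, 0, 0))
    after_expiry = calendar.timegm((2013, 2, 22, 21, 0, 0))
    print(cert_is_valid_at(cert, before_expiry))  # True
    print(cert_is_valid_at(cert, after_expiry))   # False
```

One second past `notAfter`, every verifying client fails the same check at once, which helps explain why a single certificate can produce a worldwide outage rather than a gradual one.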

The outage affected much of the Azure service, and customers also reported problems with Xbox Music and Video services, which the company acknowledged as potential issues while the service was being restored.

“Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers,” the company said.

The repairs are complete, but how could a single expired certificate cause a global outage?

This isn’t the first time Microsoft Azure has suffered an outage that affected many of its customers. Last year, also in February, the Windows Azure Management Service went down and the problem spread to Windows Azure Compute. That outage was also caused by a certificate problem: a date-related issue triggered by Leap Day.

Microsoft, which has since been contacted, has not explained how the certificate was allowed to expire or how the expiration led to the outage.

Microsoft isn’t the only cloud provider to suffer outages lasting hours or even large portions of a day. Amazon Web Services has suffered several massive outages (July 2012, August 2012), as has Google (September 2012, December 2012). Given how useful cloud architectures built on systems such as AWS and Azure have become, these outages raise the question of how to manage a crisis when the primary provider suffers a massive failure.

Companies such as Reddit and Netflix rely heavily on AWS. As a result, Netflix has been developing open source libraries for crisis recovery in case its AWS-based infrastructure fails, with the goal of keeping customers served even when the cloud isn’t serving Netflix. So far, however, nobody has survived a massive outage without a scratch.
