Microsoft has just published a blow-by-blow account of what went wrong during last month’s Azure cloud storage service outage, which caused thousands of websites, including its own Windows Store and MSN.com, to go offline.
The Microsoft Azure service interruption on Nov. 18 resulted in intermittent connectivity issues with the Azure Storage service in multiple regions, and it was all due to a simple human error.
Jason Zander, Microsoft’s Vice President for Azure, admitted as much in a blog post, saying “there was a gap in the deployment tooling that relied on human decisions and protocol”.
At the time of the outage, Microsoft had said the problem was caused by “… an issue that resulted in storage blob front ends going into an infinite loop, which had gone undetected during flighting (testing).”
Zander explains further:
“There are two types of Azure Storage deployments: software deployments (i.e. publishing code) and configuration deployments (i.e. change settings). Both software and configuration deployments require multiple stages of validation and are incrementally deployed to the Azure infrastructure in small batches. This progressive deployment approach is called ‘flighting.’”
“When flights are in progress, we closely monitor health checks. As continued usage and testing demonstrates successful results, we will deploy the change to additional slices across the Azure Storage infrastructure.”
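The flighting process Zander describes can be sketched in a few lines of Python. This is a hypothetical illustration, not Microsoft's tooling: the slice names and `healthy()` check are stand-ins for Azure's real monitoring and deployment infrastructure.

```python
def healthy(slice_name):
    """Stand-in health check; a real system would query live monitoring."""
    return True

def flight(change, slices):
    """Deploy `change` one slice at a time, halting on the first failed health check."""
    deployed = []
    for s in slices:
        # Placeholder for actually pushing the code or configuration change.
        deployed.append(s)
        if not healthy(s):
            # Stop the flight before the change spreads any further.
            raise RuntimeError(f"health check failed on {s}; halting flight")
    return deployed

# Usage: the change reaches a small slice first and widens only on success.
flight("perf-update", ["test-slice", "region-slice", "global-slice"])
```

The key property is that a bad change can only ever damage the slices it has already reached, which is exactly the protection that was lost when the policy was skipped.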
Now, Microsoft says its analysis of the outage shows that “faulty flighting” was what caused the problem. It began with a software change Microsoft’s engineers made to improve Azure Storage’s performance by reducing the CPU footprint of the Azure Storage Table Front-Ends.
Unfortunately for Microsoft, someone forgot to follow protocol. “The standard flighting deployment policy of incrementally deploying changes across small slices was not followed,” said Zander. The engineer performing the upgrade apparently thought that doing so was low risk, because the changes had already been flighted on a portion of the production infrastructure for several weeks.
Alas, that was not the case, “because the configuration switch was incorrectly enabled for Azure Blob storage Front-Ends,” Zander said. As a result, when the change was initiated, it exposed a bug that caused the Azure Blob storage Front-Ends to enter an infinite loop, meaning they were unable to service requests.
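The failure mode is worth spelling out: a configuration switch validated for one kind of front-end also flipped a code path on another kind, where it had never been tested. The sketch below is purely illustrative (not Microsoft's code); the loop is bounded by a counter so the example actually terminates, where the real bug spun forever.

```python
def serve_request(frontend_kind, low_cpu_mode, max_iterations=1000):
    """Toy model of a front-end handling one request under a feature flag."""
    if low_cpu_mode and frontend_kind == "blob":
        # Latent bug: this path was flighted only for "table" front-ends,
        # yet the shared switch enabled it for "blob" front-ends too.
        i = 0
        while True:  # the real bug looped indefinitely; we bound it here
            i += 1
            if i >= max_iterations:
                return "stuck: request never serviced"
    return "request serviced"

serve_request("table", low_cpu_mode=True)  # the tested path works
serve_request("blob", low_cpu_mode=True)   # the untested path spins
```

Because the flag was shared across both front-end types, no amount of testing on the Table side could have caught the Blob-side loop.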
To prevent this kind of cock-up from happening again, Microsoft has since updated its deployment system to enforce its flighting policies for all standard updates, be it code or configuration.
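Moving the policy from human judgment into the tooling can be sketched as a simple gate: the deployment system refuses any plan, code or configuration, that skips the incremental stages. The stage names below are illustrative assumptions, not Azure's actual rollout rings.

```python
# Hypothetical required order of flighting stages (illustrative names).
REQUIRED_STAGES = ["test-slice", "single-region", "multi-region", "global"]

def validate_rollout(planned_stages):
    """Reject any deployment plan that skips or reorders the flighting stages."""
    if planned_stages != REQUIRED_STAGES[:len(planned_stages)]:
        raise ValueError(f"flighting policy violation: {planned_stages}")
    return True

validate_rollout(["test-slice", "single-region"])  # a compliant prefix passes

try:
    validate_rollout(["global"])  # pushing everywhere at once is rejected
except ValueError:
    pass  # the tooling blocks the deployment instead of trusting the operator
```

The point of enforcing this in software is that a "low risk" judgment call like the one Zander describes can no longer bypass the policy.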
photo credit: o.tacke via photopin cc