UPDATED 20:06 EST / JULY 15 2019

Microsoft reveals how it’s planning to make its Azure cloud even more reliable

Microsoft Corp. says the current 99.995% average uptime of its Azure public cloud infrastructure offering simply isn’t good enough, so it’s taking steps to improve it even more.
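
For a sense of scale, 99.995% availability works out to roughly 26 minutes of downtime a year. The arithmetic below is a back-of-the-envelope sketch: the function name is illustrative, and it assumes a 365-day year. The article does not describe how Microsoft actually calculates its average uptime figure.

```python
# Illustrative arithmetic only: convert an availability percentage into
# the downtime it allows over a given period.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted at the given availability."""
    return period_minutes * (100.0 - availability_pct) / 100.0


yearly = allowed_downtime_minutes(99.995)
print(f"{yearly:.2f} minutes of downtime per year")
```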

In a blog post today, Chief Technology Officer Mark Russinovich noted how Azure’s availability was hurt by “three unique and significant incidents” in the last 12 months.

Those included a major data center outage in its South Central U.S. region in September 2018 and Azure Active Directory Multi-Factor Authentication problems in November 2018. Then there were domain name system maintenance issues in May 2019, which led to further outages for some customers.

Those and other incidents simply won’t do, Russinovich said. In response, the company has created what it calls a “Quality Engineering” team reporting to him that will work alongside its existing Site Reliability Engineering team to come up with ways to beef up Azure’s durability.

The team has already begun a number of initiatives to ensure the resiliency of Azure. For example, the company plans by 2021 to add availability zones to the 10 largest Azure regions that don’t yet have them. The 10 biggest Azure regions overall already have availability zones, which help guard against data center-level failures, Russinovich said. Each zone is located within an Azure region and has its own independent power source, network and cooling infrastructure.

The company is also expanding its safe deployment practice framework, which ensures that all code and configuration changes in Azure must pass a set of stringent tests before rolling out to different regions. The framework will be expanded to include all software-defined infrastructure changes in Azure, including alterations to its networking and DNS infrastructure.

Microsoft is also launching in preview the ability for customers to initiate their own failovers at the storage level, as a direct result of the September 2018 data center outage in the South Central U.S. region. Failover refers to a method used to protect computer systems from failure, in which standby equipment automatically takes over when the main system fails.

“Because it is our policy to prioritize data retention over time-to-restore, we chose to endure a longer outage to ensure that we could restore all customer data successfully,” Russinovich said. “A number of you have told us that you want more flexibility to make this decision for your own organizations, so we are empowering customers by previewing the ability to initiate your own failover at the storage-account level.”
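
The difference between automatic failover and the customer-initiated failover Russinovich describes can be sketched in a few lines. This is a toy model under stated assumptions, not an Azure API: the `Replica` and `FailoverPair` names, their methods and the region labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    healthy: bool = True


class FailoverPair:
    """Hypothetical primary/standby pair; not an Azure API."""

    def __init__(self, primary: Replica, standby: Replica) -> None:
        self.active = primary
        self.standby = standby

    def _swap(self) -> None:
        self.active, self.standby = self.standby, self.active

    def auto_failover(self) -> bool:
        """Promote the standby only when the active replica has failed."""
        if not self.active.healthy and self.standby.healthy:
            self._swap()
            return True
        return False

    def initiate_failover(self) -> None:
        """Operator-initiated failover: switch now, without waiting for a failure."""
        if not self.standby.healthy:
            raise RuntimeError("standby is not healthy; refusing to fail over")
        self._swap()


pair = FailoverPair(Replica("primary-region"), Replica("secondary-region"))
switched = pair.auto_failover()  # nothing has failed, so no switch happens
pair.initiate_failover()         # the operator chooses to switch anyway
print(pair.active.name)
```

In this model the operator, rather than the platform, decides when restoring service is worth the switch. The sketch does not model the data-retention risk that the quote above refers to.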

The CTO also discussed Microsoft’s Project Tardigrade, an upcoming service intended to detect signs of hardware failures and memory leaks before they cause an outage and to freeze affected virtual machines so they can be moved to a different host.

“Continuous, real-time improvement is one of the great advantages of cloud services, and while we will never eliminate all such risks, we are deeply focused on reducing both the frequency and the impact of service issues while being transparent with our customers, partners, and the broader industry,” Russinovich said.

Constellation Research Inc. analyst Holger Mueller said it was good to see Microsoft adding more processes and best practices to make Azure more resilient, as reliability is one of the most important value propositions for cloud computing.

“The most important update is the expansion of its availability zones, as this is one area where Microsoft actually trails other cloud providers,” Mueller said.

Image: bsdrouin/Pixabay
