UPDATED 09:52 EDT / SEPTEMBER 09 2011

Microsoft, Google Cloud Outages Affect Millions

Microsoft’s cloud-based email services experienced a global outage earlier today, affecting millions of users of Hotmail, Office 365, SkyDrive, and other Microsoft Live-based products.
Reports indicate that most of the services are now restored and coming back online, depending on DNS propagation, which can routinely take hours to complete.

“Other services, including Windows Live Hotmail, appeared to be coming back online, but online forums and blogs were still reporting issues at 5:00am ET.

While it was unclear what caused the outage, there was speculation that Microsoft was caught in a power blackout that hit large parts of America’s southwest late Thursday.”

Microsoft reports progress in restoring the services on the WindowsTeamBlog: 

We have completed propagating our DNS configuration changes around the world, and have restored service for most customers. Depending on your location you may still experience issues over the next 30 minutes as the changes make their way through the network. Thank you for your patience as we have worked to address these issues.

A detailed root cause has not yet been reported, and theories will abound; it may well turn out to involve the internet’s Domain Name System (DNS). What is clear is that the restoration of services is, at a minimum, bound by how quickly DNS changes propagate. This type of issue will likely revive the classic debate about the reliability of cloud computing and the risks it carries. Office 365 is Microsoft’s subscription-only, cloud-based application suite for the enterprise, and this marks its second technical outage in the last month. But Microsoft is not alone; Google Docs also suffered an outage earlier this week. The Google Apps Status Dashboard reads:

We’re aware of a problem with Google Docs List affecting a majority of users. The affected users are unable to access Google Docs List. We will provide an update by September 7, 2011 4:40:00 PM UTC-6 detailing when we expect to resolve the problem. Please note that this resolution time is an estimate and may change.

A number of recent high-profile cloud service failures have affected confidence in the technology.

GROWING PAINS

This shouldn’t stop anyone from adopting cloud-based services, not in the slightest. The advantages to businesses and the capabilities given to users are far too great. What we are seeing are the inevitable growing pains of an industry that is still in its infancy. Incidents such as these can almost always be traced to, and alleviated with, process and communication. In the case of Microsoft, if the cause is found to be DNS-related, then the duration of such an outage is indeed bound by the limitations of DNS and how its changes propagate.
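To make the propagation limit concrete, here is a minimal Python sketch of a caching resolver, assuming a simple time-to-live (TTL) model. It is not a description of Microsoft's infrastructure: the class names, the host name `mail.example.test`, the addresses, and the one-hour TTL are all hypothetical, chosen only to show why a corrected record is not seen everywhere at once.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    address: str
    cached_at: float  # time the answer was cached, in seconds
    ttl: float        # how long the answer may be reused, in seconds


class CachingResolver:
    """Toy resolver that reuses an answer until its TTL expires."""

    def __init__(self, authoritative):
        # authoritative: dict mapping name -> (address, ttl_seconds)
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry.cached_at + entry.ttl:
            return entry.address  # cached answer, possibly out of date
        address, ttl = self.authoritative[name]
        self.cache[name] = CachedRecord(address, now, ttl)
        return address


# Hypothetical record: documentation-range addresses and a one-hour TTL.
zone = {"mail.example.test": ("192.0.2.10", 3600)}
resolver = CachingResolver(zone)

before = resolver.resolve("mail.example.test", now=0)
zone["mail.example.test"] = ("192.0.2.20", 3600)  # operator corrects the record
during = resolver.resolve("mail.example.test", now=600)
after = resolver.resolve("mail.example.test", now=3600)
print(before, during, after)  # 192.0.2.10 192.0.2.10 192.0.2.20
```

In this model the resolver keeps handing out the old address until its cached copy expires, so the worst-case delay before a fix is visible is roughly one full TTL, regardless of how quickly the operator corrects the record.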

The lesson learned here is to apply better process and to increase communication with the client base, telling customers as clearly as possible what happened, why it happened, and how it will be kept from happening again. That level of transparency and communication is critical to cloud operations. No service is impervious to human error, coding flaws, or any number of other situations, but process, infrastructure and response can be constructed around those unforeseen issues to deal with these incidents. You can bet that Microsoft will be working on this, as any organization in this realm would. Microsoft is betting big and doubling down on the cloud, based on its own public statements and on stories surrounding Windows 8 features and Xbox cloud-based features.

In the meantime, cloud adopters can rest assured that these outages, much like Amazon’s EC2 outage back in April, are not likely to recur. The downtime risk profile of cloud-based services is really not all that different from that of internally hosted systems; what is different, of course, is the scale and the number of users a single incident can affect. Data loss and the inability to access information can have a massive effect on business health and the bottom line. Service Level Agreements, or SLAs, only go so far to compensate for financial losses.

For now, a sound organization can minimize these risks by distributing resources, analyzing those SLAs, and making sure they align with business interests. Microsoft and other cloud-service providers recognize these issues and the costs they incur, and are certainly working to deliver on their uptime commitments.
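As a starting point for analyzing an SLA, it can help to translate an uptime percentage into the downtime it actually permits. The Python sketch below does that arithmetic; the function name, the 30-day billing period, and the sample percentages are illustrative assumptions, not the terms of any Microsoft, Google, or Amazon agreement.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Return the downtime, in minutes, that an uptime commitment permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


# Illustrative commitments only; real SLA terms vary by provider and plan.
for percent in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(percent)
    print(f"{percent}% uptime allows about {minutes:.1f} minutes down per 30 days")
```

Under these assumptions, a 99.9% commitment permits roughly 43 minutes of downtime in a 30-day period, so an outage lasting several hours would exceed it many times over. Comparing that figure with what an hour of lost access costs the business shows how far an SLA credit can realistically go.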

UPDATE

We are still receiving scattered reports of continued intermittent Microsoft service issues from colleagues and other sources at this time.

