UPDATED 07:15 EDT / MARCH 09 2015

Google Open Sources Their HTTP2 Spec NEWS

Google suffers new cloud outage, promises to be “better prepared”

3599537952_2d1e41cbeeGoogle’s cloud platform suffered another outage over the weekend in an incident that seems similar to a more serious brownout that took place last month.

Google said the latest outage affecting its Compute Engine, which was first reported by The Register, came about due to a “packet loss on agress network traffic”, which caused an array of symptoms ranging from “unusually slow responses, to timeouts attempting to contact the VM.” The incident apparently lasted for just 43 minutes, during which time VMs stayed online, and after which things went back to normal.

The outage was somewhat similar to a much more widely reported incident in February, when “The internal software system which programs GCE’s virtual network for VM egress traffic stopped issuing updated routing information,” leading to downtime across multiple zones for about one hour.

Explaining the latest incident, Google put out the following statement: “The root cause of the packet loss was a configuration change introduced to the network stack designed to provide greater isolation between VMs and projects by capping the traffic volume allowed by an individual VM. The configuration change had been tested prior to deployment to production without incident. However as it was introduced into the production environment it affected some VMs in an unexpected manner.”

Google pointed out that this weekend’s outage was not nearly as severe as the one seen in February, and said it was now busy investigating why prior testing of the configuration change failed to pick up any potential problems with the service.

The good news is that Google is planning to tighten things up to prevent any interruptions like this from happening again, saying that future changes won’t be introduced until its test suite has been improved to the point where it demonstrates parity with behavior seen in actual production.

“Google engineers are immediately amending the rollout protocol for network configuration changes so that future production changes will be applied to a small fraction of VMs at a time, reducing the exposure in the event of undetected behavior,” the company said.

Google’s embarassment follows a similar serious outage over at Microsoft’s Azure cloud in November 2014, which affected the availability of its Office 365 and OneDrive cloud storage services.

Both Google and Microsoft are generally viewed as leading players in the world of hyperscale cloud computing, but as these incidents continue to demonstrate, the cloud is still an evolving technology and not one major provider can yet claim 100 percent uptime.

Photo Credit: ThoseBlastedApes via Compfight cc


A message from John Furrier, co-founder of SiliconANGLE:

Your vote of support is important to us and it helps us keep the content FREE.

One click below supports our mission to provide free, deep, and relevant content.  

Join our community on YouTube

Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.

“TheCUBE is an important partner to the industry. You guys really are a part of our events and we really appreciate you coming and I know people appreciate the content you create as well” – Andy Jassy

THANK YOU