UPDATED 18:50 EDT / DECEMBER 23 2020


Google blames last week’s outage on Google User ID Service error

Google LLC said today that a storage quota system incorrectly reporting a usage figure of zero was responsible for taking its global authentication system offline last week, preventing users from accessing Gmail, YouTube and its cloud services for about an hour.

One day after the Dec. 14 outage, the company said its preliminary analysis had traced the incident to its automated storage quota management system. That system, Google said, reduced the capacity of its central identity management system, blocking people from accessing services that require them to log in.

The outage lasted only about an hour, but it was noticed by millions of people around the world. It also affected thousands of companies that rely on Google Cloud Platform for computing resources. That's bad for business, since the reliability and availability of cloud services are among the most important considerations for any enterprise.

Google's full incident report, released Tuesday, shows that the problem began when a legacy quota system, which Google uses to provision storage automatically for its authentication system, reported the storage usage of the User ID Service as zero.

“As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place which incorrectly reported the usage for the User ID Service as 0,” the report said. “As a result, the quota for the account database was reduced, which prevented the Paxos leader from writing. Shortly after, the majority of read operations became outdated which resulted in errors on authentication lookups.”

The Google User ID Service maintains a unique identifier for each Google account and manages the authentication credentials for the OAuth tokens and cookies that let people log in to a service without entering their password each time. This data is stored in a distributed database that uses the Paxos consensus protocol to coordinate updates, with an elected leader replica responsible for writes.
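Google's report says the quota cut "prevented the Paxos leader from writing" and that reads soon "became outdated," which the service rejects for security reasons. The Python sketch below illustrates that chain in a toy model; it is not Google's implementation. The class names, the byte-counting quota and the 10-second `MAX_STALENESS_S` freshness bound are all hypothetical, and the model assumes each replica tracks when it last applied a write and refuses reads once that data is too old.

```python
from dataclasses import dataclass, field

MAX_STALENESS_S = 10.0  # hypothetical freshness bound, in seconds


class StaleDataError(Exception):
    """Raised when a replica's data is too old to serve safely."""


class QuotaExceededError(Exception):
    """Raised when the leader cannot write because storage quota is exhausted."""


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    last_applied: float = 0.0  # time of the most recent applied write

    def apply(self, key: str, value: str, ts: float) -> None:
        self.data[key] = value
        self.last_applied = ts

    def read(self, key: str, now: float):
        # Refuse to serve data that has not been refreshed recently.
        age = now - self.last_applied
        if age > MAX_STALENESS_S:
            raise StaleDataError(f"data is {age:.1f}s old")
        return self.data.get(key)


@dataclass
class Leader:
    replicas: list
    quota_bytes: int
    used_bytes: int = 0

    def write(self, key: str, value: str, now: float) -> None:
        # Every write is charged against the storage quota before replication.
        size = len(key) + len(value)
        if self.used_bytes + size > self.quota_bytes:
            raise QuotaExceededError("write rejected: quota exhausted")
        self.used_bytes += size
        for replica in self.replicas:
            replica.apply(key, value, now)


replicas = [Replica() for _ in range(3)]
leader = Leader(replicas, quota_bytes=1_000)
leader.write("user:42", "token-abc", now=0.0)
print(replicas[0].read("user:42", now=5.0))  # token-abc

leader.quota_bytes = 0  # quota cut after usage was misreported as zero
try:
    leader.write("user:42", "token-def", now=6.0)
except QuotaExceededError as err:
    print(err)  # write rejected: quota exhausted
try:
    replicas[0].read("user:42", now=20.0)
except StaleDataError as err:
    print(err)  # data is 20.0s old
```

In this toy model, once the leader can no longer write, nothing refreshes the replicas, so every read eventually trips the staleness check even though the stored data itself is intact.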

“For security reasons, this service will reject requests when it detects outdated data,” Google said. “An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to decrease the quota allowed for the User ID service and triggering this incident. Existing safety checks exist to prevent many unintended quota changes, but at the time they did not cover the scenario of zero reported load for a single service.”
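Google describes two safeguards: a grace period before quota reductions take effect, and safety checks that did not cover "zero reported load for a single service." The Python sketch below shows how such a reconciler might behave with and without a zero-load guard. It is a hypothetical illustration, not Google's quota system: the function and field names, the 1.5x `HEADROOM` factor and the seven-day `GRACE_PERIOD_S` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

HEADROOM = 1.5                  # hypothetical: quota = 1.5x reported usage
GRACE_PERIOD_S = 7 * 24 * 3600  # hypothetical delay before enforcing a cut


@dataclass
class QuotaState:
    quota: int
    pending_since: Optional[float] = None  # when a reduction was first proposed


def reconcile(state: QuotaState, reported_usage: int, now: float,
              guard_zero: bool = True) -> int:
    """Run one reconciliation pass and return the quota to enforce."""
    target = int(reported_usage * HEADROOM)
    if target >= state.quota:
        # Increases apply immediately and cancel any pending reduction.
        state.pending_since = None
        state.quota = target
        return state.quota
    # Safety check: a live service suddenly reporting zero usage is far more
    # likely to be a reporting bug than a genuine drop, so ignore it.
    if guard_zero and reported_usage == 0:
        state.pending_since = None
        return state.quota
    # Reductions only take effect after the grace period has elapsed.
    if state.pending_since is None:
        state.pending_since = now
        return state.quota
    if now - state.pending_since < GRACE_PERIOD_S:
        return state.quota
    state.pending_since = None
    state.quota = target
    return state.quota


unguarded = QuotaState(quota=1_000)
reconcile(unguarded, 0, now=0, guard_zero=False)               # grace starts
reconcile(unguarded, 0, now=GRACE_PERIOD_S, guard_zero=False)  # grace expires
print(unguarded.quota)  # 0

guarded = QuotaState(quota=1_000)
reconcile(guarded, 0, now=0)
reconcile(guarded, 0, now=GRACE_PERIOD_S)
print(guarded.quota)  # 1000
```

The grace period only postpones a bad reduction; without a check on the input itself, the cut still lands once the timer expires. Note that an exact-zero guard like this one would not catch a bug that reports a small nonzero figure.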

Google's report also covered the impact of the outage on Google Cloud Storage, Google Cloud Network, Google Kubernetes Engine, Google Workspace (formerly G Suite) and Google Cloud support services. It said that "all authenticated Google Workspace apps were down for the duration of the incident." In addition, about 4% of requests to the GKE control plane API failed, and nearly all customer and Google-managed workloads were unable to report metrics to Cloud Monitoring.

Google's report concluded that the majority of its authenticated services across Google Cloud and Google Workspace saw an "elevated error rate," and that all of its services requiring users to log in with a Google Account were "affected with varying impact."

Image: Google
