UPDATED 17:37 EST / OCTOBER 28 2025

INFRA

AWS disruption prompts Snowflake to spotlight its cross-cloud recovery feature

When Amazon Web Services Inc. suffered a region-wide outage last week that disrupted services across the U.S. East Coast, Snowflake Inc. saw an opportunity to remind customers that disruptions don’t have to be disasters.

Snowflake said more than 300 critical workloads using its Snowgrid feature were able to maintain operations with only minimal interruption by failing over to alternate cloud regions.

“There were a number of folks in the middle of an outage trying to find out what was happening and how they could recover quickly,” said Christian Kleinerman, Snowflake’s chief product officer. “Customers who chose to leverage our business continuity capabilities continued their operations as if nothing had happened. It was a non-event.”

Introduced in 2022, Snowgrid enables organizations to replicate workloads across regions on the three major public clouds and to shift data processing and client connections to alternate sites during a disruption. Failovers are initiated by individual customers based on predefined scenarios. Kleinerman said Snowflake built its service from the ground up to support transactional consistency and low-latency replication.

Snowgrid has three basic components: replication, failover and client redirection. It can be configured to replicate data from one cloud region to another. When a disruption occurs in the primary region, customers can trigger a failover to the designated secondary region to shift processing there, and workloads resume where they left off without data loss or duplication. Snowgrid then automatically redirects client applications to the secondary region by updating Domain Name System entries behind the scenes, so most users see only a brief blip before operations resume.
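The flow described above can be sketched as a toy model in Python. This is only an illustration, not Snowflake's implementation or API: the `Region` and `ReplicatedAccount` classes, their method names and the region names are all hypothetical, and a plain string stands in for the DNS entry that clients follow. The sketch copies a snapshot on each `replicate()` call and does not attempt to reproduce the consistency guarantees the article describes.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical stand-in for one cloud region holding a copy of the data."""
    name: str
    rows: list = field(default_factory=list)


class ReplicatedAccount:
    """Toy model: a primary region, a secondary region and a stable client alias."""

    def __init__(self, primary: Region, secondary: Region):
        self.primary = primary
        self.secondary = secondary
        # Stands in for a DNS entry that client tools resolve before connecting.
        self.alias_target = primary.name

    def write(self, row: dict) -> None:
        # All writes land on whichever region is currently primary.
        self.primary.rows.append(row)

    def replicate(self) -> None:
        # Copy the primary's current state to the secondary as one snapshot.
        self.secondary.rows = list(self.primary.rows)

    def failover(self) -> None:
        # Customer-initiated: promote the secondary and repoint the alias.
        self.primary, self.secondary = self.secondary, self.primary
        self.alias_target = self.primary.name

    def connect(self) -> str:
        # Clients always follow the alias, so they need no configuration change.
        return self.alias_target


east = Region("us-east")
west = Region("us-west")
account = ReplicatedAccount(primary=east, secondary=west)

account.write({"order_id": 100})
account.replicate()   # the secondary now holds the same rows
account.failover()    # triggered by the customer during a disruption

print(account.connect(), len(account.primary.rows))
```

In this sketch, anything written after the last `replicate()` call would not be present on the secondary, which is why replication has to be in place and current before a failover is needed.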

‘Frenzy and chaos’

“Anyone who has been doing databases for a while realizes that if your line items don’t match your orders because the point in time didn’t match on both sides, you have frenzy and chaos until you fail over,” Kleinerman said.
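Kleinerman's example of orders and line items falling out of step can be illustrated with a small Python sketch. It is a simplified assumption about how point-in-time skew arises, not a description of Snowflake internals: the `LOG` structure, the commit sequence numbers and the helper names are hypothetical.

```python
# Each entry is (commit_seq, table, row). A single transaction writes an order
# and its line items under the same commit sequence number.
LOG = [
    (1, "orders", {"order_id": 100}),
    (1, "line_items", {"order_id": 100, "sku": "A"}),
    (2, "orders", {"order_id": 101}),
    (2, "line_items", {"order_id": 101, "sku": "B"}),
    (2, "line_items", {"order_id": 101, "sku": "C"}),
]


def snapshot(table: str, as_of: int) -> list:
    """Return the rows of one table as they stood at commit `as_of`."""
    return [row for seq, name, row in LOG if name == table and seq <= as_of]


def orphaned_line_items(orders: list, line_items: list) -> list:
    """Line items whose order is missing from the replicated orders table."""
    known = {order["order_id"] for order in orders}
    return [item for item in line_items if item["order_id"] not in known]


# Both tables replicated as of the same commit: nothing is orphaned.
consistent = orphaned_line_items(snapshot("orders", 2), snapshot("line_items", 2))

# Orders replicated as of commit 1 but line items as of commit 2:
# the line items for order 101 arrive without their order.
skewed = orphaned_line_items(snapshot("orders", 1), snapshot("line_items", 2))

print(len(consistent), [item["sku"] for item in skewed])
```

The skewed case is the mismatch the quote refers to: each table is internally valid, but the two copies reflect different moments in time.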

Once replication is in place, Snowflake continuously manages the state of data and workloads. If a disruption appears likely to last more than a few minutes, users can trigger failover manually, shifting operations to another region or cloud in less than a minute.

One customer that failed over during the outage was Vermont Information Processing Inc., a software provider for the beverage industry that serves more than 1,200 suppliers and 400 distributors. Director of Applications Chris McGinty said his team noticed problems in AWS’s U.S. East region early on October 20.

“We first heard about it from one of our operations team members, who noticed they couldn’t log into the cloud console,” McGinty said. “Around 4 a.m., we noticed we could no longer access the AWS U.S. East 1 console for internal operations.”

Although Snowflake’s services were initially unaffected, internal monitoring showed signs of degraded performance. “Within a matter of about five minutes, we had all of our workloads running on our secondary U.S. West location,” McGinty said. “Our applications saw no real downtime.”

Trust but verify

McGinty said Snowgrid isn’t a “set it and forget it” proposition but requires forethought. “We were confident that what we had to do was going to work, and it did,” he said. “We did a lot of testing. It’s great to have plans and infrastructure in place, but you have to test.”

Kleinerman said Snowgrid’s client redirection feature is key. By automatically updating DNS entries, customer tools like business intelligence dashboards can reconnect without user intervention. “They’ll see a blip for a minute or so, and then everything continues to work as if nothing had happened,” he said.
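From the client side, the "blip" can be pictured as a reconnect loop that looks up the connection target again on every attempt. The Python sketch below is a generic pattern under stated assumptions, not the behavior of any Snowflake driver: `connect_with_retry`, `fake_resolve` and `fake_open` are hypothetical names, and the fake resolver simply returns a new region on its third lookup to mimic an updated DNS entry.

```python
import time


def connect_with_retry(resolve, open_connection, attempts=5, delay_s=0.0):
    """Retry a connection, re-resolving the target before every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        # Resolve again each time so an updated entry is picked up.
        target = resolve()
        try:
            return open_connection(target)
        except ConnectionError as err:
            last_error = err
            time.sleep(delay_s)
    raise last_error


# Hypothetical stand-ins: the entry points at us-east for two lookups,
# then at us-west once the redirect has taken effect.
entries = iter(["us-east", "us-east", "us-west"])


def fake_resolve():
    return next(entries)


def fake_open(target):
    if target == "us-east":
        raise ConnectionError("region unavailable")
    return f"connected to {target}"


result = connect_with_retry(fake_resolve, fake_open, attempts=5)
print(result)
```

A client that cached the first lookup and never resolved again would keep retrying the unavailable region, so the redirect only helps tools that look the name up again when they reconnect.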

Snowgrid is a paid, optional feature that only about one-quarter of Snowflake customers use. Kleinerman said many companies are confused about the effectiveness of AWS availability zones, which are isolated data center locations within a region that can provide a level of fault tolerance. Last week’s disruption demonstrated that, in the event of a failure affecting shared services like identity or DNS, availability zones may not be sufficient protection.

The AWS incident was triggered by a failure in the U.S. East region’s DNS, affecting control plane services and leaving many users without access even if their workloads were technically running in alternate availability zones. “Availability zones don’t mean business continuity,” Kleinerman said.

Vermont Information Processing’s McGinty said his organization has no plans to change its setup in response to the outage. “I think we’ve made the right investment,” he said. “It seems like everything’s in place for us to be able to operate successfully when these types of major incidents happen.”

While Snowflake was clearly eager to promote Snowgrid in the wake of the outage, the broader lesson is less about any one vendor than about architecting for resilience, Kleinerman said.

“Outages will happen,” he said. “If you prepare, you’re going to have an uneventful day. If you don’t, you’re going to have a busy Monday.”

Image: SiliconANGLE/Google Whisk

