UPDATED 20:51 EDT / AUGUST 08 2022


Complicated rivalry between Snowflake and Databricks spotlights key trends in enterprise computing

There have been legendary rivalries among tech companies over the decades: Amazon versus Microsoft and Google in cloud, Salesforce versus Oracle and SAP, Twitch versus YouTube, Tesla versus the tech divisions of an entire automotive industry.

Now there’s a growing contest between enterprise heavyweights Snowflake Inc. and Databricks Inc. that promises to deliver its own fireworks. The companies are circling each other like prizefighters in a boxing ring over key technology areas such as open source, information technology infrastructure and artificial intelligence.

But like much in the technology world, this is a complicated saga and nothing is exactly as it seems.

“Snowflake started as an infrastructure play, and it is trying to move into the analytics processing space,” Yanev Suissa, founder and managing partner at SineWave Ventures, an early investor in Databricks, said in an interview with SiliconANGLE. “Databricks is seeking to do the reverse. They are clearly trying to get into each other’s swim lanes.”

The attention surrounding this burgeoning competition between the two data storage and analytics firms highlights the growing influence of superclouds, exemplified by companies like Snowflake and Databricks that provide an abstraction layer above and across hyperscale infrastructure. It is a rivalry that foreshadows a new wave of technology influencers as the two firms compete for control of the enterprise’s most important asset – data itself.

Data Cloud versus Lakehouse

Both companies have carved out a sizable market presence in cloud data warehousing, while going to great lengths to avoid using the label. Snowflake prefers to be known for its “Data Cloud,” and Databricks characterizes its melding of data warehouses and data lakes as the “lakehouse.”

Snowflake built its reputation as a persistent data platform, with data sharing and transformation, while Databricks has evolved as more of an analytical workbench. Databricks was built on Apache Spark, an open-source analytics engine for big data and machine learning that was developed in 2009 at the University of California at Berkeley.

“Databricks came in as a steward of Spark, and their approach is from a data science/data engineering world,” said Dave Vellante, industry analyst for SiliconANGLE. “Snowflake is trying to build a de facto standard. It’s like the Apple Mac mindset versus Windows. It’s a proprietary system that runs in the cloud.”

That proprietary emphasis has emerged as a point of friction. The two companies have traded barbs in recent weeks over open source, with Snowflake pointing to its Apache Iceberg offering as a viable open-source table format. Databricks responded by noting that its Delta Lake solution was posted on GitHub as an open-source project with more than 200 contributors from 70 organizations.

Why is this important? Both companies operate in a high-stakes, rapidly evolving environment where innovation rules. As the open-source movement has shown, innovation takes a village, and being able to claim a diverse ecosystem of contributing developers can provide significant competitive advantage.

“There’s a lot of innovation around open source, you need a hand in open source to stay on that innovation curve,” Vellante noted. “Databricks is trying to challenge the conventional notion that open source can’t compete functionally with a proprietary system. They are doing a good job in that regard, but history suggests de facto standards will get to market sooner.”

Avoiding a benchmark war

To bolster its position against Snowflake, Databricks published TPC-DS benchmark data in November showing that its SQL platform outperformed its competitor. In a lengthy blog post published soon after, Snowflake disputed the results.

When SiliconANGLE asked Snowflake co-founder and president of products Benoit Dageville about the Databricks comparison, he made clear that his company would not be dragged into a contest over competitive performance claims based on benchmarks.

“We’ve said from day one, we would never again participate in this really stupid benchmark war because it’s not in the interest of customers,” Dageville said. “TPC was really important at some point, and it is not really relevant now.”

Yet the comparisons provided by Databricks caught the attention of several noted industry analysts, including Sanjeev Mohan, principal at SanjMo and former Gartner Research vice president.

“The story that Databricks tells is actually very compelling in their benchmarks,” said Mohan, in an interview for this story. “It’s very hard to say which one has the better technology. The end goal is the same for both, but they come at it from two different angles.”

Major alliances

The different approaches by the two firms have also led to a noticeable split among some of the biggest players in the tech industry. Amazon Web Services Inc. employed Iceberg to develop the serverless interactive query service Amazon Athena. Google Cloud also chose to support Iceberg first for its own cloud lakehouse offering called BigLake.

On the Databricks side, Microsoft Corp. has supported Delta Lake and has joined with Apple Inc. and IBM Corp. as collaborators. According to a spokesperson from Snowflake, Apple is also a top contributor to Iceberg.

However, tempting as it may be to paint the Snowflake/Databricks rivalry as a showdown between even larger tech powerhouses, the reality is that both companies are closely tied to the major cloud providers and this interdependence is unlikely to change in the near future.

“I think that’s more drama than reality,” said Suissa, who noted that both Databricks and Snowflake were built on top of the major clouds. Hyperscalers stand to gain significant revenue from this arrangement as the two competitors become more successful.

Despite the competition, both firms have found themselves backing the same startups. Snowflake and Databricks jointly participated in the latest funding round for dbt Labs Inc., developer of a data transformation tool. The two companies have also recently backed data science startup Hex Technologies Inc.

Still, the battle appears unlikely to ease anytime soon as each company forges ahead on its respective path.

“Both companies are addressing their weaknesses and doubling down on their strengths,” Vellante said. “What Databricks did was address the functionality in data lakes to fix the data swamp. Snowflake is building an ecosystem and enabling that ecosystem to monetize inside their Data Cloud. They are building an AWS-like supercloud.”

Image: Pixabay Commons
