UPDATED 16:45 EDT / JUNE 04 2024

BIG DATA

Databricks acquires Tabular in a big move to close the compatibility gap with Apache Iceberg

Databricks Inc. today agreed to acquire Tabular Technologies Inc., developer of a universal storage platform based on the Apache Iceberg standard.

The move signals stepped-up efforts by Databricks to bridge the compatibility gap between its Delta Lake storage format and Iceberg. Terms weren’t announced, but Databricks Chief Executive Ali Ghodsi (pictured) told CNBC the price tag was more than $1 billion. Snowflake Inc. and Confluent were also reportedly in on the bidding.

Tabular was founded by three former Netflix Inc. employees who co-created Iceberg at that company. In 2020, the project was donated to open source.

Databricks’ Delta Lake storage framework, introduced the same year, resembles Iceberg in several respects. Both are based on Apache Parquet, support atomicity, consistency, isolation and durability, or ACID, transactions, provide scalable metadata handling and unify streaming and batch data processing. Databricks said Delta Lake has more than 500 code contributors and is used by more than 10,000 companies worldwide.

Race for supremacy

Delta Lake and Iceberg have been in a neck-and-neck race for supremacy in the market for data lakes, which are centralized repositories of structured and unstructured data. Dremio Corp.’s 2024 State of the Data Lakehouse report found that 31% of survey respondents use Apache Iceberg compared with 39% using Delta Lake. However, over the next three years, 29% expect to adopt Iceberg compared with 23% for Delta Lake. SNS Insider Pvt Ltd. estimates that the data lake market will grow more than 21% annually, to $57 billion by 2030.

The competition between the two standards has been a problem for both camps. In a blog post announcing the deal, Tabular wrote, “The problem isn’t about determining which standard is better. The problem is that the risk of investing in the wrong format prevents people from choosing at all.”

Databricks has been taking steps toward bridging the gap with Apache Iceberg for the past two years. Version 3.0 of Delta Lake, released a year ago, added Iceberg compatibility. The company’s UniForm universal lakehouse format added Iceberg support last year.

Advancing the lakehouse

Unifying the storage format is considered a critical step in advancing the adoption of data lakehouses, a Databricks-coined term for a hybrid of traditional data warehouses and data lakes. A lakehouse enables ACID transactions on data stored in object storage with high reliability, performance and compatibility with open-source engines such as Apache Spark, Trino and Presto.

The lakehouse concept has caught fire for its flexibility and scale. Research published by Databricks last year found that nearly three-quarters of 600 technology leaders surveyed have adopted a lakehouse architecture and the rest expect to do so within the next three years.

“Databricks and Tabular will work with the open-source community to bring the two formats closer to each other over time, increasing openness and reducing silos and friction for customers,” Ghodsi said in a statement.

In an interview with SiliconANGLE, Databricks co-founder and Chief Technology Officer Matei Zaharia said bridging the two formats will be a multiyear process, but having the creators of both standards working side by side is a major step forward. “Our hope is to make these formats converge so we don’t care about format anymore,” he said.

The announcement was timely, coming during Databricks rival Snowflake’s annual Data Cloud Summit user conference. Yesterday at the conference in San Francisco, Snowflake announced Polaris Catalog, an open catalog implementation that supports cross-engine access to Iceberg data and that competes with Databricks’ Unity Catalog.

“This is a frontal attack on SnowflakeDB and how they treat data in Iceberg tables,” said Rob Strechay, managing director and principal analyst at theCUBE Research. Databricks is “continuing to beat the drum of saying that they don’t care where the data lives. It gives them a team to continue building an open-source community, which will be a brand-new challenge for Snowflake.”

Photo: SiliconANGLE
