UPDATED 17:39 EDT / NOVEMBER 15 2023

BIG DATA

Onehouse open-sources its OneTable data tool with support from Google and Microsoft

Onehouse Inc. today open-sourced OneTable, a software tool designed to help companies more easily manage data stored in different formats.

Menlo Park, California-based Onehouse is backed by $33 million in funding from Greylock Partners and other investors. It provides a data lakehouse platform that companies can use to power their analytics and artificial intelligence initiatives. The platform is based on Apache Hudi, a popular data format that Onehouse Chief Executive Officer Vinoth Chandar created before launching the company in 2021.

OneTable, the technology Onehouse open-sourced today, was originally introduced in February as a feature addition to its data lakehouse. The open-source version is the fruit of a monthslong collaboration with Microsoft Corp. and Google LLC. In the next phase of their partnership, the three companies plan to donate the project to the Apache Software Foundation.

Hudi, the technology underpinning Onehouse’s platform, is a data format in which companies can keep the information they process during analytics projects. The format is available under an open-source license. Hudi competes with two other open-source data formats, Delta Lake and Apache Iceberg, that focus on similar use cases but differ significantly in the features they offer.

Onehouse’s newly open-sourced OneTable tool can convert data between the three formats. Companies already have access to a software product, Databricks Inc.’s Delta UniForm offering, that can turn Delta Lake data into Hudi and Iceberg files. But UniForm only performs the conversion in one direction, whereas OneTable supports bidirectional data conversions between all three formats.

OneTable takes advantage of the fact that Delta Lake, Hudi and Iceberg are based on a fourth open-source data format called Apache Parquet. All three technologies store information as Parquet files. They bundle each such file with metadata that describes its structure, past edits and other key details.

When converting a dataset between Delta Lake, Hudi and Iceberg, OneTable doesn’t have to reformat the Parquet files that make up the dataset because Parquet is supported by all three technologies. OneTable only modifies the metadata that the three formats attach to those files. As a result, it can carry out data conversions using relatively few hardware resources.
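The metadata-only idea can be illustrated with a toy Python sketch. This is not OneTable's code or API: the names `DataFile`, `TableSnapshot` and `convert_metadata` are hypothetical, and the real on-disk metadata layouts of Hudi, Delta Lake and Iceberg are not modeled. The sketch only shows that a conversion can produce new metadata while pointing at the same, untouched Parquet files.

```python
from dataclasses import dataclass

# Hypothetical, simplified model. Real table formats keep far richer
# metadata (schemas, commit history, statistics) in their own layouts.
SUPPORTED_FORMATS = {"hudi", "delta", "iceberg"}


@dataclass(frozen=True)
class DataFile:
    path: str       # location of a Parquet file; never rewritten here
    row_count: int


@dataclass(frozen=True)
class TableSnapshot:
    table_format: str   # one of SUPPORTED_FORMATS
    schema: tuple       # column names only, for illustration
    files: tuple        # DataFile entries that make up the dataset


def convert_metadata(snapshot: TableSnapshot, target_format: str) -> TableSnapshot:
    """Return a snapshot in target_format that reuses the same data files."""
    if target_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported target format: {target_format}")
    # Only the metadata wrapper changes; the file list is carried over as-is.
    return TableSnapshot(
        table_format=target_format,
        schema=snapshot.schema,
        files=snapshot.files,
    )


if __name__ == "__main__":
    # Illustrative path and row count, not taken from any real dataset.
    hudi_table = TableSnapshot(
        table_format="hudi",
        schema=("id", "event_time"),
        files=(DataFile("s3://example-bucket/events/part-0.parquet", 100),),
    )
    iceberg_table = convert_metadata(hudi_table, "iceberg")
    print(iceberg_table.table_format)
    print(iceberg_table.files == hudi_table.files)
```

Because no Parquet data is read or rewritten in this model, the cost of a conversion depends on the amount of metadata rather than the size of the dataset, which matches the resource savings described above.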

Some analytics tools have features that only work with one specific data format, or support that format better than others. Onehouse says that OneTable can make such features more accessible for companies with datasets in multiple formats. Moreover, the tool can help companies avoid becoming locked into one format. 

“Some customers want their data available in both Databricks Delta and Snowflake’s private preview Iceberg tables,” Onehouse head of product Kyle Weller and data engineer Tim Brown detailed in a blog post. “Some users need fast ingestion and incremental processing of Hudi, but they also want to take advantage of some of the special caching layers inside BigQuery’s support of Iceberg tables. Some users only need one format, but they want the assurance of being future proof, and Onehouse gives them all 3 simultaneously.”

Going forward, Onehouse, Microsoft and Google plan to enhance the tool in several ways. The companies intend to release more integrations with external data management systems. They also intend to improve OneTable’s performance and efficiency, as well as add compatibility with data formats beyond the three currently supported.

Image: Onehouse
