UPDATED 17:39 EDT / NOVEMBER 15 2023


Onehouse open-sources its OneTable data tool with support from Google and Microsoft

Onehouse Inc. today open-sourced OneTable, a software tool designed to help companies more easily manage data stored in different formats.

Menlo Park, California-based Onehouse is backed by $33 million in funding from Greylock Partners and other investors. It provides a data lakehouse platform that companies can use to power their analytics and artificial intelligence initiatives. The platform is based on Apache Hudi, a popular data format that Onehouse Chief Executive Officer Vinoth Chandar created before launching the company in 2021.

OneTable, the technology Onehouse open-sourced today, was originally introduced in February as a feature addition to its data lakehouse. The open-source version is the fruit of a monthslong collaboration with Microsoft Corp. and Google LLC. In the next phase of their partnership, the three companies plan to donate the project to the Apache Software Foundation.

Hudi, the technology underpinning Onehouse’s platform, is a data format in which companies can keep the information they process during analytics projects. The format is available under an open-source license. Hudi competes with two other open-source data formats, Delta Lake and Apache Iceberg, that target similar use cases but differ significantly in their feature sets.

Onehouse’s newly open-source OneTable tool can convert data between the three formats. Companies already have access to a software product, Databricks Inc.’s Delta UniForm offering, that can turn Delta Lake data into Hudi and Iceberg files. But it only performs the task in one direction, whereas OneTable supports bidirectional data conversions between all three formats.

OneTable takes advantage of the fact that Delta Lake, Hudi and Iceberg are based on a fourth open-source data format called Apache Parquet. All three technologies store information as Parquet files. They bundle each such file with metadata that describes its structure, past edits and other key details.

When converting a dataset between Delta Lake, Hudi and Iceberg, OneTable doesn’t have to reformat the Parquet files that make up the dataset because Parquet is supported by all three technologies. OneTable only modifies the metadata that the three formats attach to those files. As a result, it can carry out data conversions using a relatively limited amount of hardware resources.
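The idea of a metadata-only conversion can be illustrated with a minimal Python sketch. This is not OneTable's code or API. The key names in `FILE_LIST_KEY`, the `TableState` class and the `convert` function are all hypothetical, and the dictionaries are toy stand-ins that do not reflect the real metadata layouts of Delta Lake, Hudi or Iceberg. The sketch only shows the general shape of the approach: read one format's metadata into a neutral description, then write it back out in another format while pointing at the same Parquet files.

```python
from dataclasses import dataclass

# Toy key names standing in for each format's metadata layout.
# These are hypothetical and do not match the real formats.
FILE_LIST_KEY = {
    "delta": "added_files",
    "hudi": "commit_files",
    "iceberg": "manifest_files",
}


@dataclass(frozen=True)
class TableState:
    """Format-neutral view of a table: its schema and its Parquet files."""
    schema: tuple
    data_files: tuple


def read_metadata(metadata: dict) -> TableState:
    """Parse toy metadata of any supported format into a neutral state."""
    key = FILE_LIST_KEY[metadata["format"]]
    return TableState(tuple(metadata["schema"]), tuple(metadata[key]))


def write_metadata(state: TableState, target_format: str) -> dict:
    """Emit toy metadata in the target format; data files are only referenced."""
    key = FILE_LIST_KEY[target_format]
    return {
        "format": target_format,
        "schema": list(state.schema),
        key: list(state.data_files),
    }


def convert(metadata: dict, target_format: str) -> dict:
    """Translate metadata between formats without touching any Parquet file."""
    return write_metadata(read_metadata(metadata), target_format)


hudi_table = {
    "format": "hudi",
    "schema": ["id:long", "name:string"],
    "commit_files": ["data/part-0001.parquet", "data/part-0002.parquet"],
}
iceberg_table = convert(hudi_table, "iceberg")
```

In this sketch the list of Parquet paths is carried over unchanged, so the cost of a conversion depends on the size of the metadata rather than the size of the data, and converting back to the original format reproduces the original metadata.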

Some analytics tools have features that only work with one specific data format, or support that format better than others. Onehouse says that OneTable can make such features more accessible for companies with datasets in multiple formats. Moreover, the tool can help companies avoid becoming locked into one format. 

“Some customers want their data available in both Databricks Delta and Snowflake’s private preview Iceberg tables,” Onehouse head of product Kyle Weller and data engineer Tim Brown wrote in a blog post. “Some users need fast ingestion and incremental processing of Hudi, but they also want to take advantage of some of the special caching layers inside BigQuery’s support of Iceberg tables. Some users only need one format, but they want the assurance of being future proof, and Onehouse gives them all 3 simultaneously.”

Going forward, Onehouse, Microsoft and Google plan to enhance the tool in several ways. The companies plan to release more integrations with external data management systems. They also intend to improve OneTable’s performance and efficiency, as well as add compatibility with data formats beyond the three currently supported.

Image: Onehouse
