BIG DATA
Onehouse, a new data lakehouse startup, today launched from stealth mode after raising an $8 million seed funding round co-led by Greylock and Addition.
A lakehouse is a new type of software solution that helps enterprises more efficiently extract insights from their data. It combines the features of data warehouses and data lakes in a single platform.
Menlo Park, California-based Onehouse provides a managed, cloud-based lakehouse service designed for ease of use. Setting up a lakehouse environment usually requires months of work and specialized technical know-how. Onehouse says that its service reduces the setup process from months to just a few minutes.
Onehouse’s service is based on the open-source Apache Hudi platform, which was created by founder and Chief Executive Officer Vinoth Chandar while working at Uber Technologies Inc. as a data architect. Uber uses the platform in production to process about 500 billion records every day. Other users include Amazon.com Inc., Walmart Inc. and General Electric Co.’s GE Aviation division.
“The data lake house is the future of data lakes, providing customers the ease of use of a data warehouse with the cost and scale advantages of a data lake,” said Greylock Partner Jerry Chen. “Apache Hudi is already the de facto starting point for modern data lakes and today Onehouse makes data lakes easily accessible and usable by all customers.”
One of the flagship features of Onehouse’s lakehouse service is a technology called incremental processing. It allows companies to start analyzing their data soon after it’s generated, which is difficult when using traditional technologies.
Before a company can start analyzing its business records for insights, it has to move the records to a data processing environment. This task is usually performed with ETL, or extract, transform and load, software.
Moving records to a data processing environment using ETL software often takes hours, which means that by the time the information arrives, it’s no longer fresh and therefore less useful. That’s a particular challenge for real-time analytics use cases, which depend on the ability to process data soon after it’s generated.
Onehouse’s incremental processing technology allows companies to ingest data every few minutes rather than every few hours as ETL tools do. The result: Enterprises can run analyses on the data while it’s still fresh.
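The general idea behind incremental ingestion can be illustrated with a short Python sketch. It is not Onehouse’s or Apache Hudi’s implementation; the `Record` type, the `commit_time` field and both function names are hypothetical, chosen only to show how a checkpoint lets each run pull just the records that arrived since the previous run instead of reloading everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: int
    commit_time: int  # hypothetical, monotonically increasing commit timestamp


def incremental_pull(source, checkpoint):
    """Return the records committed after `checkpoint` and the new checkpoint."""
    new_records = [r for r in source if r.commit_time > checkpoint]
    # If nothing new arrived, the checkpoint stays where it was.
    new_checkpoint = max((r.commit_time for r in new_records), default=checkpoint)
    return new_records, new_checkpoint


def apply_upserts(table, records):
    """Upsert records into a dict-backed table, oldest commit first."""
    for r in sorted(records, key=lambda r: r.commit_time):
        table[r.key] = r.value
    return table


# First run ingests everything; later runs only touch what changed.
source = [Record("a", 1, 100), Record("b", 2, 110)]
table, checkpoint = {}, 0
batch, checkpoint = incremental_pull(source, checkpoint)
apply_upserts(table, batch)

source.append(Record("a", 5, 120))  # an update arrives a few minutes later
batch, checkpoint = incremental_pull(source, checkpoint)
apply_upserts(table, batch)
```

Because each run processes only the changes since the last checkpoint, the runs stay small enough to schedule every few minutes.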
The company’s lakehouse service automatically optimizes customers’ data ingestion workflows to improve performance, the startup says. Because the service is delivered via the cloud on a fully managed basis, customers don’t have to manage the underlying infrastructure.
Onehouse also provides a raft of other features to ease day-to-day operations for users. One of those features is a capability dubbed small file compaction. It allows companies to consolidate multiple small files into a single, larger file in order to optimize query performance. Merging many small files into fewer large ones reduces the total number of files that an application has to open and scan while reading data, which speeds up processing.
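As a rough illustration of the compaction idea, the Python sketch below groups small files into larger ones with a simple first-fit packing. It is an assumption-laden example, not how Onehouse or Apache Hudi plans compaction; the function name `plan_compaction` and the `target_size` and `small_limit` parameters are hypothetical, and sizes are in arbitrary units.

```python
def plan_compaction(file_sizes, target_size, small_limit):
    """Group files smaller than `small_limit` into bins of at most `target_size`.

    Returns (untouched, groups): files left as they are, and lists of small
    files that would each be rewritten as one larger file.
    """
    untouched = [s for s in file_sizes if s >= small_limit]
    # Placing the largest small files first tends to fill bins more tightly.
    small = sorted((s for s in file_sizes if s < small_limit), reverse=True)
    groups = []
    for size in small:
        for group in groups:
            if sum(group) + size <= target_size:
                group.append(size)
                break
        else:
            # No existing group has room, so start a new output file.
            groups.append([size])
    return untouched, groups


untouched, groups = plan_compaction(
    [5, 120, 10, 20, 40, 8], target_size=64, small_limit=50
)
files_after = len(untouched) + len(groups)
```

In this example six input files become three, so a query has half as many files to open.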
Chandar said in a statement that “while a warehouse can just be ‘used,’ a lakehouse still needs to be ‘built.’ Having worked with many organizations on that journey for four years in the Apache Hudi community, we believe Onehouse will enable easy adoption of data lakes and future-proof the data architecture for machine learning/data science down the line.”
Onehouse will use its $8 million seed funding round to expand research and development activities.