UPDATED 09:00 EST / OCTOBER 16 2019

BIG DATA

Open-source Delta Lake project moves to the Linux Foundation

Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.

Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by thousands of organizations, including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.

Data lakes are popular with large enterprises because they provide a reliable way of ensuring that data can be accessed by anyone within an organization. They can store any kind of data, both structured and unstructured, in its native format, and they also support the kind of analysis that provides real-time insights on business matters.

But data lakes aren’t without their problems, the most common of which is that a lot of the information they store is unreliable or inaccurate. This has several causes, including failed writes, schema mismatches and data inconsistencies that arise when batch and streaming data are mixed together.

Unreliable data can be a burden because it prevents companies from getting accurate insights in a timely fashion. It can also slow down initiatives such as machine learning model training, which requires consistent data to ensure accuracy.

Delta Lake was designed to improve the efficiency of data lakes and ensure information is kept accurate and reliable. It does so by managing transactions across batch and streaming data and multiple simultaneous writes. It also does away with the need to build the complicated data pipelines that are used to move information across different computing systems.
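To make the idea of managing multiple simultaneous writes concrete, here is a minimal Python sketch of an append-only commit log with optimistic concurrency. It is an illustration only, not Delta Lake's code or API: the class ToyTransactionLog, the _toy_log directory and the file names are all hypothetical. Each commit is a numbered file, and a writer that loses the race for a version number must re-read the log and try again.

```python
import json
import os
import tempfile


class ToyTransactionLog:
    """Hypothetical append-only commit log: one JSON file per table version."""

    def __init__(self, table_dir):
        self.log_dir = os.path.join(table_dir, "_toy_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def latest_version(self):
        # Versions are encoded in the zero-padded file names; -1 means "empty table".
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def try_commit(self, read_version, added_files):
        """Commit on top of read_version; return the new version, or None on conflict."""
        new_version = read_version + 1
        path = os.path.join(self.log_dir, f"{new_version:020d}.json")
        try:
            # Mode "x" refuses to overwrite, so only one writer can claim a version.
            with open(path, "x") as f:
                json.dump({"add": added_files}, f)
        except FileExistsError:
            return None
        return new_version

    def commit_with_retry(self, added_files, max_attempts=5):
        # Optimistic concurrency: re-read the latest version and retry after a conflict.
        for _ in range(max_attempts):
            version = self.try_commit(self.latest_version(), added_files)
            if version is not None:
                return version
        raise RuntimeError("too many conflicting writers")

    def snapshot(self):
        # Replay commits in version order to list the files that make up the table.
        files = []
        for name in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, name)) as f:
                files.extend(json.load(f)["add"])
        return files


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        log = ToyTransactionLog(table_dir)
        base = log.latest_version()
        # A batch job and a streaming job both start from the same version.
        batch = log.try_commit(base, ["batch-0001.parquet"])
        stream = log.try_commit(base, ["stream-0001.parquet"])  # loses the race
        retried = log.commit_with_retry(["stream-0001.parquet"])
        return batch, stream, retried, log.snapshot()


if __name__ == "__main__":
    print(demo())
```

In this sketch the losing writer never corrupts the table: its first attempt is simply rejected, and its data appears only after a clean retry. A production system would also need to handle partially written files, deletes and schema checks, which this toy leaves out.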

In fact, it’s fair to say that Delta Lake is actually more similar to a “data warehouse” such as Apache Hive than a data lake. The main difference between the two is that information in a data warehouse is transformed to conform to the warehouse’s own predefined schema, which means it cannot be stored in its native format. That makes the data more reliable, though enterprises lose a lot of flexibility when it comes to analyzing it.

Databricks co-founder and Chief Executive Officer Ali Ghodsi said the company was handing over stewardship of the project to the Linux Foundation in order to encourage more innovation from the open-source community.

“To address organizations’ data challenges we want to ensure this project is open source in the truest form,” Ghodsi said. “We’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

Constellation Research Inc. analyst Holger Mueller told SiliconANGLE that the data lake is the foundation of modern enterprises.

“It’s good to see standardization and open-source beneficial dynamics at work,” Mueller said. “But providing technology assets to open-source bodies is still not a guarantee of success, only time will tell.”

The Linux Foundation said Delta Lake will operate under an open governance model that’s meant to foster more participation in the project.

Ghodsi spoke about Databricks with theCUBE, SiliconANGLE’s livestreaming studio, earlier this year.

Image: Delta.io
