UPDATED 05:00 EDT / OCTOBER 25 2017

BIG DATA

With Delta, Databricks aims to make data easier to extract and process

Databricks Inc. aims to make data easier to extract and process with the launch of a new service intended to eliminate many of the hassles of juggling multiple data lakes, warehouses and streaming ingestion systems.

Co-founder and Chief Executive Officer Ali Ghodsi introduced Databricks Delta during his keynote at Spark Summit Europe 2017 on Wednesday. He said the new unified data management system will become a key component of the firm’s cloud-based Unified Analytics Platform, which is based on the open-source Apache Spark big data framework. The main advantage of Delta, Ghodsi said, is that it eliminates the complex extract, transform and load process that’s otherwise necessary to prepare data from these different sources for querying and analysis.

In his keynote, Ghodsi said he had heard from customers that they were struggling with the limitations of data lakes and data warehouses, especially when it comes to the complex process of moving data between them. “Because Delta is a unified data management system that handles both low-latency streaming data and batch processes, it allows organizations to dramatically simplify their data architectures,” he said.

Databricks Delta does this by making it easier to move data between these different architectures. One of its roles is to simplify the data pipeline by allowing so-called Delta tables to serve as both a data source and a data sink. Delta tables provide transactional guarantees when multiple batch and streaming jobs run against the same data set, allowing queries to return the most recent consistent view of continuously changing data.
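Delta's actual storage format and APIs aren't detailed in the article, but the transactional-guarantee idea it describes can be sketched in plain Python. The class below is invented purely for illustration: writers (batch or streaming) commit atomically, and readers always see the last fully committed snapshot rather than a half-written batch.

```python
import threading

class TransactionalTable:
    """Toy illustration of a table with atomic commits: concurrent
    batch and streaming writers append in transactions, and readers
    always get the most recent fully committed snapshot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = []   # last committed version of the data
        self._version = 0

    def commit(self, rows):
        """Append a batch of rows as a single atomic transaction."""
        with self._lock:
            self._snapshot = self._snapshot + list(rows)
            self._version += 1

    def read(self):
        """Return a consistent view: the data plus its version."""
        with self._lock:
            return list(self._snapshot), self._version

table = TransactionalTable()
table.commit([{"id": 1}])   # e.g. a batch job writes
table.commit([{"id": 2}])   # e.g. a streaming micro-batch writes
rows, version = table.read()
print(version, [r["id"] for r in rows])   # 2 [1, 2]
```

In the real system the same role is played by a transaction log over files in cloud storage; this sketch only shows why atomic commits let readers see a consistent view while writes continue.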

The second thing Databricks Delta does is automate how data is stored, so customers don’t need to waste time manually tuning their systems before querying different data sets. Delta optimizes the physical layout, colocating data that is commonly queried together in order to speed up access. It also compacts files so they can be read more efficiently, the company said. Thanks to this data skipping and indexing process, the company claims that after data has been accessed once, subsequent access can be an order of magnitude faster.
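The data-skipping idea mentioned above can be illustrated with a small, self-contained sketch (the file names and statistics structure here are invented for illustration, not Delta's actual format): record the minimum and maximum of a column for each data file at write time, then consult those statistics at query time to avoid reading files that cannot possibly match the predicate.

```python
# Toy data-skipping index: per-file min/max statistics for one column.
files = {
    "part-0": [3, 7, 9],
    "part-1": [120, 150, 180],
    "part-2": [45, 60, 72],
}

# Build the statistics once, when the data is written.
stats = {name: (min(vals), max(vals)) for name, vals in files.items()}

def files_to_scan(lo, hi):
    """Return only the files whose [min, max] range overlaps [lo, hi];
    all other files are skipped without being read."""
    return [name for name, (mn, mx) in stats.items()
            if mx >= lo and mn <= hi]

print(files_to_scan(100, 200))   # ['part-1'] -- two files skipped
```

A query filtering on values between 100 and 200 touches one file instead of three; at data-lake scale, skipping most files is where the claimed speedup comes from.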

In effect, Databricks Delta is a new data management layer for Spark environments that combines the scale and cost-efficiency of a data lake, the query performance of a data warehouse, and the low latency of a streaming ingest system into one system, Ghodsi explained.

Available in beta now, Delta can be integrated with the Databricks Unified Analytics Platform through standard Apache Spark application programming interfaces.

Image: KamiPhuc/Flickr
