UPDATED 11:30 EDT / JUNE 24 2020

Databricks buys Redash and debuts Delta Engine, a fast query tool for data lakes

Big-data company Databricks Inc. wants to help enterprises dig into their vast troves of data even faster, so today it launched a new, high-performance query engine for cloud-based data lakes.

The company also announced the acquisition of an Israeli startup called Redash Ltd., which has built an open-source dashboarding and visualization tool that helps data scientists explore their data more easily.

The new Delta Engine is designed to work with Databricks’ Delta Lake, an open-source structured transaction layer launched last year to improve the efficiency of enterprise data lakes.

Data lakes are systems or repositories of data stored in its natural format, usually as object “blobs” or files. They typically act as a single store for all enterprise data, including raw copies of source system data and transformed data used for tasks such as reporting, visualization, analytics and machine learning.

Although data lakes are useful, they can also be unreliable or inaccurate for several reasons. These include failed writes, schema mismatches and the data inconsistencies that arise when batch and streaming data are mixed together.

When it launched the Delta Lake project last year, Databricks said the idea was to make the data in them more accurate and reliable. Delta Lake does that by managing transactions across both batch and streaming data, as well as multiple simultaneous writes.

Essentially, it brings “quality and reliability” to data lakes, enabling companies to build curated data lakes made up of both structured and semi-structured data so they can perform faster analytics on that data.

In an interview with SiliconANGLE, Joel Minnick, vice president of marketing at Databricks, explained that it’s difficult to perform analytics on traditional data lakes because the information within them comes from multiple sources. Most organizations typically duplicate this data across various data warehouses and operational systems, because the tools they use to query and analyze it aren’t suited to fast query execution across multiple data types.

“Companies end up with multiple copies of the same data, multiple architectures and higher costs,” Minnick said. With Delta Lake, he added, “the idea is to bring those into one architecture. It adds performance, reliability and governance that data lakes need to make them more useful.”

Databricks’ new Delta Engine is designed to help companies perform faster analytics on the data stored within their Delta Lakes, Minnick said. It lets them analyze that data without moving it out of Delta Lake, while boosting query speeds by up to eight times thanks to a “vectorized” query engine.

Of course, querying data is one thing, but companies also need to understand what those queries tell them, and that’s where the acquisition of Redash comes in. Redash is an open-source project that helps data scientists make better sense of their data by letting them visualize the results of their queries as various types of charts, cohorts and funnels, Minnick said. The results can then easily be shared with other users.

“Redash makes what people are doing with their data consumable by the rest of the organization,” Minnick said. “Companies have a big need for business intelligence that’s easily consumable through dashboards. Redash provides a self-service for people with less skills.”

Constellation Research Inc. analyst Holger Mueller told SiliconANGLE that it’s becoming increasingly important for enterprises to make sense of their data in easier and faster ways.

“Companies have a big need to power insights for their next-generation applications,” Mueller said. “The launch of Delta Engine and the acquisition of Redash help support that.”

Databricks said customers can take advantage of Redash by using a free connector to analyze queries made using Delta Engine. The company is also working on fully integrating Redash with Delta Engine and its main Unified Data Analytics Platform, and expects to roll out a public preview of that capability later this year.

With reporting from Robert Hof

Image: geralt/Pixabay
