UPDATED 16:15 EDT / DECEMBER 13 2021

Databricks provides data lakehouse as a single source of truth

Having a unified data platform not only minimizes costs, but also enhances scalability, governance, security and performance, because data engineers, analysts and scientists — using different programming languages — work collaboratively on the same data.

Combining the capabilities of a data warehouse and a data lake eliminates the complexity of moving data back and forth across different systems and worrying about data formats. This is made possible by Databricks Inc.’s open-source technology, Delta Lake, which enables the creation of a data lakehouse, according to Joel Minnick (pictured, right), vice president of product and partner marketing at Databricks.

“Delta Lake allows us to apply performance, reliability, quality and scale that you would expect out of a data warehouse directly on your data lake,” Minnick said. “Operate from one single source that handles all analytics workloads — both traditional analytics workloads and data science and machine-learning workloads.”

Minnick and Gregory Rokita (pictured, left), executive director of technology at Edmunds.com, spoke with Dave Vellante, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during AWS re:Invent. They discussed Databricks’s Lakehouse Platform, its open-source technology Delta Lake, and its partnership with Edmunds. (* Disclosure below.)

Lakehouse presents machine learning as a first-class citizen

In vehicle pricing, accuracy is important because customers need to know whether they are getting a good or bad deal. Using Delta Lake lets Edmunds stay agile with a responsive pricing model that factors in ever-changing metrics, such as used-car prices being 38% higher year-over-year, according to Rokita.

“We give consumers offers, which leverage the pricing that we develop on top of the Lakehouse,” Rokita said. “So with the Lakehouse, we were able to develop a data pipeline that ingests the transactions, cleans them, and then feeds that curated feed into the machine learning model that is also deployed on the Lakehouse.”
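The flow Rokita describes (ingest transactions, clean them, feed the curated result to a model) can be sketched in a few lines of plain Python. This is only an illustration of the shape of such a pipeline, not Edmunds’ implementation and not the Delta Lake API: the `Transaction` record, the field names and the median-price "model" are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record shape; the real schema is not given in the article."""
    vin: str
    model: str
    price: Optional[float]


def ingest(raw_rows: Iterable[dict]) -> Iterator[Transaction]:
    """Stage 1: parse raw rows into typed records."""
    for row in raw_rows:
        price = row.get("price")
        yield Transaction(
            vin=row.get("vin", ""),
            model=row.get("model", ""),
            price=float(price) if price not in (None, "") else None,
        )


def clean(transactions: Iterable[Transaction]) -> List[Transaction]:
    """Stage 2: drop rows with no VIN or no positive price, and dedupe by VIN."""
    seen = set()
    curated = []
    for t in transactions:
        if not t.vin or t.price is None or t.price <= 0 or t.vin in seen:
            continue
        seen.add(t.vin)
        curated.append(t)
    return curated


def fit_price_model(curated: Iterable[Transaction]) -> Dict[str, float]:
    """Stage 3: stand-in for a machine learning model (median price per model)."""
    prices_by_model: Dict[str, List[float]] = {}
    for t in curated:
        prices_by_model.setdefault(t.model, []).append(t.price)
    return {name: median(prices) for name, prices in prices_by_model.items()}


def run_pipeline(raw_rows: Iterable[dict]) -> Dict[str, float]:
    """Chain the three stages: ingest -> clean -> model."""
    return fit_price_model(clean(ingest(raw_rows)))
```

In a lakehouse setting each stage would typically read from and write to tables on the same platform rather than pass Python objects in memory; the sketch only shows the ordering of the stages.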

Being locked into a specific data format hinders precise decision-making, watering down a company’s competitive advantage. A data warehouse is built on a proprietary format and handles mainly structured data, a limitation that prompted the creation of the data lakehouse, according to Minnick.

“A lakehouse is the ability to have one unified platform to handle your traditional analytics workload,” Minnick stated. “So you have BI in reporting, traditionally the lake, and the data warehouse workloads on the same platform as your data science and machine learning.” 

Backfilling data is the norm at Edmunds because of its line of business, and a change in logic means reprocessing massive amounts of data, according to Rokita.

“With the Lakehouse, we can reprocess months’ worth of data in a matter of minutes or hours,” he said. “The Lakehouse is based on open standards, like Parquet, that allowed us to hook open-source and third-party tools on top of the Delta Lake.”  
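To make the idea of a backfill concrete, here is a minimal sketch in plain Python, assuming raw records are kept per day and the curated output can be recomputed from them whenever the logic changes. The `backfill` helper, the two `price_v*` functions and the record fields are hypothetical and stand in for whatever Edmunds actually runs on the Lakehouse.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List


def backfill(
    raw_by_day: Dict[date, List[dict]],
    transform: Callable[[dict], dict],
    start: date,
    end: date,
) -> Dict[date, List[dict]]:
    """Recompute curated output for every day in [start, end] from stored raw rows."""
    curated: Dict[date, List[dict]] = {}
    day = start
    while day <= end:
        # Days with no raw data still get an (empty) curated partition.
        curated[day] = [transform(row) for row in raw_by_day.get(day, [])]
        day += timedelta(days=1)
    return curated


def price_v1(row: dict) -> dict:
    """Original logic: report the sticker price as-is."""
    return {"vin": row["vin"], "price": row["price"]}


def price_v2(row: dict) -> dict:
    """Changed logic: include fees, which forces a reprocess of past days."""
    return {"vin": row["vin"], "price": row["price"] + row["fees"]}
```

Because the raw rows are retained, switching from `price_v1` to `price_v2` only requires calling `backfill` again over the affected date range; nothing has to be re-collected from the source systems.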

Administering multiple systems is cumbersome, in part because of the delays it introduces. Delta Lake and the Lakehouse Platform provide consistency within one system that handles different data sources, according to Rokita.

“Delta Lake simplifies the architecture quite a bit,” he stated. “In a modern enterprise, you have to deal with a variety of different data sources, structured, semi-structured and unstructured in the form of images and videos.”

Databricks set the world record on the TPC-DS 100-terabyte benchmark, and the Lakehouse Platform architecture behind that result offers 12X better price performance than cloud data warehouses, according to Minnick.

“So not only are we jamming on this extremely high scale and performance, but we’re able to do it much more efficiently,” Minnick concluded.

(* Disclosure: Databricks Inc. sponsored this segment of theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
