UPDATED 09:00 EDT / JUNE 09 2022

BIG DATA

Databricks adds data lineage feature to its catalog with support for nontraditional uses

Databricks Inc. today is adding data lineage features to its Unity Catalog governance platform, a move that it says significantly expands data governance capabilities on the hybrid of data warehouse and data lake that it calls a lakehouse.

Data lineage describes how data flows throughout an organization, giving customers the ability to see where lakehouse data came from, who created it and when, how it was modified over time and how it’s currently being used, among other features. The feature is now available for preview on the Amazon Web Services Inc. and Microsoft Corp. Azure clouds.

The feature helps organizations cope with the growing volume and variety of data coming in from multiple sources, how it moves and changes, who has access to it and how it’s used. Databricks says it’s bringing an updated approach to the process and that adding the feature required modifying the core database engine to accommodate nonstandard use cases such as machine learning models.

“Understanding how data flows through the organization is fundamental to being able to trust your data,” said Joel Minnick, Databricks’ vice president of marketing. “We’re going back to the core principle of the Unity Catalog, which is not just trying to govern tables and files but also modern assets like dashboards, notebooks and models.”

Lifecycle view

Data lineage enables data management teams to see all downstream functions that are affected by data changes, including applications, dashboards, machine learning models and data sets, and to understand the severity of the impact so stakeholders can be notified. “The minute data comes into the lakehouse, we start to track it,” Minnick said. Metadata that travels with data elements, such as the author and creation date, is also imported.

The feature also helps organizations meet compliance rules through improved traceability, Databricks said. “We capture all the data we can see at a pretty fine-grained level of detail: who created it, what changes were made, when was it changed, what pipelines it was used in and who has access to it,” Minnick said. “Ultimately, if you share that data, we can also see who it is shared with.”

Data lineage enables data consumers such as data scientists, data engineers and data analysts to conduct context-aware analysis. Data stewards can see which data sets are no longer accessed or have become obsolete so stale or unnecessary data can be removed to improve overall data quality.

Key features of Unity Catalog include automated run-time lineage that captures all lineage generated in Databricks, which the company says is more accurate and efficient than manual tagging. Information is captured for tables, views and columns to give a granular picture of upstream and downstream data flows. Additionally, lineage works across all languages supported by Databricks, including SQL, Python, R and Scala, as well as notebooks, workflows and dashboards.
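To illustrate what upstream and downstream lineage means in practice, here is a minimal Python sketch of a lineage graph. It is not Databricks' implementation or API: the `LineageGraph` class and the asset names are hypothetical, and the sketch assumes lineage can be modeled as a directed graph in which an edge from a source to a target means the target was derived from the source.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph (hypothetical, for illustration only).

    An edge source -> target means `target` is derived from `source`.
    """

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        # Record the edge in both directions so either view can be queried.
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _reachable(self, start, neighbors):
        # Breadth-first search over the chosen direction of the graph.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream(self, asset):
        """Everything that could be affected if `asset` changes."""
        return self._reachable(asset, self._downstream)

    def upstream(self, asset):
        """Everything that `asset` was derived from."""
        return self._reachable(asset, self._upstream)


# Hypothetical assets: two source columns, one derived column,
# a dashboard and a machine learning model.
graph = LineageGraph()
graph.add_edge("raw.orders.amount", "clean.orders.amount_usd")
graph.add_edge("clean.orders.amount_usd", "dashboard:revenue")
graph.add_edge("clean.orders.amount_usd", "model:churn")
graph.add_edge("raw.customers.id", "model:churn")

print(sorted(graph.downstream("raw.orders.amount")))
print(sorted(graph.upstream("model:churn")))
```

In this sketch, a downstream query answers the impact question (which dashboards and models depend on a column), while an upstream query answers the provenance question (where a model's inputs came from).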

Databricks aims to make the capability available across all the cloud platforms it supports, Minnick said.

Photo: Robert Hof/SiliconANGLE

