Databricks acquires Okera to address AI data governance
Big-data analytics firm Databricks Inc. said today that it has acquired Okera Inc., a data governance platform with a focus on artificial intelligence, in a bid to expand its own governance and compliance capabilities for machine learning and large language model AIs.
The two companies did not disclose the terms of the deal, but Okera has raised just under $30 million, according to Crunchbase, with its latest Series B round of $10 million led by Clear Sky in June 2020.
The recent emergence of generative AI models such as OpenAI LP’s ChatGPT took the world by storm, creating a wave of interest among enterprise customers who want to put them to work in their own networks. At the same time, concerns have grown about the security and privacy of the training data used by LLMs, because the models can memorize parts of vast datasets and spit them right back out again, meaning that they can easily consume and leak sensitive information.
In the past, customers controlled access to their data using simple controls that only needed to address one plane, such as a database queried with SQL. As long as the data came from a SQL database, policies could be written to deal efficiently with SQL queries.
“The rise of AI, in particular machine learning models and LLMs, is making this approach insufficient,” the Databricks team, including Chief Executive Ali Ghodsi, explained in the announcement. The team pointed out that the emergence of LLMs has caused the number of data points that enterprises need to govern to increase exponentially because “data sources used in AI are machine-generated instead of human-generated” and that current policy creation cannot handle the rapid pace of AI innovation.
“AI-specific governance concerns such as provenance and bias fall outside the reach of traditional data governance platforms,” they wrote.
This is where Okera’s platform steps in, providing an AI-powered approach that can discover, classify and tag sensitive data such as personally identifiable information. Developers or managers can then use a no-code interface to turn these tags into access policies, giving them better transparency and control over the data. That way, enterprise customers can track data usage and better understand what’s happening within their own systems.
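To make the tag-then-policy idea concrete, here is a minimal Python sketch. It is not Okera's implementation, which the announcement does not describe in detail. The regular expressions, the `tag_columns` and `build_policy` helpers, the 0.8 match threshold and the sample table are all hypothetical and chosen only for illustration. A production classifier would be far more robust than two patterns.

```python
import re

# Hypothetical patterns for illustration only; not Okera's classifier.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def tag_columns(table, threshold=0.8):
    """Return {column: set of tags} for columns whose sampled values mostly match a pattern."""
    tags = {}
    for column, values in table.items():
        found = set()
        for tag, pattern in PII_PATTERNS.items():
            matches = sum(1 for value in values if pattern.match(str(value)))
            if values and matches / len(values) >= threshold:
                found.add(tag)
        tags[column] = found
    return tags


def build_policy(tags):
    """Mask any column that carries at least one sensitive tag; allow the rest."""
    return {column: ("mask" if column_tags else "allow")
            for column, column_tags in tags.items()}


# Assumed sample data: each key is a column, each list a sample of its values.
sample = {
    "contact": ["ana@example.com", "li@example.org"],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Lisbon", "Austin"],
}

column_tags = tag_columns(sample)
policy = build_policy(column_tags)
```

In this sketch the policy is derived entirely from the tags, so reclassifying a column changes its access rule without anyone editing the rule by hand.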
Okera also provides a technology that allows enterprises to isolate workloads without sacrificing performance. This would allow multiple LLMs to run alongside one another without mixing data sets or accidentally sharing or leaking potentially sensitive information between AI models, providing increased security and privacy.
Databricks recently released its own specialized open-source LLM, Dolly 2.0, which has capabilities similar to ChatGPT. It’s smaller and more portable than many others on the market, but most important, the terms of its training data do not prohibit commercial use.
The company said it intends to integrate Okera’s capabilities into its Unity Catalog, Databricks’ governance layer for data and AI workloads. That will allow enterprise customers to use Okera’s AI-driven system to classify and govern all their data, analytics and AI assets, including machine learning models and other features. This will give them the tags necessary to build attribute-based and intent-based policies to control their data usage.
Databricks said the enhancements will give enterprise customers “a holistic view of their data estate across clouds and can use a single permission model to define access policies,” allowing them to ensure consistent governance.
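As a rough illustration of what an attribute-based policy looks like, the Python sketch below grants access by comparing attributes of the user with tags on the data asset. It is an assumption-laden example, not Unity Catalog's or Okera's API: the `Rule` class, the `can_read` function, and the tag and attribute names such as `pii` and `dept:hr` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    tag: str             # tag attached to a data asset (hypothetical name)
    required: frozenset  # user attributes needed to read assets with that tag


def can_read(user_attrs, asset_tags, rules):
    """Allow access only if every rule triggered by the asset's tags is satisfied."""
    for rule in rules:
        if rule.tag in asset_tags and not rule.required <= user_attrs:
            return False
    return True


# Assumed rules: PII needs an HR attribute; financial data needs two attributes.
rules = [
    Rule("pii", frozenset({"dept:hr"})),
    Rule("financial", frozenset({"dept:finance", "region:us"})),
]

analyst = {"dept:finance", "region:us"}

reads_ledger = can_read(analyst, {"financial"}, rules)
reads_payroll = can_read(analyst, {"pii", "financial"}, rules)
```

Because the rules refer to tags rather than to individual tables, one permission model can cover every asset that carries a given tag.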
The Okera team will join Databricks as part of the acquisition, including Nong Li, Okera’s co-founder and CEO, who is known for developing Apache Parquet, an open-source column-oriented data format for efficient data storage and retrieval that Databricks and many other software companies build on.
“We founded Okera to help modern, data-driven enterprises accelerate legitimate data access while minimizing data security risks and delivering regulatory compliance,” Li said in a statement. “Many organizations don’t have enough technical talent to manage access policies at scale, especially with the explosion of LLMs. What they need is a modern, AI-centric governance solution.”
Photo: Robert Hof/SiliconANGLE