UPDATED 09:00 EDT / JUNE 12 2024

Databricks broadly boosts support for AI model training

Databricks Inc. today announced several enhancements to its Mosaic AI toolset for building and deploying artificial intelligence models that specifically target generative AI applications.

The improvements, announced at the company’s Data + AI Summit in San Francisco, are aimed at better supporting compound AI systems, meaning those that involve several interacting components, as well as improving model quality and AI governance.

Compound AI systems use multiple components, including models, retrievers, vector databases and tools for evaluation, monitoring, security and governance. They may combine different types of AI models, such as machine learning algorithms, natural language processing models and computer vision systems, with each contributing specialized functionality.

A better RAG

The new Mosaic AI Agent Framework is intended to help developers quickly and safely build high-quality retrieval-augmented generation applications using foundation models and their enterprise data. RAG enhances text generation tasks by incorporating relevant information from a corpus or knowledge base typically specific to an organization.
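
As a rough illustration of the retrieval step described above, the sketch below ranks a few documents against a question and folds the best match into a prompt. It is not the Mosaic AI Agent Framework API: the `DOCUMENTS` list, the word-count similarity and every function name are assumptions made for illustration, and a production system would typically use embeddings and a vector database instead.

```python
import math
from collections import Counter

# Hypothetical in-memory knowledge base standing in for enterprise data.
DOCUMENTS = [
    "Refunds are processed within five business days.",
    "Support is available by phone on weekdays from nine to five.",
    "Enterprise contracts renew annually unless cancelled in writing.",
]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (word.strip(".,?!").lower() for word in text.split())
    return [word for word in words if word]


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Combine retrieved context with the question for a foundation model."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what would be sent to a foundation model, so the model answers from the organization's own documents rather than from memory alone.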

Mosaic AI Agent Evaluation enables rapid iteration and redeployment to incrementally improve models. Databricks described it as an AI-assisted evaluation tool that automatically determines if outputs are high-quality and provides a user interface for feedback from human experts.

A new GenAI Tools feature lets organizations govern, share and register agents or tools using Unity Catalog, a unified governance layer for data within the Databricks Data Intelligence Platform. The catalog ensures that models can be used and governed securely and it makes tools discoverable across an organization. AI agents are autonomous programs or systems that mimic human behavior and decision-making.

Finer tuning

Mosaic AI Model Training fine-tunes open-source foundation models with an organization’s private data. Fine-tuned models remain under the organization’s control and are considered to produce better results than RAG. Smaller models fine-tuned by Model Training are also faster and less expensive to serve than larger proprietary models because they have fewer parameters and require less computing power, Databricks said.

RAG “is a bit like an open-book exam in school where you’re allowed to search the web and find documents to answer a question,” Matei Zaharia (pictured), Databricks’ chief technology officer, said in an interview with SiliconANGLE. “With fine-tuning, you give the model examples of a task and the desired output and then update the parameters in the model to give those outputs. The model has a better chance of learning more complex relationships. It’s a bit like a closed-book exam.”
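
To make the fine-tuning half of that analogy concrete, here is a deliberately tiny sketch: a one-parameter "model" whose weight is nudged by gradient descent until it reproduces the desired outputs. The function name, the example data and the learning rate are all assumptions chosen for illustration. Real fine-tuning adjusts billions of parameters, but the loop has the same shape.

```python
def fine_tune(weight, examples, learning_rate=0.05, steps=200):
    """Adjust a single weight so that weight * x approximates y.

    `examples` is a list of (input, desired_output) pairs. Each step moves
    the weight against the gradient of the mean squared error.
    """
    for _ in range(steps):
        gradient = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
        weight -= learning_rate * gradient
    return weight


# Hypothetical starting point: a "pretrained" model that simply echoes its input.
pretrained_weight = 1.0

# Task examples showing the desired behavior (here, tripling the input).
task_examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

tuned_weight = fine_tune(pretrained_weight, task_examples)
```

After training, the knowledge lives in the weight itself, so nothing needs to be looked up at answer time. That is the closed-book half of the comparison.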

Mosaic AI Gateway provides a single interface for querying, managing and deploying any open-source or proprietary model so users can easily switch models without needing to make extensive changes to the application code, Databricks said.

“Typically, there are many teams that want to use AI, and if all of them start connecting to different models, you have a huge management problem and the risk of sensitive data leaking or moving across geographic zones,” Zaharia said. “AI Gateway is a proxy that can impose rate limits, access controls and guardrails so you can easily swap models behind it.”
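
The proxy pattern Zaharia describes can be sketched in a few lines. This is not the Mosaic AI Gateway interface: the `Gateway` class, its method names and the lambda backends are hypothetical stand-ins, and the sketch covers only access controls, a sliding-window rate limit and model swapping, not guardrails.

```python
import time
from collections import defaultdict, deque


class Gateway:
    """Toy proxy that fronts several model backends behind named routes."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.backends = {}                 # route name -> callable(prompt)
        self.allowed_teams = {}            # route name -> set of team names
        self.history = defaultdict(deque)  # team name -> recent request times

    def register(self, route, backend, teams):
        """Attach (or replace) the model serving a route."""
        self.backends[route] = backend
        self.allowed_teams[route] = set(teams)

    def query(self, team, route, prompt):
        if route not in self.backends:
            raise KeyError(f"unknown route: {route}")
        if team not in self.allowed_teams[route]:
            raise PermissionError(f"{team} may not call {route}")

        # Sliding-window rate limit: drop timestamps older than the window.
        now = self.clock()
        recent = self.history[team]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            raise RuntimeError(f"rate limit exceeded for {team}")
        recent.append(now)

        return self.backends[route](prompt)


gateway = Gateway(max_requests=100, window_seconds=60)
gateway.register("chat", lambda prompt: f"[model-a] {prompt}", teams=["support"])
first = gateway.query("support", "chat", "hello")

# Swapping the model is a change at the gateway, not in application code.
gateway.register("chat", lambda prompt: f"[model-b] {prompt}", teams=["support"])
second = gateway.query("support", "chat", "hello")
```

Callers keep using the `chat` route throughout, which is why the swap requires no change on the application side.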

The Mosaic AI Agent Framework, AI Agent Evaluation, AI Model Training and AI Gateway are now in public preview. Unity Catalog GenAI Tools is in private preview.

Unity Catalog goes open source

Following last week’s announced plans to acquire universal storage platform maker Tabular Technologies Inc., Databricks also said it would release Unity Catalog under an unspecified open-source license. The company said the move demonstrates its commitment to open ecosystems.

Unity Catalog provides a single gateway to any data format and compute engine, with the ability to read tables in Databricks’ own Delta Lake format, as well as Apache Iceberg and Apache Hudi. It also supports the Iceberg REST (representational state transfer) catalog and Hive metastore interface standards, interoperates with all major cloud platforms, and works with various open-source analytical frameworks.

Unity Catalog is unique in its ability to read AI models as well as data, Zaharia said. “In the same way I could share a table with someone, I could share a model,” he said. “You can register a model and set access controls and labels. You can also access a lot of machine learning-specific metadata, like who trained it, when was it trained and whether there are multiple versions.”
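
A catalog entry of the kind Zaharia describes might be modeled roughly as follows. The classes, fields and method names here are illustrative assumptions, not the Unity Catalog API. The sketch only shows how versions, labels, access grants and training metadata could sit together in one record.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelVersion:
    version: int
    trained_by: str
    trained_on: date


@dataclass
class CatalogEntry:
    name: str
    labels: set = field(default_factory=set)
    readers: set = field(default_factory=set)
    versions: list = field(default_factory=list)


class Catalog:
    """Toy registry that governs models the way it would govern tables."""

    def __init__(self):
        self._entries = {}

    def register(self, name, trained_by, trained_on):
        """Record a new version of a model, creating the entry if needed."""
        entry = self._entries.setdefault(name, CatalogEntry(name))
        version = ModelVersion(len(entry.versions) + 1, trained_by, trained_on)
        entry.versions.append(version)
        return version

    def label(self, name, label):
        self._entries[name].labels.add(label)

    def grant(self, name, user):
        self._entries[name].readers.add(user)

    def latest(self, name, user):
        """Return the newest version, but only to users granted access."""
        entry = self._entries[name]
        if user not in entry.readers:
            raise PermissionError(f"{user} cannot read {name}")
        return entry.versions[-1]
```

Sharing a model then amounts to a `grant` call, much like sharing a table.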

Vinoth Chandar, founder and chief executive of Infinilake Inc., which does business as Onehouse, said open catalogs are needed because they become the new lock-in point for database vendors. His company makes a cloud-native managed lakehouse service built on Apache Hudi that interacts with many open table formats.

“There are open data storage and table formats, but for users to consume the data, they have to go through a catalog, and each vendor has its own,” he said. “We need a central open resource.”

Databricks and Nvidia Corp. also announced that they’re tightening their partnership to bring Nvidia’s Compute Unified Device Architecture to Databricks’ Data Intelligence Platform. CUDA is a parallel computing platform that enables developers to define parallel functions that run on Nvidia graphics processing units without extensive low-level coding.

Databricks is also adding native support for Nvidia GPU acceleration to the platform and plans to develop native support for Nvidia-accelerated computing in Photon, its vectorized query engine, which essentially acts as a business intelligence warehouse sitting atop the data lake to form a lakehouse architecture.

Photo: SiliconANGLE
