UPDATED 12:00 EST / NOVEMBER 03 2025

Databricks expands tools for governing and evaluating AI agents

Databricks Inc. today announced a series of updates to its flagship artificial intelligence product, Agent Bricks, aimed at improving governance, accuracy and model flexibility for enterprise AI agents.

The announcements, part of its “Week of AI Agents,” include new features in the MLflow open-source platform for managing the machine learning lifecycle, a marketplace for Model Context Protocol servers and tools for extracting structured data from documents.

Databricks said the updates are designed to help enterprises move AI agents from pilot projects into production while maintaining control over data access, model usage and decision accuracy.

The MLflow platform, which was previously focused on machine learning, will now support the evaluation and monitoring of AI agents. “We’re open-sourcing a huge swath of our evaluation capabilities into MLflow,” said Craig Wiley, senior director of product for AI and machine learning at Databricks.

Evaluation frameworks are critical for organizations that want to deploy agents, particularly in an outward-facing context. AI model evaluation ensures that agents are reliable, accurate and trustworthy, and may also encompass factors like fairness, bias and robustness.

Tunable evaluation

The updated framework allows users to create custom evaluation logic, including tunable “judges” that assess model performance using domain-specific criteria. “You can give natural language feedback, and behind the scenes we’ll train the judges to better reflect that feedback,” Wiley said.

Users can also import or create their own judges or use open-source versions provided by Databricks. Judges can evaluate both test sets and live production inferences.
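To make the idea of a judge concrete, here is a minimal sketch in plain Python. It is not the MLflow API and not how Databricks trains its judges: every name in it (Record, make_keyword_judge, evaluate) is hypothetical, and the scoring rule is a simple keyword check standing in for whatever domain-specific criteria a team might use. It only illustrates the point that one judge can score both an offline test set and a sample of production responses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Record:
    """One agent interaction to be scored (hypothetical structure)."""
    question: str
    answer: str


# In this sketch a "judge" is any function mapping a record to a score in [0, 1].
Judge = Callable[[Record], float]


def make_keyword_judge(required_terms: list[str]) -> Judge:
    """Build a judge that scores the fraction of required terms found in the answer."""
    terms = [t.lower() for t in required_terms]

    def judge(record: Record) -> float:
        if not terms:
            return 1.0
        text = record.answer.lower()
        hits = sum(1 for t in terms if t in text)
        return hits / len(terms)

    return judge


def evaluate(records: list[Record], judge: Judge) -> float:
    """Return the mean judge score over a batch (0.0 for an empty batch)."""
    if not records:
        return 0.0
    return sum(judge(r) for r in records) / len(records)


# Assumed domain criterion: a refund answer should mention "refund" and "30 days".
refund_judge = make_keyword_judge(["refund", "30 days"])

test_set = [
    Record("What is the refund policy?", "Refunds are issued within 30 days."),
    Record("What is the refund policy?", "Please contact support."),
]
production_sample = [
    Record("Can I return this?", "Yes, a refund is available."),
]

print(evaluate(test_set, refund_judge))
print(evaluate(production_sample, refund_judge))
```

A tunable judge of the kind Wiley describes would adjust its criteria from natural language feedback rather than rely on a fixed keyword list. The sketch leaves that part out.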

Databricks is also enhancing its AI Gateway, a centralized control layer for generative AI models and agents built on the Databricks platform, with support for the Model Context Protocol, an open standard for communicating with external data sources and tools.

AI Gateway already provides a single, governed interface for managing all model endpoints, whether they are proprietary models such as OpenAI LLC’s GPT-5, Google LLC’s Gemini and Anthropic PBC’s Claude, open-source models such as Llama, or a company’s own variants. The new MCP Catalog and MCP in Marketplace extend the same level of governance to external tools and data sources, so agents can securely discover and connect to trusted MCP servers with inherited Unity Catalog permissions, audit trails and data lineage.

“Any large language model endpoint on Databricks can be governed using AI Gateway,” Wiley said. “If customers bring us an endpoint, we’ll take the same standards we have for governing our native models and apply them to that endpoint.”

Governance features include logging, access control, rate limiting and audit trails, enforced via Databricks’ Unity Catalog. Controls can be set to limit usage for cost control reasons.
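As a rough illustration of how access control, rate limiting and audit logging can sit together in front of a model endpoint, consider the sketch below. It is a generic sliding-window limiter written for this article, not AI Gateway or Unity Catalog code. The class name GovernedEndpoint, its parameters and the decision strings are all assumptions.

```python
import time
from collections import defaultdict, deque


class GovernedEndpoint:
    """Hypothetical gate: allow-list check, per-user rate limit, audit log."""

    def __init__(self, allowed_users, max_calls, window_seconds, clock=time.monotonic):
        self.allowed_users = set(allowed_users)
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock                   # injectable so tests can control time
        self.calls = defaultdict(deque)      # user -> timestamps of allowed calls
        self.audit_log = []                  # (timestamp, user, decision) tuples

    def request(self, user):
        """Decide whether a call may proceed and record the decision."""
        now = self.clock()
        if user not in self.allowed_users:
            decision = "denied"
        else:
            recent = self.calls[user]
            # Drop timestamps that have aged out of the window.
            while recent and now - recent[0] >= self.window_seconds:
                recent.popleft()
            if len(recent) >= self.max_calls:
                decision = "rate_limited"
            else:
                recent.append(now)
                decision = "allowed"
        # Every decision is logged, including denials.
        self.audit_log.append((now, user, decision))
        return decision


gate = GovernedEndpoint(allowed_users={"alice"}, max_calls=2, window_seconds=60)
for _ in range(3):
    print(gate.request("alice"))
print(gate.request("bob"))
```

Capping calls per user per window is one simple way to keep spending predictable, which is the cost control use the article mentions.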

MCP marketplace

Support for Model Context Protocol allows AI agents to securely interact with third-party data and services, such as retrieving context from SuSea, Inc.’s You.com or analyzing customer data with Glean Technologies, Inc.’s search engine.

For example, “You.com provides one of the best indexes of the internet,” Wiley said. “They can make that index available to agents, but continue to have the kind of governance, access controls, monitoring and logging they expect from Databricks.” Wiley said Databricks intends to let customer demand drive the selection of MCP servers in the marketplace.

“If folks have functionality that our customers are demanding, we’d love to figure out a way to make that available,” he said. Though there is no cost for partners to list in the marketplace, “there’s a cost for the customer of invoking some of those MCP servers.”

The Multi-Agent Supervisor feature, now in beta testing, can orchestrate workflows across multiple agents and MCP servers. Databricks said the feature allows agents to take automated actions such as creating support tickets or running SQL queries while maintaining governance through Unity Catalog.

OCR on steroids

To help agents access knowledge locked in documents, Databricks also introduced ai_parse_document, a SQL function that extracts structured data from PDFs and tables. Acting as a kind of optical character recognition engine on steroids, the function converts unstructured content into governed, searchable data within Unity Catalog.

“Not only does it identify or translate text, but it also chunks that document for usage in a vector database,” enabling use in retrieval-augmented generation and other agentic workflows, Wiley said. Customers can extract, refine and tag information using Databricks’ information extraction brick, which can recognize entities such as contract terms or personal identifiers.
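For readers unfamiliar with chunking, the sketch below shows one common approach: splitting text into fixed-size, overlapping windows of words before each piece is embedded and stored in a vector database. It does not describe how ai_parse_document chunks documents, which the article does not specify. The function name chunk_text and its default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "a b c d e f g h i j"
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary at least partly intact in both neighboring chunks, which tends to help retrieval at the cost of some duplicated storage.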

Databricks is focused on helping organizations deploy AI agents in high-stakes applications, where governance and evaluation are critical, Wiley said. “Our goal is to help organizations put these agents into the path of risk and high-value use cases,” he said.

The new capabilities are available starting today, with some features in beta or public preview.

Photo: Robert Hof/SiliconANGLE
