UPDATED 09:00 EST / MARCH 02 2026


Corvic Labs launches to standardize testing and governance for AI agents

Corvic Inc., a generative artificial intelligence-native enterprise platform that helps organizations deploy and monitor AI applications using complex, multimodal data, today announced the launch of Corvic Labs, a new initiative aimed at providing open operational infrastructure for developers and researchers.

Corvic Labs will build services and products to assist with evaluating and governing agentic AI systems.

The launch comes at an inflection point: AI agents are becoming a mainstay across enterprise and consumer use, as multistep, tool-using agents that perform long-horizon, autonomous, goal-based tasks begin to supplant chatbot-style, single-prompt applications.

The industry at large needs a way to test these agents, understand how they operate in practice, and experiment with them. That, in turn, means infrastructure that can reproduce testing conditions, provide practical analysis and preserve audit trails.

“Enterprises have difficulty launching products into production; [these systems] could be wild beasts,” Corvic Chief Executive Farshid Sabet told SiliconANGLE in an interview. “It works in certain ways at first — all of the sudden, it doesn’t work like that. It’s important that we give our customers more confidence to be able to launch their product.”

The company stressed that Corvic Labs will be intentionally distinct from Corvic AI’s commercial enterprise platform. The separation is meant to keep Labs neutral and community-oriented and to focus it on open, free developer tooling.

The first release from Corvic Labs will be the Agentic MCP Evaluator, available on GitHub, an open, developer-friendly platform designed to simplify how teams test and evaluate multistep agents.

Using the product, developers gain access to an evaluation framework that attaches to AI agents via Anthropic PBC’s Model Context Protocol, an open standard that lets large language models and agents connect to other AI models and third-party sources to exchange data. The framework can evaluate agent behavior across structured tasks, use LLMs as judges of agent outputs, run repeatable, standardized evaluations and generate structured reports, including PDF summaries.

Sabet’s central claim about the Agentic MCP Evaluator is that enterprises already building AI products and pipelines hit a wall when they try to reproduce hallucinations and measure why they happen. They need a consistent way to understand what conditions produce accurate results.

“Measuring success is very subjective, not very repeatable, not very systematic,” he explained.

Corvic Labs’ approach with the evaluator leans into deterministic workflows, domain metrics and scoring, paired with open infrastructure meant to make measurements systematic and comparable over time. If it works as intended, it could unlock a range of evaluation practices that are still out of reach for many teams today.

“[Our approach] is much more streamlined, much faster, much more deterministic… with consistent scoring,” he said.

As teams add and adjust data, agentic behavior changes, and that shift can be a major problem for AI product and service teams. Even a small change to a dataset or to training can wreak havoc on accuracy or hallucination rates, sending data scientists and engineers back to the drawing board again and again. Model evaluations are the bread and butter of fine-tuning for accuracy and alignment.

Even if a team never touches its underlying data, agent behavior can still drift the moment the foundation shifts under it. Sabet argued that evaluation has to account for the fact that “there are different versions” and that “models keep coming,” with real capability tradeoffs; a workflow that behaves one way on one release can behave differently on the next.

In practice, he said teams face “a lot of variables” beyond datasets, including how AI models “format the response” and even the “temperature” used. Though Sabet didn’t dwell on infrastructure in the same terms, teams know the story well enough: Migrating between clouds, regions or graphics processing unit types can also change behavior and accuracy.
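One common way to keep evaluations comparable across the variables Sabet lists is to record the run conditions, such as model version, response format and temperature, alongside every score. The sketch below shows that idea with a fingerprint of the settings; the function name and fields are assumptions for illustration, not part of any Corvic tool.

```python
# Hypothetical sketch: fingerprint the evaluation conditions so two
# reports are compared only when produced under identical settings.
import hashlib
import json


def run_config_fingerprint(model: str, temperature: float, response_format: str) -> str:
    # Serialize the settings deterministically, then hash them.
    payload = json.dumps(
        {"model": model, "temperature": temperature, "response_format": response_format},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


a = run_config_fingerprint("model-v1", 0.0, "json")
b = run_config_fingerprint("model-v1", 0.0, "json")
c = run_config_fingerprint("model-v2", 0.0, "json")
print(a == b, a == c)  # True False
```

Identical settings yield identical fingerprints, while a model upgrade changes the fingerprint, flagging that the two evaluation runs are not directly comparable.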

Corvic Labs aims to remove the operational overhead of adopting agentic architectures and integrating autonomous tools, memory and external systems. That means being able to test these systems in a deterministic, understandable way; evaluation itself has become a critical bottleneck.

Today, many teams still build ad hoc methods to test reliability, reasoning quality and tool use. Corvic’s MCP Evaluator is meant to provide much of that framework — and as an open-source tool, it could help democratize access.

Image: SiliconANGLE/Microsoft Designer
