Big-data observability startup Monte Carlo Data Inc. is lending its expertise to artificial intelligence with the launch of a new Agent Observability offering that ensures visibility across the full spectrum of data and AI.
The company said the new tool will make it easier for teams to detect, triage and fix reliability issues with AI applications running in production, preventing costly “hallucinations” that can erode customer trust and cause downtime.
Monte Carlo is best known for its popular data observability platform, which is used by enterprises to keep tabs on the “quality” of their data assets. It’s based on the same principles that guide application observability tools such as Datadog and AppDynamics, only it’s applied to data pipelines instead of app metrics. It works by using machine learning algorithms to understand the normal behavior of a customer’s data streams, so it can warn them if anything abnormal occurs.
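Monte Carlo hasn’t published its detection logic, but the core idea is a statistical baseline check on pipeline metrics. A minimal, purely illustrative sketch in Python (the names and thresholds here are assumptions, not Monte Carlo’s code) might flag a table whose daily row count suddenly deviates from its recent history:

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it deviates sharply from the trailing window.

    `history` holds recent daily row counts for a table; a large z-score
    suggests an upstream job wrote far more or fewer rows than usual.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

# A week of stable row counts, then a sudden drop the monitor should catch.
row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_070, 10_150]
print(is_anomalous(row_counts, 2_300))  # True: likely a broken upstream job
```

In practice, per the company’s description above, such baselines are learned automatically by machine learning models rather than hand-set.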
With Agent Observability, Monte Carlo is bringing the same capabilities to the AI stack to ensure that AI applications and agents that take actions on behalf of users remain accurate and reliable. It builds on the launch in May of new unstructured data monitoring capabilities, which expanded its observability suite to encompass assets such as logs, Word and PDF documents, and PowerPoint files.
The company says a more comprehensive observability solution for AI is needed, because while existing products can spot reliability issues with AI data inputs or outputs, none of them can detect both at the same time. Agent Observability, on the other hand, spans data ingestion, transformation, information retrieval and response, ensuring that both inputs and outputs are accurate.
The new offering leverages large language model-as-a-judge evaluation techniques to detect poor-quality AI outputs, as well as performance issues and failures. LLM-as-a-judge involves using a capable LLM to evaluate the outputs of other AI systems, assessing their quality, their relevance to the initial prompt and their accuracy. The method is said to be far more scalable than traditional human evaluation, which simply cannot keep up with the pace of AI usage today.
Monte Carlo said the entire process is automated, but there’s still a lot of human involvement. For instance, users can set up custom prompts to teach the LLM-as-a-judge what the “correct” AI outputs should look like, based on a diverse range of quality criteria. Do this, and they’ll receive alerts the moment the responses deviate from that standard.
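The company hasn’t shared its actual judge prompts, but the pattern is well established. In the hypothetical sketch below, `call_llm` is a stand-in for any chat-completion API (hard-coded here so the example runs without credentials), and the rubric stands in for the kind of custom quality criteria described above:

```python
import json

JUDGE_PROMPT = """You are a strict evaluator. Score the RESPONSE to the QUESTION
from 1 to 5 on each criterion: accuracy, relevance, clarity.
Return JSON like {{"accuracy": 5, "relevance": 4, "clarity": 5}}.

QUESTION: {question}
RESPONSE: {response}"""

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; returns a canned verdict.
    return '{"accuracy": 2, "relevance": 4, "clarity": 5}'

def judge(question: str, response: str, min_score: int = 3) -> dict:
    """Grade one model's output with another model and flag low scores."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, response=response))
    scores = json.loads(raw)
    scores["alert"] = any(v < min_score for v in list(scores.values()))
    return scores

print(judge("Summarize the Q3 report.", "The moon is made of cheese."))
# {'accuracy': 2, 'relevance': 4, 'clarity': 5, 'alert': True}
```

An alert here corresponds to the notification a user would receive the moment responses deviate from the standard they defined.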
Agent Observability also integrates a suite of “low-code evaluation monitors” that are designed to watch for the most common issues that can impact an AI model’s performance. This can be useful in detecting “drift,” which is when an AI system’s responses gradually become less relevant or helpful, a side effect of the way agents accumulate context by remembering earlier interactions with users. If a model’s responses start to lose clarity, become less readable or degrade in other ways, the system will flag this before the problems become too pronounced, allowing operators to intervene and fix the underlying issues.
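What such a monitor does internally isn’t documented, but conceptually it compares a rolling quality metric against a baseline. Here is a minimal sketch that uses average sentence length as a crude stand-in for a readability score; the metric, window size and tolerance are all assumptions, not values from the product:

```python
from collections import deque

def avg_sentence_length(text: str) -> float:
    """Crude readability proxy: mean number of words per sentence."""
    cleaned = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in cleaned.split(".") if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

class DriftMonitor:
    """Alert when a rolling average of a response metric drifts from baseline."""

    def __init__(self, baseline: float, window: int = 50, tolerance: float = 0.25):
        self.baseline = baseline
        self.recent: deque[float] = deque(maxlen=window)
        self.tolerance = tolerance

    def observe(self, response: str) -> bool:
        self.recent.append(avg_sentence_length(response))
        rolling = sum(self.recent) / len(self.recent)
        # Alert when the rolling average strays too far from the baseline.
        return abs(rolling - self.baseline) / self.baseline > self.tolerance

# Baseline of ~18 words per sentence; terse, choppy replies trip the alarm.
monitor = DriftMonitor(baseline=18.0, window=3)
print(monitor.observe("Short. Choppy. Reply."))  # True: drifted from baseline
```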
The offering also captures detailed telemetry, allowing teams to investigate any problems that appear in AI models or agents and understand the root cause. It keeps track of signals including user queries and prompts, completions, latency and errors, offering a real-time view into each model’s performance. This telemetry is stored within the customer’s existing data environment, so any issues can easily be traced back to the problematic underlying data.
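Monte Carlo hasn’t published a telemetry schema, but each record conceptually captures the signals listed above. In the illustrative sketch below, a plain Python list stands in for a table in the customer’s own data environment:

```python
import time
from dataclasses import asdict, dataclass, field

@dataclass
class AgentTraceEvent:
    """One telemetry record per model call."""
    query: str              # the user's prompt
    completion: str         # the model's response
    model: str              # which model or agent answered
    latency_ms: float       # time taken to respond
    error: str | None = None
    timestamp: float = field(default_factory=time.time)

def record(event: AgentTraceEvent, sink: list[dict]) -> None:
    # In production this would land in the customer's own warehouse,
    # so a bad answer can be joined back to the data that produced it.
    sink.append(asdict(event))

events: list[dict] = []
record(AgentTraceEvent("What is data observability?", "It monitors...", "some-model", 412.7), events)
print(events[0]["latency_ms"])  # 412.7
```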
Monte Carlo believes that its Agent Observability suite is going to be just what the doctor ordered for the more than 80% of organizations that have already adopted AI agents to some degree. Although AI agents are extremely popular, very few companies have a way to keep track of and maintain their reliability. That’s one reason why 30% of AI projects end up being abandoned, according to a report by Gartner Inc.
Co-founder and Chief Executive Barr Moses said reliability isn’t just something that would be nice for enterprises to have. Rather, it’s absolutely critical to building scalable, adoptable AI products that can generate real business value. That explains her vision of a unified AI observability platform that spans both inputs and outputs.
“When AI agents fail, the consequences can be massive and longstanding, with low adoption of costly and time-consuming work, erosion of customer trust and a huge hit to the bottom line of the business,” she said. “Point solutions to solve siloed problems simply won’t cut it anymore. Our customers need a unified approach to ensure their AI agents are behaving as they should.”
Holger Mueller of Constellation Research Inc. said it’s no surprise to see AI changing observability, just as it has changed most other areas of the software industry. “There is a unique opportunity here for Monte Carlo to observe both AI inputs and AI outputs,” the analyst said. “The question will be, who is monitoring the AI that monitors the AI — the AI vendors or Monte Carlo itself? No doubt, both will make strong arguments as to why they should do it, but perhaps it would be better if both played a part.”