UPDATED 15:16 EDT / MAY 01 2026

AI

Why agentic AI governance is falling short – and what we can do about it

Agentic artificial intelligence misbehavior is reaching epidemic proportions. Today’s AI governance solutions aren’t stopping the madness. We need to rethink our entire approach to AI governance.

Even though agentic AI is still nascent, many of the AI agents in production today are wreaking havoc. From deleting production databases (and their backups!) to lying and cheating to avoid deletion, horror stories about agents-gone-bad are driving reconsideration of the technology.

And yet, companies of all sizes are enamored with agents’ promise. Given large language models’ power to glean insights from vast quantities of unstructured data, LLM-powered AI agents can now take action based upon that information to accomplish an astounding variety of business tasks – as well as a commensurate number of nefarious actions.

The behavior of such agents is nondeterministic: Given the way LLMs work, agentic behavior is unpredictable. It’s this unpredictability, in fact, that makes agents so powerful, as agents can figure out for themselves novel ways to accomplish the tasks set out for them.

Companies deploying AI agents, therefore, face a dilemma: Should they give such agents free rein to achieve their goals, at the risk of dangerous misbehavior, or lock them down so they can’t go rogue, constraining them exclusively to deterministic, predictable behavior?

Clearly, we want some middle ground: Give agents the freedom to solve problems nondeterministically but establish sufficient guardrails to constrain their behavior to comply with our rules and policies.

Such is the motivation for the entire agentic AI governance category: a burgeoning subset of the AI governance market focused on helping organizations establish and manage such guardrails for their AI agents.

Such guardrails are unquestionably necessary. But if we look more closely at how rapidly agentic AI is evolving, it soon becomes clear that today’s agentic AI governance is woefully insufficient for reining in increasingly dangerous AI agents.

The ‘hall of mirrors’ problem

Perhaps the most obvious problem that all agentic AI governance faces is the predilection of the more powerful AI agents to break the rules.

This malfeasance leads to a problem I discussed in my last article, which I called the hall of mirrors problem – what some people call “who watches the watchers.”

Given the power and ubiquity of AI today, leveraging AI (in particular, AI agents) to ensure that agentic AI stays within its guardrails is ostensibly the most logical choice.

The question then becomes: How do we ensure that these “police officer” agents themselves don’t misbehave? How do we keep AI agents and their watchers from conspiring together to break the rules?

The autonomy squeeze

If adding layers of agentic police officers doesn’t address the problem, then maybe the best approach to keeping misbehaving AI agents in line is to lock down their behavior.

The most common approach today is to establish a mechanism for defining and enforcing policies and rules that directly constrain agentic behavior.
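
To make this concrete, here is a minimal sketch of what such a mechanism might look like. The Action structure and the deny rules below are purely hypothetical stand-ins, not any particular vendor’s policy engine:

    # Minimal, hypothetical sketch of a rule-based guardrail that vets an
    # agent's proposed action before it executes. The structures and rules
    # are illustrative only, not a real product's API.
    from dataclasses import dataclass

    @dataclass
    class Action:
        tool: str      # e.g., "sql", "email", "filesystem"
        command: str   # the concrete operation the agent wants to run
        target: str    # the resource the operation would touch

    # Example deny rules: block destructive operations against production.
    DENY_RULES = [
        lambda a: a.tool == "sql" and "drop" in a.command.lower(),
        lambda a: a.target.startswith("prod/") and "delete" in a.command.lower(),
    ]

    def is_allowed(action: Action) -> bool:
        """Return False if any deny rule matches the proposed action."""
        return not any(rule(action) for rule in DENY_RULES)

    proposed = Action(tool="sql", command="DROP TABLE customers", target="prod/db")
    print(is_allowed(proposed))  # False: the guardrail blocks the action

The point stands regardless of implementation: every such rule is deterministic, so each rule you add carves away a slice of the agent’s nondeterministic freedom – which is exactly the squeeze described next.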

As AI agents become more powerful, however, such constraints will increasingly prevent those agents from accomplishing tasks nondeterministically – what I like to call the autonomy squeeze.

Here’s how I define the autonomy squeeze: AI agents eventually become so dangerous that the guardrails we would need to put in place to control them prevent them from providing any business value whatsoever. At that point, there’s no reason to deploy AI agents at all.

Why ‘human in the loop’ doesn’t solve the problem

Another approach is to prevent agents from taking actions directly – in other words, constrain autonomous behavior by requiring a human to step in to approve an action.
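
As a rough illustration (the function names here are hypothetical, not any specific framework’s API), a human-in-the-loop gate can be as simple as refusing to execute until a person explicitly approves:

    # Minimal, hypothetical sketch of a human-in-the-loop approval gate:
    # the agent proposes an action, but a person must approve it before it runs.
    def require_human_approval(description: str) -> bool:
        """Block until a human types 'yes' to approve the proposed action."""
        answer = input(f"Agent wants to: {description}. Approve? (yes/no) ")
        return answer.strip().lower() == "yes"

    def execute_with_approval(description: str, run) -> None:
        """Run the action only if the human reviewer approves it."""
        if require_human_approval(description):
            run()
        else:
            print("Action rejected by human reviewer.")

    execute_with_approval(
        "send a refund of $500 to customer 1234",
        lambda: print("refund issued"),
    )

Simple enough – but as we’ll see, the weak point isn’t the gate; it’s the human standing behind it.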

You’ll hear the phrase “human in the loop” from a wide range of vendors, both those selling their own agents and the agentic AI governance vendors looking to constrain agentic behavior.

However, there is a massive problem with all human-in-the-loop approaches: automation bias – the human tendency to place too much trust in automated systems, even fallible ones.

Whenever humans interact with an automated system, they may be skeptical at first. It’s human nature to check and double-check that the automation is working properly.

However, as the system successfully completes its tasks multiple times, humans become complacent. “It worked fine the last hundred times,” we say, “so I can trust it to behave properly the next time.”

Except, of course, when something goes wrong.

Automation bias, in fact, isn’t specific to AI agents, or even information technology-based automation at all. For example, investigators attributed the crash of Air France flight 447 in 2009 to human causes that boiled down to automation bias.

The cockpit crew had become so comfortable with the aircraft’s automated systems that when a sensor fault developed, they misunderstood the problem and crashed the plane into the ocean.

Automation bias is just as dangerous for agentic AI, as it leads to the following human behaviors:

  • Humans reduce manual verification, eventually accepting results at face value every time.
  • Humans become increasingly reluctant to intervene, especially when the agents seem so confident in their actions.
  • Humans disregard their own judgment even when a result is suspicious. “I trusted it to take the right action the last hundred times, so it must know better, and my suspicions are unwarranted.”
  • Over time, humans lose the ability to spot potential errors, either individually or as personnel change from more seasoned to more junior staff – an example of what we call the AI deskilling paradox.

Agentic AI, in fact, exacerbates the problem of automation bias, because of LLMs’ deceptive appearance of intelligence and confidence.

Furthermore, given how rapidly agents make decisions and how many decisions they will make at scale, humans simply won’t be able to keep up, even if they are sufficiently skeptical of suspicious behaviors.

Note that it doesn’t matter how good the agentic AI guardrails are – because of automation bias, humans will simply ignore, disregard or turn off any warnings AI governance might provide.

Solving the problem – but perhaps not the solution you want

One police officer agent won’t do. Putting one agent in charge of keeping police officer agents on track doesn’t solve the problem, either.

The best answer we have today: multiple diverse adversarial validators with multi-layer validation.

Instead of one validator (aka “police officer agent”), use multiple validators at the same time. Make sure these validators have the following characteristics (a rough sketch appears below):

  • They all leverage separate technologies – in particular, different LLMs. Using validators from different vendors is even better.
  • Make sure each validator is adversarial – a characteristic familiar from red teaming and penetration testing. Every time an agent makes a potential decision, each validator should actively look for reasons why that decision is incorrect or malicious.
  • Validation should be multi-layer – to reduce the chance that any one validator becomes a single point of failure, implement different validators at different layers, for example:
    • Syntax layer: Is the result well-formed?
    • Semantic layer: Does the result make sense?
    • Execution layer: Does the result work in production?
    • Outcome layer: Will the agent achieve its goal?

If multiple diverse adversarial validators can answer these questions for all potential agentic behavior, then your AI governance system can minimize the risk of agentic misbehavior.
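
As a rough sketch of this pattern (the validator functions below are simple stand-ins; in practice each would be an independently built validator running on a different LLM, ideally from a different vendor), the layers might fit together like this:

    # Rough sketch of multiple diverse, adversarial, multi-layer validators.
    # Each validator returns a list of objections; an empty list means it
    # found nothing wrong. The checks here are deliberately trivial stand-ins.
    import json

    def syntax_validator(result: str) -> list[str]:
        """Syntax layer: is the result well-formed (here, valid JSON)?"""
        try:
            json.loads(result)
            return []
        except ValueError as e:
            return [f"syntax: {e}"]

    def semantic_validator(result: str) -> list[str]:
        """Semantic layer: does the result make sense for the task?"""
        data = json.loads(result) if not syntax_validator(result) else {}
        return [] if data.get("amount", 0) >= 0 else ["semantic: negative amount"]

    def execution_validator(result: str) -> list[str]:
        """Execution layer: would the result work in production? (stubbed)"""
        return []  # e.g., dry-run the change against a staging environment

    def outcome_validator(result: str) -> list[str]:
        """Outcome layer: will the agent achieve its goal? (stubbed)"""
        return []  # e.g., have an independent LLM argue the result fails the goal

    VALIDATORS = [syntax_validator, semantic_validator,
                  execution_validator, outcome_validator]

    def validate(result: str) -> list[str]:
        """Collect objections from every layer; approve only if there are none."""
        objections = []
        for validator in VALIDATORS:
            objections.extend(validator(result))
        return objections

    proposed = '{"action": "refund", "amount": 500}'
    print(validate(proposed) or "approved")

The design choice that matters here is independence: the layers only reduce risk if the validators don’t share the same model, vendor or blind spots.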

The Intellyx take – did you say ‘minimize the risk’?

Yes – taking this approach to agentic AI governance at best lowers the risk; it can never eliminate it.

There is always the possibility that some agentic conspiracy suborns the validators, or that some systemic pattern of validator error or misbehavior lets some agentic mischief through.

The primary lesson here: Agentic AI never provides certainty. It can only provide confidence thresholds.

In other words, nondeterministic (probabilistic) behavior can only provide probabilistic trust. Absolute trust is impossible as long as agents behave nondeterministically.

Confidence thresholds always fall short of 100% – and the difference between the threshold and 100% is what we call the error budget.
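
To put rough numbers on it (the figures below are hypothetical, purely for illustration): if your validators deliver 99.5% confidence, the remaining 0.5% is your error budget – and at agentic scale, even a small budget adds up:

    # Illustrative arithmetic with hypothetical numbers: the error budget is
    # simply whatever confidence the validators cannot provide.
    confidence_threshold = 0.995           # say validation yields 99.5% confidence
    error_budget = 1 - confidence_threshold
    actions_per_day = 10_000               # an agent acting at scale
    expected_bad_actions = error_budget * actions_per_day
    print(f"Error budget: {error_budget:.1%}")                           # 0.5%
    print(f"Expected misbehaviors per day: {expected_bad_actions:.0f}")  # 50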

Site reliability engineers, or SREs, are quite familiar with error budgets: Given the available time and money, SREs can’t guarantee a site will be up all the time.

Instead, they work toward the error budget, which quantifies just how good the performance can be given those time and money constraints – in other words, how much failure is acceptable.

Just so with agentic behavior. Given the constraints we place on such behavior, the best we can do is to say that agents will behave well within their error budgets – but sometimes they will misbehave regardless of all the constraints and protections we put into place, and we simply have to live with that fact.

If you’re not OK with such error budgets, then don’t deploy AI agents.

Jason Bloomberg is founder and managing director of Intellyx, which advises business leaders and technology vendors on their digital transformation strategies. He wrote this article for SiliconANGLE. A human being wrote every word of this article.

