UPDATED 11:54 EDT / MAY 31 2026

INFRA

From reactive operations to autonomous infrastructure: What IT leaders must do next

As artificial intelligence agents begin to proliferate across information technology infrastructure, IT leaders are shifting from asking, “How do we monitor every alert?” to asking, “How do we design infrastructure that can solve its own problems?”

Operations teams can now deploy agents to triage alerts, correlate operational data and automate certain remediation steps without constant oversight. Freeing up time for more meaningful, strategic work could mark a monumental shift in how IT is managed.

Historically, IT operations have relied on reactive measures, which means teams must be on call around the clock. An operations crisis driven by tool sprawl, talent shortages and burnout has made that model unsustainable. Autonomous IT can be the answer.

Yet although the enthusiasm is clearly there, only 5% of the IT professionals we recently surveyed report that AI is currently core to their operations. Given this gap between AI ambition and execution, what will it take to build the infrastructure for autonomy in the coming years?

More than technology

Moving from AI-assisted workflows to autonomous operations requires more than sophisticated models; it depends on unified visibility and reliable access to operational data across the IT environment. After all, autonomous systems cannot manage what they cannot see.

In many cases, the challenge is not a lack of data. Organizations already use complex observability stacks to monitor alerts, telemetry, logs and performance signals. The problem is that these systems often operate in isolation. When operational context is fragmented, decisions are often made with partial visibility, and autonomy can amplify those blind spots rather than close them.

Data standards and integrations have become critical enablers on the path to autonomous operations. They give agents the structure to interpret and correlate data across systems, enabling more autonomous workflows. Anthropic PBC’s open-source Model Context Protocol has helped standardize how AI systems connect to disparate data across applications, development tools and workflows. By enabling systems to expose relevant data or actions through a common interface, MCP helps IT move from isolated agentic workflows toward autonomous operations grounded in a more complete understanding of the environment.
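
To make the “common interface” idea concrete, here is a minimal, stdlib-only Python sketch of a server answering two MCP-style JSON-RPC 2.0 requests: `tools/list`, which advertises the available actions, and `tools/call`, which invokes one. It is a toy that mimics the message shape, not the official MCP SDK, and the `get_device_health` tool, its `asset_id` argument and the sample data are hypothetical.

```python
import json

# Hypothetical tool catalog for a monitoring system. Each entry carries a
# description and a JSON Schema for its inputs, mirroring the shape MCP uses.
TOOLS = {
    "get_device_health": {
        "description": "Return basic health data for a device by asset ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"asset_id": {"type": "string"}},
            "required": ["asset_id"],
        },
    }
}

# Toy in-memory data standing in for a real monitoring backend (hypothetical).
DEVICE_HEALTH = {"sw-core-01": {"cpu_pct": 41, "status": "up"}}


def _reply(req_id, *, result=None, error=None) -> str:
    """Wrap a result or an error in a JSON-RPC 2.0 response envelope."""
    msg = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return json.dumps(msg)


def handle(request_json: str) -> str:
    """Dispatch MCP-style tools/list and tools/call requests (toy version)."""
    req = json.loads(request_json)
    req_id, method = req.get("id"), req.get("method")

    if method == "tools/list":
        # Advertise every tool with its name, description and input schema.
        tools = [{"name": name, **spec} for name, spec in TOOLS.items()]
        return _reply(req_id, result={"tools": tools})

    if method == "tools/call":
        params = req.get("params", {})
        name, args = params.get("name"), params.get("arguments", {})
        if name not in TOOLS:
            return _reply(req_id, error={"code": -32602, "message": f"Unknown tool: {name}"})
        health = DEVICE_HEALTH.get(args.get("asset_id"))
        # Tool output is returned as content items; isError marks a failed lookup.
        return _reply(req_id, result={
            "content": [{"type": "text", "text": json.dumps(health)}],
            "isError": health is None,
        })

    # Standard JSON-RPC 2.0 code for an unsupported method.
    return _reply(req_id, error={"code": -32601, "message": "Method not found"})


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
            "params": {"name": "get_device_health", "arguments": {"asset_id": "sw-core-01"}}}
    print(handle(json.dumps(call)))
```

Because each tool is described by a name and an input schema, an agent can discover what a monitoring or ticketing system offers and call it without a bespoke integration for every system.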

Organizations are now building on these advances to engineer AI infrastructure that moves beyond simple “if-then” commands toward agents that can understand and remediate issues independently. However, connectivity is only one part of readiness. Data still needs to be accurate, consistent and current to support reliable decisions.

Building a data foundation

Here’s what IT leaders need to check off their lists before expanding agents into operational workflows:

  • Maintain an up-to-date inventory. Use automated discovery to keep an accurate view of devices, applications, cloud resources, identities and configurations across the IT environment.
  • Normalize the data agents rely on. Standardize formats and fields, from dates and timestamps to asset IDs and telemetry attributes, while removing duplicates and inconsistencies.
  • Align metadata across systems. Replace free-form tagging with approved fields, controlled vocabularies and consistent hierarchical tag structures so agents can interpret context reliably across systems.
  • Continuously validate data quality. Flag stale records, missing fields, conflicting sources, inconsistent classifications and potential manual entry errors to keep operational data current, complete and usable.
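
As an illustration of the normalization and validation steps above, the Python sketch below cleans raw inventory records: it canonicalizes asset IDs, converts timestamps to UTC, collapses duplicates by keeping the most recently seen record, and flags missing or unapproved tags and stale entries. The field names, the `APPROVED_ROLES` vocabulary and the seven-day staleness threshold are assumptions chosen for the example, and resolving conflicting sources by recency is only one simple policy among several.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical controlled vocabulary for the "role" tag; a real one would come
# from your own metadata standard.
APPROVED_ROLES = {"core-switch", "access-switch", "firewall", "server", "laptop"}

# Assumed threshold: records not seen for a week are treated as stale.
MAX_AGE = timedelta(days=7)


def normalize(raw: Dict[str, str]) -> Dict[str, object]:
    """Return a record with a canonical asset ID, a UTC timestamp and a clean tag."""
    # "SW-Core-01 " and "sw-core-01" should refer to the same asset.
    asset_id = raw["asset_id"].strip().lower()
    # Accept ISO 8601 with offsets (and a trailing "Z"); naive times are assumed UTC.
    seen = datetime.fromisoformat(raw["last_seen"].strip().replace("Z", "+00:00"))
    if seen.tzinfo is None:
        seen = seen.replace(tzinfo=timezone.utc)
    # Empty or absent tags become None so validation can report them as missing.
    role: Optional[str] = (raw.get("role") or "").strip().lower() or None
    return {"asset_id": asset_id, "last_seen": seen.astimezone(timezone.utc), "role": role}


def deduplicate(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep one record per asset ID, preferring the most recently seen one."""
    latest: Dict[str, Dict[str, object]] = {}
    for rec in records:
        current = latest.get(rec["asset_id"])
        if current is None or rec["last_seen"] > current["last_seen"]:
            latest[rec["asset_id"]] = rec
    return list(latest.values())


def validate(rec: Dict[str, object], now: datetime) -> List[str]:
    """List data-quality issues for one normalized record (empty if clean)."""
    issues = []
    if rec["role"] is None:
        issues.append("missing role")
    elif rec["role"] not in APPROVED_ROLES:
        issues.append(f"unapproved role: {rec['role']}")
    if now - rec["last_seen"] > MAX_AGE:
        issues.append("stale")
    return issues


if __name__ == "__main__":
    raw_records = [
        {"asset_id": "SW-Core-01 ", "last_seen": "2026-05-30T10:00:00+00:00", "role": "Core-Switch"},
        {"asset_id": "sw-core-01", "last_seen": "2026-05-31T09:00:00-04:00", "role": "core-switch"},
        {"asset_id": "lt-0042", "last_seen": "2026-05-01T08:00:00Z", "role": "workstation"},
    ]
    now = datetime(2026, 5, 31, 16, 0, tzinfo=timezone.utc)
    for rec in deduplicate([normalize(r) for r in raw_records]):
        print(rec["asset_id"], validate(rec, now))
```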

Eliminating data silos is about more than improving access; it’s about creating a single, coherent source of truth that agents can reliably reason from.

Low-risk, high-value tasks

The success of autonomous IT infrastructure will also depend on how realistic IT leaders are about return on investment and human-in-the-loop requirements. That means assessing which automation use cases deliver measurable value and which add cost or complexity without meaningfully improving outcomes.

Balance ambition with discipline. That starts with identifying repetitive, well-established tasks where automation can deliver clear value without introducing unnecessary risk. Examples include:

  • Endpoint remediation. AI can analyze tickets, device health, application logs, policy changes and known incidents to identify likely causes and execute approved remediation steps, such as clearing caches, repairing configurations or reapplying device policies.
  • Network anomaly response. Agents can correlate network alerts, topology data and device information to determine the source of anomalies and assess affected assets. They can then take predefined containment actions, such as disabling non-critical access ports or escalating the issue for human approval when the business impact is uncertain or the action is high-risk.
  • Routine credential lifecycle tasks. Credential rotations and certificate renewals follow deterministic steps, which makes them ideal early candidates for automation. AI can add value by detecting when these actions may be needed outside normal rotation or renewal cycles, for example by identifying anomalous credential usage.
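
The escalation logic described in these examples can be sketched as a simple policy gate. In this hypothetical Python example, an agent may auto-execute only actions from a pre-approved catalog, and anything high-risk, unlisted or touching an asset of unknown or high business criticality is escalated for human approval. A second function flags a credential for rotation either on schedule or when it is used from sources outside a known baseline; that set comparison is a deliberately crude stand-in for the anomaly detection an AI model would provide. The action names, risk levels and 90-day maximum age are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Iterable, List, Optional

# Hypothetical catalog of pre-approved remediation actions and their risk levels.
APPROVED_ACTIONS = {
    "clear_cache": "low",
    "repair_config": "low",
    "reapply_device_policy": "low",
    "disable_noncritical_port": "medium",
    "shutdown_core_uplink": "high",
}


class Decision(Enum):
    AUTO_EXECUTE = "auto_execute"
    ESCALATE = "escalate"


@dataclass
class Proposal:
    action: str
    # Business criticality of the affected asset: "low", "medium", "high",
    # or None when the agent cannot determine it.
    asset_criticality: Optional[str]


def triage(p: Proposal) -> Decision:
    """Decide whether an agent may act alone or must ask a human first."""
    risk = APPROVED_ACTIONS.get(p.action)
    # Unlisted or high-risk actions always go to a human.
    if risk is None or risk == "high":
        return Decision.ESCALATE
    # Uncertain or high business impact also goes to a human.
    if p.asset_criticality is None or p.asset_criticality == "high":
        return Decision.ESCALATE
    return Decision.AUTO_EXECUTE


def rotation_reasons(
    last_rotated: datetime,
    now: datetime,
    observed_sources: Iterable[str],
    baseline_sources: Iterable[str],
    max_age: timedelta = timedelta(days=90),
) -> List[str]:
    """Return reasons to rotate a credential now; an empty list means no action."""
    reasons = []
    if now - last_rotated >= max_age:
        reasons.append("scheduled: max age reached")
    # Crude stand-in for anomaly detection: any source outside the baseline.
    unexpected = set(observed_sources) - set(baseline_sources)
    if unexpected:
        reasons.append("anomalous use from: " + ", ".join(sorted(unexpected)))
    return reasons


if __name__ == "__main__":
    print(triage(Proposal("clear_cache", "low")))
    print(triage(Proposal("disable_noncritical_port", None)))
    now = datetime(2026, 5, 31, tzinfo=timezone.utc)
    print(rotation_reasons(now - timedelta(days=10), now, ["10.0.0.5", "203.0.113.9"], ["10.0.0.5"]))
```

Keeping the catalog explicit means expanding autonomy is a reviewable change to a list rather than a change to the agent itself.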

IT leaders must be pragmatic about closed-loop systems and the snowballing costs of deploying agents at scale. Agentic tools can now remediate simple tickets and requests, but higher-stakes IT issues and decisions still require human judgment. Recent incidents, such as the service outage involving Amazon Web Services Inc.’s Kiro coding tool, underscore this need. Amazon responded by adding mandatory peer review for production access, a reminder of the value of keeping humans in the loop.
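
A peer-review requirement for production changes can be expressed as a two-person rule. The sketch below is hypothetical and does not describe how Amazon implemented its policy: a change targeting production runs only after at least one reviewer other than the requester approves it, and identities prefixed with `agent:` (an assumed naming convention) cannot act as reviewers, so a human always signs off.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ChangeRequest:
    requester: str  # identity proposing the change, human or agent
    target_env: str  # e.g. "production" or "staging"
    action: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        """Record a review, enforcing that reviewers are independent humans."""
        if reviewer == self.requester:
            raise ValueError("a requester cannot approve their own change")
        # Assumed convention: agent identities carry an "agent:" prefix.
        if reviewer.startswith("agent:"):
            raise ValueError("agents cannot act as reviewers")
        self.approvals.add(reviewer)

    def may_execute(self, required_reviews: int = 1) -> bool:
        """Production changes need enough independent approvals; others do not."""
        if self.target_env != "production":
            return True
        return len(self.approvals) >= required_reviews


if __name__ == "__main__":
    cr = ChangeRequest("agent:remediator", "production", "restart_service")
    print(cr.may_execute())  # blocked until a human reviews it
    cr.approve("alice")
    print(cr.may_execute())
```

Raising `required_reviews` for especially sensitive actions is one way to scale oversight with risk.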

Doug Murray is CEO of infrastructure monitoring and management firm Auvik Networks Inc. He wrote this article for SiliconANGLE.

Image: SiliconANGLE/Reve
