When engineers can’t handle IT alone, AIOps is there to help
Better, faster, cheaper is the mantra of humanity. Since the dawn of civilization, our species has looked for ways to do more in less time using fewer resources. Our drive for operational efficiency led to the creation of farms, cities and eventually to the modern world.
Moving on and up the industrialization ladder, we find ourselves at a crossroads: Technology has become so fast and so complex that humans simply can’t keep up.
The answer is (no joke) more technology. As digital workloads become too fast and complex for humans to handle, we’re calling on machines to help us out. Specifically, machine learning algorithms that can process data at speeds far beyond the limitations of the human mind.
AIOps: ‘Intelligent systems to replace the human cerebellum’
Like many new concepts, the term AIOps is sometimes thrown around without a true understanding of its meaning. Coined by research and advisory company Gartner Inc., AIOps is officially defined as combining “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.”
More informally, it’s “automation on steroids,” according to Shaun O’Meara, global field chief technology officer at open cloud company Mirantis Inc. In a video discussion during the recent Mirantis Launchpad 2020 event, O’Meara named AIOps as a trend to watch for the future. Through AIOps, “we can analyze massive amounts of data a lot faster than a single human could or even just a normal type system that’s searching,” he said, joking that AIOps stands for “intelligent systems to replace the human cerebellum.”
While the complexity of the IT environment is the driving force behind the rise of AIOps platforms, several factors contribute to that complexity.
First is the ever-increasing amount of data that IT Ops is expected to monitor. This in turn is driven by the exponential increase in endpoint devices on the network. The COVID-19 pandemic has highlighted the dangers of lax device management and security oversight, with statistics showing a 41% increase in sensitive data stored on unsecured endpoints since employees have been forced to work from home.
This decentralization of operations poses yet more problems for overwhelmed IT managers. Alongside employees accessing resources outside the protective ring of the physical office, there’s the added fun of the internet of things, with unsecured IoT devices a favorite point of entry for cyberattacks. One potential future use of AIOps is to monitor systems in smart cities.
Another factor is the changing face of IT infrastructure. As infrastructure is virtualized into code, developers are entering into the core IT domain. Yet it is IT operations that hold responsibility for keeping everything running smoothly.
Adding to the pressure on IT operations are the new normal requirements for uptime. The days of scheduled maintenance are long past, with companies striving to hit “five nines” (99.999%) or higher in availability of services. Even slowdowns can cause lost revenue, as customers wait just three seconds before bailing to go to another site.
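To put “five nines” in concrete terms, the downtime budget follows directly from the availability percentage. The snippet below is a minimal sketch of that arithmetic, assuming a 365-day year; the function name is illustrative and not taken from any tool mentioned here.

```python
def downtime_budget_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of downtime allowed per period at a given availability percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare common availability targets over one (assumed 365-day) year.
for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {downtime_budget_minutes(pct):.2f} minutes per year")
```

At 99.999% the budget works out to roughly 5.26 minutes per year, which is why scheduled maintenance windows no longer fit.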
ITOM, ITSM and AIOps: Man and machine join forces
IT has long separated infrastructure and applications operations, known by the acronyms ITOM for information technology operations management, and ITSM for information technology service management. One of the benefits of AIOps is breaking down these silos to allow the two sides to work together as one. This is essential in order for IT to break through the inefficiencies of “old-school” management and adopt the more free-flowing, agile culture espoused by DevOps. In fact, AIOps has been described as continuous integration/continuous delivery for ITOps.
ITOM covers the traditional domain of IT management, making sure that infrastructure performance is maintained through logging metrics, events, etc. ITSM monitors end-to-end service delivery to the customer, generating incident and change data.
Just as developers and operations merged to make DevOps, AIOps is the next link in the evolutionary chain. Linking infrastructure, compute and machine learning smarts (automation), AIOps merges three into one for unified, continuous operations.
“AI is becoming an essential tool for accelerating, scaling, automating and otherwise optimizing infrastructures at every level,” stated industry analyst James Kobielus, in a discussion on how AIOps is being used to optimize cloud computing. “AIOps solutions enable these benefits by driving real-time monitoring, predictive analysis, root cause diagnostics and anomaly detection on system- and application-level events in IT infrastructure, and also in data, application and services at higher layers in the cloud computing stack.”
Learning to trust AI, step by step
There’s just one problem … science fiction has been warning us for decades of the dangers of allowing AI too much freedom. And the ultimate aim of AIOps is completely hands-off management with autonomous issue detection and self-healing capabilities. And let’s be honest, trusting AI with control over systems that are increasingly essential for humanity is a big step.
“We have a few hoops to jump through before we can look at where AIOps is going to be really effective, and the first one is a trust issue,” O’Meara said.
In a session during the Mirantis Launchpad 2020 event, O’Meara laid out the practical steps for working toward a future where trusted AI is working autonomously, optimizing every system to its fullest efficiency.
Step one: Collect and correlate
Out of the hundreds of gigabytes, potentially terabytes of logging data created each day in a digital business, the vast majority gets “file thirteen’ed” or stored in a hard-to-access database for security reasons, according to O’Meara.
“Let’s start by taking that data and actually analyzing it,” he said.
At this point, the AI is trained under human supervision, collating and correlating data to seek out patterns.
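As a rough illustration of what “collating and correlating” can mean in practice, the sketch below counts which events from different sources tend to show up in the same time window. The log records, source names and 60-second window are all hypothetical, and fixed windows are a simplification: a real platform would use far richer methods.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log records: (timestamp in seconds, source, event type).
events = [
    (100, "switch-a", "link_congested"),
    (104, "app-1", "timeout"),
    (130, "db-1", "slow_query"),
    (400, "switch-a", "link_congested"),
    (407, "app-1", "timeout"),
    (900, "db-1", "slow_query"),
]


def correlate(events, window_seconds=60):
    """Count how often pairs of (source, event) occur in the same fixed time window."""
    buckets = defaultdict(set)
    for timestamp, source, event_type in events:
        # Events near a window boundary may be split apart; acceptable for a sketch.
        buckets[timestamp // window_seconds].add((source, event_type))

    pair_counts = Counter()
    for seen in buckets.values():
        for pair in combinations(sorted(seen), 2):
            pair_counts[pair] += 1
    return pair_counts


for pair, count in correlate(events).most_common(3):
    print(count, pair)
```

With this sample data, the only pair that repeats is the congested switch and the application timeout, which is the kind of candidate pattern a human supervisor would then confirm or reject.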
Step two: Pattern detection
Once the AI recognizes a pattern, then it can do inference to establish the root cause of recurring issues. O’Meara gives the example of a recurring pattern of failure when specific network links become congested.
“Based on the patterns we’ve been learning, we know from the past that if those three network links get full we’re going to have a failure in region X,” he said.
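O’Meara’s “three network links” example can be sketched as a rule learned from history. The code below is an assumption-laden toy, not a description of any Mirantis product: the link names, the history records and the thresholds are invented, and the “learning” is a simple frequency count rather than a trained model.

```python
from collections import Counter

# Hypothetical history: which links were full, and whether region X then failed.
history = [
    ({"link-1", "link-2", "link-3"}, True),
    ({"link-1", "link-2", "link-3"}, True),
    ({"link-1", "link-2"}, False),
    ({"link-3"}, False),
    ({"link-1", "link-2", "link-3"}, True),
    ({"link-2", "link-3"}, False),
]


def learn_patterns(history, min_observations=2, threshold=0.8):
    """Return link combinations followed by failure at least `threshold` of the time."""
    totals, failures = Counter(), Counter()
    for full_links, failed in history:
        key = frozenset(full_links)
        totals[key] += 1
        failures[key] += failed  # True counts as 1, False as 0
    return {
        key
        for key in totals
        if totals[key] >= min_observations and failures[key] / totals[key] >= threshold
    }


def predicts_failure(currently_full, patterns):
    """True if every link in some learned pattern is currently full."""
    return any(pattern <= set(currently_full) for pattern in patterns)


patterns = learn_patterns(history)
print(predicts_failure({"link-1", "link-2", "link-3"}, patterns))
print(predicts_failure({"link-1", "link-2"}, patterns))
```

The `min_observations` guard keeps a single coincidence from becoming a rule, which is one small way to build the trust O’Meara describes.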
Step three: Inference of cause
This is where the algorithms take traditional performance metric tracking to a different level: recognizing a potential problem before it reaches critical status. Using the same example of the congested network, this would mean triggering an alert as the links start to fill rather than reporting that they are full.
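One simple way to alert “as the links start to fill” is to extrapolate the recent trend. The sketch below fits a least-squares slope to hypothetical utilization samples and projects forward from the latest reading; the sample data, the 100% capacity and the 60-minute warning horizon are assumptions, and production systems would likely use more robust forecasting.

```python
def minutes_until_full(samples, capacity=100.0):
    """Estimate minutes until utilization hits capacity.

    `samples` is a list of (minute, utilization_percent) pairs. The growth rate
    is a least-squares slope; the projection starts from the latest sample.
    Returns None when utilization is flat or falling.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / denom
    if slope <= 0:
        return None
    _, last_u = samples[-1]
    return max(0.0, (capacity - last_u) / slope)


def should_warn(samples, warn_within_minutes=60):
    """Raise an early warning if the link is projected to fill within the horizon."""
    eta = minutes_until_full(samples)
    return eta is not None and eta <= warn_within_minutes


# Hypothetical link utilization, sampled every five minutes.
rising = [(0, 50), (5, 55), (10, 60), (15, 65)]
print(minutes_until_full(rising), should_warn(rising))
```

Here the link is only 65% full, so a threshold-based monitor would stay silent, yet the trend suggests it fills in about 35 minutes.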
Step four: Action
The culmination of the AI cycle is when the AI has gathered trusted data, detected patterns and inferred a cause. At this point it can take action to implement change. Continuing with the network example, O’Meara explained how the AI could correlate local patterns with patterns across different regions and not only prevent failures, but optimize the network based on usage peaks across different time zones.
“It could be the Beijing office is coming online, [so] let’s move the majority of the workload to a cloud that’s closer to them reducing the network bandwidth cost and inference and also reducing the impact on international lines,” he said.
Then, as workers in Beijing go home, the workload can be shifted to the next time zone that is gearing up for peak use hours. “As a human on your own to try and correlate that information would just be insane,” O’Meara added.
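The “follow the sun” idea can be sketched in a few lines. Everything below is a simplified assumption: the offices other than Beijing are invented, UTC offsets are fixed (daylight saving is ignored), working hours are 9 to 18 local time, and the first active office simply wins. A real placement decision would also weigh load, bandwidth cost and capacity.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offices with fixed UTC offsets (daylight saving ignored for brevity).
OFFICES = {
    "beijing": timezone(timedelta(hours=8)),
    "london": timezone(timedelta(hours=0)),
    "new-york": timezone(timedelta(hours=-5)),
}


def active_offices(now_utc, start_hour=9, end_hour=18):
    """Return offices whose local time falls inside working hours."""
    return [
        name
        for name, tz in OFFICES.items()
        if start_hour <= now_utc.astimezone(tz).hour < end_hour
    ]


def preferred_region(now_utc, default="london"):
    """Pick the first office in working hours, falling back to a default."""
    active = active_offices(now_utc)
    return active[0] if active else default


# 02:00 UTC is 10:00 in Beijing, so the workload would move toward Beijing.
print(preferred_region(datetime(2020, 10, 1, 2, 0, tzinfo=timezone.utc)))
```
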
As the world realigns itself around a data-first, remote way of life, the amount of responsibility handed off to AIOps is set to increase, with O’Meara predicting that AI could “potentially write applications in the future.” While humans may be error-prone and “incredibly slow” compared to computers, the species’ position as “top dog” isn’t under threat.
“Humans will still decide on the base logic; humans will still decide on the creative components,” O’Meara concluded.
To learn more about AIOps, open cloud and container management technology, check out the Mirantis Launchpad 2020 event on-demand presentations.
(Disclosure: TheCUBE was a paid media partner for the recently concluded Mirantis Launchpad 2020 digital event. Neither Mirantis Inc. nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)