How cloud-native observability is transforming enterprise technology
In the world of information technology operations, observability extends the principles of IT monitoring by pulling together data from logs, metrics, traces and events, empowering operators to identify the root causes of issues and resolve them quickly. Cloud-native observability, in turn, brings these capabilities to Kubernetes and, more broadly, to the full gamut of hybrid, multicloud IT.
Cloud-native observability is not only relevant to organizations that are implementing Kubernetes, however. As cloud-native computing represents a paradigm shift in enterprise IT, the observability part of the story also reflects new ways of leveraging technology to manage increasingly complex IT infrastructure.
To illustrate these new approaches, I spoke with 10 vendors that are leading the charge in cloud-native observability innovation. Their innovations follow three main themes, each of which represents an aspect of the cloud-native paradigm shift that is changing everything about how we run technology in our organizations. (* Disclosure below.)
Theme No. 1: Real-time visibility into root causes that accelerates the work of DevOps teams
Vendors of traditional monitoring tools focus on the needs of operators, giving them dashboards that leverage sampled data that may be minutes or even hours old. In contrast, some cloud-native observability tools provide near real-time visibility into incidents as well as their causes.
Not only does this additional speed reduce the operators’ mean time to resolution, it also gives developers insight into the impacts of the code they’re working on at the moment – either in development or test environments, or in the production environment during canary testing.
Vendors that bring this real-time capability to cloud-native observability include Instana Inc., which provides feedback for continuous integration and continuous deployment (CI/CD) activities, along with root-cause identification, analytics and insight into the context of service dependencies.
Humio Ltd. also provides an intuitive tool for developers that offers visibility into operational behavior at the time of user interaction. In fact, Humio focuses on support for human interaction with data, both for operators and developers. A third vendor, Logz.io, focuses in particular on cloud observability for engineers.
Three incumbent vendors also provide real-time visibility that supports DevOps activities: VMware Tanzu Observability (formerly Wavefront), Splunk SignalFx and New Relic. Splunk Inc. and New Relic Inc. go one step further in supporting developers by making their observability platforms programmable. New Relic enables engineers to craft custom dashboards, while Splunk’s SignalFx platform is fully programmable: every capability it exposes to users is also available through an application programming interface.
Theme No. 2: Automated, AI-driven root-cause detection
AIOps, the use of AI (in particular, machine learning) to uncover anomalies in operational data and determine their root causes, is now a burgeoning market in its own right. Many cloud-native observability vendors also offer AIOps capabilities, with a cloud-native twist.
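To make the anomaly-detection half of AIOps a little more concrete, here is a minimal sketch that flags outlying points in a single metric series using a modified z-score (median and median absolute deviation), a common statistical rule of thumb. The function name, the 3.5 threshold and the sample latency values are all hypothetical, and this is not how any vendor in this article works; commercial AIOps tools use far richer, typically proprietary, models and go on to correlate anomalies across many signals to infer root causes.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return indices of points whose modified z-score exceeds threshold.

    Hypothetical helper. Median and MAD (median absolute deviation) are
    used instead of mean and standard deviation because they are far less
    distorted by the very outliers we are trying to find.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: most points are identical, so flag any deviation.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical request latencies, in milliseconds, with one spike.
latencies_ms = [102, 98, 105, 101, 99, 97, 103, 480, 100, 104]
print(find_anomalies(latencies_ms))  # [7]
```

A detector like this only says that something is unusual; the harder AIOps problem the article describes is working out why.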
Zebrium Inc., for example, offers a log manager that provides autonomous incident and root-cause detection. By “autonomous,” the company means that its tool uses unsupervised machine learning that draws on common patterns of software failure. Zebrium can then automatically find hotspots where anomalous patterns are abnormally correlated, giving operators exceptional insight into the root causes of issues.
VMware Inc. also combines AIOps and cloud-native observability. VMware Tanzu Observability enables operators to troubleshoot across heterogeneous technology stacks with AI-driven root cause analytics.
The standout vendor bringing AI to the cloud-native observability story, however, is Carbon Relay. It offers automatic ML-powered optimization for Kubernetes applications.
Carbon Relay is proactive: it continually assesses all relevant factors to determine the best set of deployment choices, implements them automatically and recalculates on the fly to maintain top performance as conditions change.
It could be argued that Carbon Relay doesn’t really offer observability at all, since it focuses more on optimization. But given that cloud-native observability includes empowering operators to fix issues, what better fix is there than a proactive optimization that prevents issues in the first place?
Theme No. 3: All the data, all the time
Operational telemetry has always been big data: all the logs, events and other streams of information coming off every application and infrastructure component every second of every day.
Historically, processing and storing such vast quantities of information was cost prohibitive, so IT operations technologies had to work on samples – small subsets of all available data that statistically represented the behavior of the environment as a whole.
Today, the situation has worsened, as the number of data sources has exploded with the diversity of technologies and environments in the modern IT landscape. Combine this explosion with the fact that much of that technology is dynamic or even ephemeral, and sampling becomes increasingly impractical and ineffective.
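A toy example shows why sampling struggles with rare, short-lived events. The sketch below assumes a hypothetical stream of 10,000 request events, five of which failed, and keeps a 1% sample by taking every 100th event. It uses simple systematic sampling for clarity; real tools typically use random or trace-aware sampling, which can still miss rare failures but does so probabilistically rather than deterministically as here.

```python
def sample_every_nth(events, n):
    """Systematic sampling: keep one event out of every n."""
    return events[::n]


def count_errors(events):
    """Count events whose status field marks a failure."""
    return sum(1 for e in events if e["status"] == "error")


# Hypothetical telemetry: 10,000 requests, of which five failed.
error_positions = {1234, 4321, 5555, 7001, 9876}
events = [
    {"id": i, "status": "error" if i in error_positions else "ok"}
    for i in range(10_000)
]

sampled = sample_every_nth(events, 100)  # a 1% sample
print(len(sampled), count_errors(sampled))  # 100 0
print(len(events), count_errors(events))    # 10000 5
```

The sample looks perfectly healthy even though five requests failed. The same blind spot applies to an ephemeral container whose entire lifetime falls between two samples, which is the motivation for processing all the data.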
Fortunately, the cost of storing and processing such data has also dropped, enabling some cloud-native observability vendors to process all available operational data, all the time, cost-effectively.
Epsagon Inc., for example, provides full visibility for containers, virtual machines, serverless functions and other elements of modern, cloud-native infrastructure. By leveraging all these data, it can automate detection, troubleshooting and resolution of issues with instant data correlation, payload visualization and full-depth tracing.
In fact, Epsagon automatically discovers the components of the entire application stack, allowing operators to see performance metrics for any production resource across the cloud-native landscape.
Sharing this automated discovery functionality is StackState B.V. It offers full-stack observability, with a single platform for on-premises, microservices and multicloud IT deployments – in other words, full hybrid IT support.
StackState automatically discovers the elements of the infrastructure and tracks every change, in effect recording the entire state of the enterprise topology. It can then play back that state at any point in time, a capability the company calls “time travel.”
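One way to picture this kind of playback is as an append-only log of timestamped topology changes that can be replayed up to any moment. The sketch below is a hypothetical illustration of that general idea, not StackState’s data model or API; the class name, the "add"/"remove" actions and the component names are all assumptions.

```python
from bisect import bisect_right


class TopologyHistory:
    """Records topology changes and rebuilds the state at any past time.

    Hypothetical illustration only; not StackState's data model or API.
    """

    def __init__(self):
        self._times = []    # change timestamps, kept in ascending order
        self._changes = []  # (action, component, depends_on) tuples

    def record(self, ts, action, component, depends_on=()):
        """Append one change; changes must arrive in time order."""
        if self._times and ts < self._times[-1]:
            raise ValueError("changes must be recorded in time order")
        self._times.append(ts)
        self._changes.append((action, component, tuple(depends_on)))

    def state_at(self, ts):
        """Replay every change up to and including ts.

        Returns a dict mapping each live component to its dependencies.
        """
        state = {}
        upto = bisect_right(self._times, ts)
        for action, component, depends_on in self._changes[:upto]:
            if action == "add":
                state[component] = set(depends_on)
            elif action == "remove":
                state.pop(component, None)
        return state


history = TopologyHistory()
history.record(100, "add", "db")
history.record(110, "add", "checkout-svc", depends_on=["db"])
history.record(150, "add", "checkout-pod-1", depends_on=["checkout-svc"])
history.record(180, "remove", "checkout-pod-1")

print(history.state_at(160))
print(history.state_at(200))
```

At time 160 the short-lived pod and its dependency on the checkout service are visible; at time 200 it is gone, yet its history remains queryable. Replaying from the beginning is linear in the number of changes, so a production system would likely add periodic snapshots.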
Splunk, VMware and New Relic also offer the ability to analyze all the operational data all the time. Splunk SignalFx provides a real-time streaming analytics engine that lets operators and developers monitor the entire infrastructure in seconds, not minutes. VMware, for its part, offers full-stack visibility across the entire production environment, including VMs, Kubernetes, serverless functions, cloud services and infrastructure.
New Relic lets operators search across all entities, including apps, hosts, containers, Kubernetes clusters, cloud services, databases and VMs. Because all the data are in one place, New Relic customers can see all the relationships and dependencies among their infrastructure entities and understand the context of those data within the cloud-native infrastructure. New Relic is also lowering its prices, making it more cost-effective to leverage all the data.
Making the right choice about cloud-native observability
The more established vendors in this article predictably have a broader, more complete offering than the startups. The startups, however, are driving innovation in their particular areas of focus.
More complete offerings may make more sense generally, but they may also be more difficult to implement and to leverage to their full extent. The younger offerings may not have as many features, but they generally deliver value more quickly than the large vendors’ products.
As has always been true with IT operations tooling, organizations will never have just one tool. True, too many tools can also cause issues, but the best-run shops leverage a carefully selected set of complementary tools. Many things are different about cloud-native computing, but this basic fact will remain true for the foreseeable future.
Jason Bloomberg is founder and president of Intellyx, which publishes the Cloud-Native Computing Poster and advises business leaders and technology vendors on their digital transformation strategies. He wrote this article for SiliconANGLE. (* Disclosure: At the time of writing, New Relic is an Intellyx customer and VMware is a former Intellyx customer. None of the other organizations mentioned in this article is an Intellyx customer.)
Photo: Kate Ter Haar/Flickr