UPDATED 12:00 EDT / MARCH 08 2021

Observability moves to the center of the distributed systems conversation

In an age of distributed systems and cloud native architectures and workloads, the practice of “monitoring” has morphed into “observability,” yet both remain important as enterprises manage complex hybrid and multicloud infrastructure.

The distinction between monitoring and observability can be tricky to pin down. Increasingly, though, “monitoring” refers to tools for traditional system architectures that offer only partial visibility, while “observability” refers to an end-to-end view of a system that establishes cause rather than just effect. Observability for enterprise systems means understanding not just “what,” but “why.”

The need for observability has become particularly pressing because today’s distributed systems and cloud native technologies, such as Kubernetes, microservices and serverless functions, have created a complex IT infrastructure that has outstripped the capabilities of traditional monitoring tools.

“It matters for people to know what is wrong,” said Dinesh Dutt, principal and co-founder of Stardust Systems Inc. and a former Fellow at Cisco Systems Inc., in an interview with SiliconANGLE. “I want to observe and understand the system for more than just fault monitoring and performance. There are lots of failures that people don’t understand, and observability is trying to address all of that.”

View of user experience

Achieving a level of observability that provides a complete picture will depend heavily on telemetry data. This is information that encompasses metrics, code-generated events, system logs and traces that reveal isolated activity for a single transaction as it hops across microservices.
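To make the idea of a trace concrete, here is a minimal sketch in Python. It assumes a made-up in-memory collector and three invented services (gateway, cart, payments); none of these names come from any vendor’s product. The point is only that a single trace ID, generated once and passed along each hop, lets the separate pieces of one transaction be stitched back together.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    """One unit of work done by one service on behalf of a transaction."""
    trace_id: str
    service: str
    duration_ms: float


@dataclass
class Tracer:
    """Hypothetical in-memory collector; a real system would export spans."""
    spans: list = field(default_factory=list)

    def record(self, trace_id, service, duration_ms):
        self.spans.append(Span(trace_id, service, duration_ms))

    def trace(self, trace_id):
        # Reassemble every hop that shares the same trace ID.
        return [s for s in self.spans if s.trace_id == trace_id]


def handle_checkout(tracer):
    """Simulate one transaction hopping across three made-up services."""
    trace_id = uuid.uuid4().hex  # generated once, then passed along each hop
    hops = [("gateway", 4.0), ("cart", 11.5), ("payments", 42.0)]
    for service, duration_ms in hops:
        tracer.record(trace_id, service, duration_ms)
    return trace_id


tracer = Tracer()
first = handle_checkout(tracer)
second = handle_checkout(tracer)  # a second, unrelated transaction

hops = tracer.trace(first)
slowest = max(hops, key=lambda s: s.duration_ms)
print([s.service for s in hops], "slowest:", slowest.service)
```

Even with two transactions mixed together in the collector, filtering on the trace ID isolates the path of the first one and shows which hop took longest.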

This level of detail highlights an important component of observability that makes it different from monitoring. While system administrators rely on monitoring to track the performance of individual devices, observability provides a way to better gauge the customer impact. It’s seeing the end-to-end system based on the user experience.

While a monitoring tool might tell an IT administrator that random access memory usage is suddenly high, an observability tool might connect the dots between that issue and a memory leak through the use of APIs to collect data from a wide range of sources.
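A rough illustration of that difference, sketched in Python under simplifying assumptions: the host reading and the per-process memory series below are invented data standing in for two separate sources, and “steady growth” is only a crude heuristic for a leak, not how any particular product detects one.

```python
def steady_growth(samples_mb):
    """True when usage never drops between samples and ends higher than it began."""
    pairs = zip(samples_mb, samples_mb[1:])
    never_drops = all(later >= earlier for earlier, later in pairs)
    return never_drops and samples_mb[-1] > samples_mb[0]


def explain_high_memory(host_used_pct, limit_pct, per_process_mb):
    """Combine a host-level reading with per-process series to suggest a cause."""
    if host_used_pct <= limit_pct:
        return None  # the symptom is absent, so there is nothing to explain
    suspects = [name for name, series in per_process_mb.items() if steady_growth(series)]
    return {"symptom": f"host memory at {host_used_pct}%", "suspected_leaks": suspects}


# Hypothetical data from two sources: a host agent and a per-process exporter.
per_process_mb = {
    "checkout-api": [310, 340, 395, 450, 520],  # climbs on every sample
    "search": [600, 580, 610, 590, 605],        # fluctuates around a steady level
}

report = explain_high_memory(host_used_pct=93, limit_pct=85, per_process_mb=per_process_mb)
print(report)
```

The threshold check alone reports only the symptom. Joining it with the second data source is what points at a likely culprit.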

Surfacing golden signals allows developers, DevOps and SRE teams to better monitor, troubleshoot and investigate known and unknown issues that are affecting their system’s behavior.
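The article does not enumerate the golden signals. In common site reliability engineering usage they are latency, traffic, errors and saturation. The Python sketch below shows one way they might be computed for a single time window. The `Request` record, the choice of a 95th percentile and the in-flight/capacity measure of saturation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one window of requests; expects at least one request."""
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), done in integer math.
    rank = (95 * len(latencies) + 99) // 100
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "saturation": in_flight / capacity,
    }


# Invented sample: 20 requests in a 10-second window, the two slowest failing.
requests = [
    Request(latency_ms=10.0 * i, status=500 if i > 18 else 200)
    for i in range(1, 21)
]
signals = golden_signals(requests, window_s=10, in_flight=30, capacity=40)
print(signals)
```

Tracking a high percentile rather than an average is a deliberate choice here, since a handful of slow requests can hurt users long before the mean moves.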

“Observability is about, ‘what can we learn, how we can proactively respond to what we are seeing and what the customer is experiencing?’” said Joseph Ours, director of modern software delivery at Centric Consulting, during a recent interview.

Recent moves by several key players in the observability space highlight this trend, starting with Amazon Web Services Inc. The cloud giant launched a set of new observability tools in December designed to make it easier for enterprises to keep tabs on cloud infrastructure health. These included tools for gaining insight into how customer-facing workloads are running and for analyzing operational data on containerized applications.

Oracle Corp. also launched a new observability platform in mid-October for managing multicloud and on-premises deployments, using machine learning to identify anomalous system behavior and to isolate and remediate performance issues before they affect users. That was followed by an announcement from Splunk Inc. that it would introduce its own Observability Suite, designed to provide full, scalable ingest of metric, trace and log data, along with artificial intelligence and machine learning analytics that point toward the cause of problems during incidents.

Other market developments include Sumo Logic Inc.’s recent updates to its observability offering to extend metrics and tracing tools for Kubernetes, and CrowdStrike Holdings Inc.’s plans to acquire Humio, a log analysis and observability startup backed by Dell Technologies Inc.

Support for DevOps

The integration of AI into these most recent industry offerings provides a key component of the observability movement. The ultimate goal is to learn what is happening inside systems and avoid outages in the future. This becomes more important for the DevOps community because infrastructure failures can overwhelm operators seeking to prioritize critical issues in the system, and developers need to understand whether their applications will function as intended.

“[Alerts] flood the console and make it impossible for tier one operators to figure out what’s going on,” said Gregg Siegfried, research director on the Cloud and IT Operations team at Gartner Inc., in an interview with SiliconANGLE. “The AI use cases to date have been assistance in monitoring a flood of events.”
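To give a sense of the flood problem, the sketch below collapses repeated alerts into a short ranked summary. This is plain grouping and counting, not AI or machine learning, and the alert fields (`service`, `symptom`) and sample data are invented for the example.

```python
from collections import Counter


def summarize_alerts(alerts):
    """Collapse repeated (service, symptom) alerts and list the noisiest first."""
    counts = Counter((a["service"], a["symptom"]) for a in alerts)
    return counts.most_common()


# Hypothetical flood: 68 raw alerts that boil down to three distinct problems.
alerts = (
    [{"service": "db", "symptom": "connection refused"}] * 40
    + [{"service": "checkout-api", "symptom": "timeout"}] * 25
    + [{"service": "cache", "symptom": "eviction spike"}] * 3
)

summary = summarize_alerts(alerts)
for (service, symptom), count in summary:
    print(f"{count:>3}  {service}: {symptom}")
```

An operator facing three lines instead of 68 has a far easier time deciding where to look first. Commercial tools go further, for example by correlating alerts across services, which this sketch does not attempt.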

Observability represents an opportunity for engineering teams to rethink monitoring strategies by observing software performance at the source. Developers can build scripts that take action on clusters based on the larger amount of data that observability platforms can now provide.

At the source, developers uncover new data wellsprings, such as network performance and reliability. It’s all thanks to a technology called extended Berkeley Packet Filter, or eBPF, which builds directly on the Linux kernel to provide real-time visibility into the interaction between applications, networks and other infrastructure elements. Without having to change application code or container images, DevOps teams gain a complete view across IT environments, making eBPF an attractive investment.

In November, Splunk acquired Flowmill, a network observability service well-versed in eBPF technologies, followed weeks later by news of New Relic’s acquisition of Pixie Labs for its Kubernetes-monitoring platform. With its integration into the DevOps culture, observability could move rapidly toward broader assimilation into enterprise IT environments. Observability is being driven by data and the science that ultimately provide solutions to manage complex distributed computing environments.

“Observability is essentially instrumentation of the traffic and looking at all the data to make sure systems run effectively, but it’s distributed computing at the end of it, so there’s a lot of science that’s been there, and now new science emerging around it,” said John Furrier, chief executive officer and founding editor of SiliconANGLE Media. “This becomes a key part of the architectural choices that some companies have to make if they want to be in position to take advantage of cloud native growth. These technical decisions matter.”

Make sure to check out theCUBE’s coverage of a special four-part CUBE Conversation series on the importance of observability and how Splunk’s history of big data analysis is helping the company address today’s monitoring challenges.

Image: Stunning Art
