UPDATED 19:50 EST / SEPTEMBER 30 2019

The quickening race to lead cloud-native computing ‘observability’

As enterprises seek to achieve the scalability and flexibility benefits of cloud-native computing using technologies such as microservices, containers and serverless computing, they quickly run into a wall: How do they ensure the resulting infrastructure and the apps running on it are performing properly?

Traditional monitoring technologies and approaches are simply not up to the task of providing sufficient visibility and control into inherently dynamic, ephemeral software assets. How can you monitor something that may appear one second, scale out the next and disappear seconds later?

The answer: Operations technology must move beyond visibility to a new principle of “observability” that shifts the responsibility for ensuring the performance of cloud-native infrastructure to the components of that infrastructure.

The birth of the cloud-native observability platform

This rise of observability as a core principle of cloud-native computing is far more than a terminology update. It has driven numerous open-source communities to create suitable instrumentation for ensuring cloud-native tech is observable.

The information technology operations vendor community has correspondingly scrambled to put together what they are now calling a cloud-native observability platform.

Market consolidation is the clearest harbinger of this scramble. In just the last couple of years, Splunk Inc. has acquired SignalFx, DataDog Inc. has picked up Logmatic.io, VMware Inc. has acquired WaveFront, New Relic Inc. has added Opsmatic, CoScale and SignifAI through acquisitions, and SolarWinds Inc. has combined TraceView and Librato into its AppOptics product and integrated it with Loggly and Papertrail.

Meanwhile, the open-source community has been busy as well. “Cloud Native Computing Foundation projects like Prometheus for time-series metrics, Fluentd for log analysis and Jaeger for distributed tracing are popular open-source frameworks for cloud-native observability,” Deepak Jannu, director of product marketing at OpsRamp, explained for The New Stack.

The goal of all of these efforts is to provide the four pillars of observability: logging, metrics, tracing and alerting.

“If monitoring is about watching the state of the system over time, then observability is more broadly about gaining insight into why a system behaves in a certain way,” wrote Container Solutions Managing Director Ian Crosby, engineer Maarten Hoogendoorn and senior engineer Thijs Schnitger, along with Kogusenn and former Container Solutions Chief Engineer Etienne Tremel, in a white paper for The New Stack. “The cloud-native monitoring environment must provide insight into how a service’s state is related to the state of other resources. This, in turn, must point to the overall state of the system.”

Observability brings a new context to operations in cloud-native environments. “A cloud-native application is composed of independent microservices and required backing services. Even though a cloud-native application as a whole must remain available and continue to function, individual service instances will start or stop as to adjust for capacity requirements or to recover from failure,” explained an IBM Cloud Docs article. “Monitoring this fluid system requires each participant to be observable. Each entity must produce appropriate data to support automated problem detection and alerting, manual debugging when necessary, and analysis of system health (historical trends and analytics).”
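
As a rough illustration of what "each entity must produce appropriate data" can mean in practice, here is a minimal Python sketch of a service instance that records its own logs, metrics and trace spans and exposes a simple alert check. It uses only the standard library. The names ObservableService, handle_request and should_alert, the "checkout" service and the 0.5 error-rate threshold are all hypothetical, chosen for illustration; they are not drawn from IBM's article or from Prometheus, Fluentd, Jaeger or any vendor's product.

```python
import json
import time
import uuid
from collections import Counter


class ObservableService:
    """Hypothetical service instance that records its own telemetry."""

    def __init__(self, name):
        self.name = name
        self.metrics = Counter()  # metrics: counts aggregated over time
        self.logs = []            # logs: one structured JSON record per event
        self.spans = []           # traces: timed units of work tied to a trace ID

    def handle_request(self, work, trace_id=None):
        """Run `work` and record a metric, a span and (on failure) a log entry."""
        trace_id = trace_id or uuid.uuid4().hex
        start = time.perf_counter()
        status = "ok"
        try:
            return work()
        except Exception as exc:
            status = "error"
            self._log("error", trace_id, str(exc))
            raise
        finally:
            self.metrics["requests_total"] += 1
            self.metrics["requests_" + status] += 1
            self.spans.append({
                "trace_id": trace_id,
                "service": self.name,
                "duration_s": time.perf_counter() - start,
                "status": status,
            })

    def _log(self, level, trace_id, message):
        self.logs.append(json.dumps({
            "level": level,
            "service": self.name,
            "trace_id": trace_id,
            "message": message,
        }))

    def should_alert(self, max_error_rate=0.5):
        """Alerting: flag the instance when too many requests have failed."""
        total = self.metrics["requests_total"]
        if total == 0:
            return False
        return self.metrics["requests_error"] / total > max_error_rate


def failing():
    # Stand-in for a backing service that is down (illustrative only).
    raise RuntimeError("backend unavailable")


svc = ObservableService("checkout")
svc.handle_request(lambda: "done", trace_id="t1")
for _ in range(2):
    try:
        svc.handle_request(failing, trace_id="t2")
    except RuntimeError:
        pass
```

After the three requests above, the instance itself holds the evidence an operator or an automated system would need: a request counter, two error log records carrying the trace ID, and one span per request. In a real deployment this data would be shipped to external tooling rather than kept in memory.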

Observability’s secret sauce

The observability of cloud-native components provides visibility to the operators responsible for them, but visibility alone isn’t the whole story. It’s also essential to empower ops personnel to interact with the information in order to drill down to the root causes of issues.

Leading observability vendors are in agreement on this point. “Modern observability goes beyond traditional monitoring, enabling the proactive introspection of distributed systems for greater operational visibility,” explained an eBook by VMware WaveFront. “Modern observability also allows you to ask open-ended questions about your operational data, get instantaneous results, so you can iterate and explore quickly to find answers.”

New Relic is on the same page. “A modern observability platform must excel at curation — cutting complexity down to size, and selecting and presenting relevant insights for its users,” Buddy Brewer, New Relic’s global vice president and general manager of client-side monitoring, and Alberto Gomez, its senior director of product management, wrote for Diginomica. “But a modern platform must also excel at supporting participation, making it easy for users to bring custom metrics and data sources into this process.”

Muddying the waters, though, is a lack of agreement on what to call this secret sauce. What New Relic calls “participation” others call “explorability” or “answerability.”

Regardless of the terminology, however, observability requires more than tooling. It also requires a change in operational culture. “Observability isn’t a substitute for monitoring; they’re complementary,” according to an eBook by Splunk. “But it’s nearly impossible to have effective monitoring without a culture of observability. Tools are not enough, and none are going to magically ‘give you’ observability.”

Bringing cloud-native observability to the enterprise

Cloud-native computing goes well beyond simply leveraging Kubernetes and containers in the cloud. In reality, it means extending cloud best practices to all of IT, including on-premises tech as well as the rest of the hybrid IT landscape.

Cloud-native observability is no different. Enterprises must move beyond familiar dashboard-based visibility to leverage the full breadth of observability, even for systems that are not intrinsically observable. Instrumenting existing IT assets for observability as part of an overall modernization strategy is therefore an essential element of enterprise cloud-native computing.

“If looking to take on the challenge of migrating your legacy apps into a new, cloud-native architecture, you need to give yourself the best chance possible at resolving the unforeseen errors that arise,” Simon Red, evangelist at RevDeBug, wrote in a blog. “That means investing in these advanced observability tools that work in cloud-native, microservice (and even serverless) environments.”

Such migration often means rearchitecting older apps by breaking them up and refactoring them as microservices – a process that can work at cross purposes to observability priorities.

“Breaking a monolithic application into microservices pushes more of the mainline path onto the network, increasing the impact of latency and other network issues,” the IBM Cloud Docs article continued. “Requests also reach processes that are not ready for work for any number of reasons. Services are automatically restarted if they run out of resources, and fault tolerance strategies allow the system as a whole to keep functioning. Manual intervention for individual failures is not useful or feasible in this kind of environment.”

In other words, modernizing monolithic applications raises the bar on observability while simultaneously making it more difficult – a dangerous combination. “Splitting your applications apart into component pieces makes it far easier to edit, grow and build those applications, along with using the data generated for any number of purposes,” RevDeBug’s Red continued. “However, it makes it far harder to see what is going on, where faults lie and how each individual piece of your application is truly functioning.”

The programmable observability platform

If tooling alone cannot bridge the observability gap for the breadth of enterprise cloud-native software assets, then what will? The answer is people. “While traditional monitoring tools can ensure the availability of monolithic applications on physical or virtual infrastructure, DevOps teams will need to combine metrics, logs, and traces for managing the health of ephemeral microservices built on containerized deployments,” OpsRamp’s Jannu added.
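
One way to picture what combining metrics, logs and traces involves is to join the three signals on a shared trace ID. The Python sketch below is only an illustration under stated assumptions: the telemetry records, the service names, the 500-millisecond latency threshold and the function explain_slow_requests are all invented for this example and do not reflect any particular vendor's data model.

```python
from collections import defaultdict

# Hypothetical telemetry, invented for illustration only.
spans = [
    {"trace_id": "a1", "service": "gateway", "duration_ms": 40},
    {"trace_id": "a1", "service": "orders", "duration_ms": 35},
    {"trace_id": "b2", "service": "gateway", "duration_ms": 900},
    {"trace_id": "b2", "service": "payments", "duration_ms": 870},
]
logs = [
    {"trace_id": "b2", "service": "payments",
     "message": "connection pool exhausted"},
]
LATENCY_THRESHOLD_MS = 500  # assumed latency threshold for "slow"


def explain_slow_requests(spans, logs, threshold_ms):
    """For each trace with a span over the threshold, list the services
    it touched and the log messages recorded under the same trace ID."""
    spans_by_trace = defaultdict(list)
    for span in spans:
        spans_by_trace[span["trace_id"]].append(span)

    logs_by_trace = defaultdict(list)
    for record in logs:
        logs_by_trace[record["trace_id"]].append(record)

    findings = {}
    for trace_id, trace_spans in spans_by_trace.items():
        slowest = max(trace_spans, key=lambda s: s["duration_ms"])
        if slowest["duration_ms"] > threshold_ms:
            findings[trace_id] = {
                "services": [s["service"] for s in trace_spans],
                "messages": [r["message"] for r in logs_by_trace[trace_id]],
            }
    return findings


findings = explain_slow_requests(spans, logs, LATENCY_THRESHOLD_MS)
```

Here the latency figure flags that request b2 was slow, the trace shows it passed through the gateway and payments services, and the log record suggests why. None of the three signals answers the question alone.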

In order to accomplish this task, DevOps teams will need to bring their programming skills to bear. “A modern observability platform must take a full-stack, end-to-end approach,” New Relic’s Brewer and Gomez wrote. “Participation… relies on programmability — giving users the tools, and especially the APIs, to help them help themselves.”

In fact, at last week’s FutureStack Conference in New York City, New Relic emphasized the programmability of its cloud-native observability platform, New Relic One. “Observability requires a platform – with a capital P – open, connected, and programmable,” New Relic CEO Lew Cirne explained during his keynote. “A platform is something you write software on. Otherwise it’s a tool.”

New Relic One supports the React JavaScript library for building user interfaces – a simple programming environment that doesn’t require deep software development expertise.

“By making New Relic One a programmable platform, we’ve made it possible for you to build applications—custom user interfaces deployed in New Relic One — that connect your observability data, gathered from myriad sources — including third-party open source data — all in one place,” Greg Nika, senior director of product marketing at New Relic, explained in a blog.

The programmability New Relic built into New Relic One may give it a temporary edge in the cloud-native observability marketplace, but it’s clear the competition is hot on its heels. The coming battle will more likely be over the power, versatility and simplicity of such platforms’ programmability.

In the meantime, operations personnel should leverage today’s observability platforms to understand the “unknown unknowns” of modern production environments. “Modern observability gives you what you need to investigate all these unknown unknowns, that are the norm with complex distributed systems, like cloud-enabled microservices,” the VMware WaveFront eBook concluded.

My final advice to operations personnel who are struggling to make sense of their organizations’ road to cloud-native: Look beyond the dashboard. Although visibility is a one-way street, observability is interactive – and the more complex and automated production environments are, the more important such interactivity will become.

(Disclosure: IBM and New Relic are Intellyx customers, and OpsRamp and VMware are former Intellyx customers. None of the other organizations mentioned in this article is an Intellyx customer. New Relic covered my expenses at FutureStack, a common industry practice.)

Jason Bloomberg is founder and president of the agile digital transformation analyst firm Intellyx, which advises companies on their digital transformation initiatives and helps suppliers communicate their agility stories. Bloomberg, who can be followed on Twitter and LinkedIn, is also the author or coauthor of four books, including “The Agile Architecture Revolution.”

Image: mainblick/Pixabay
