UPDATED 12:00 EDT / FEBRUARY 22 2021


Splunk clears the ‘complexity fog’ to bring observability from cloud to device

The benefits of adopting cloud are indisputable, but the resulting architectural complexity has clouded IT professionals’ view. What was once on-premises network monitoring has grown into a requirement for cloud-native observability across an organization’s entire information technology stack, from the data center to the edge.

“Where an app used to be run off three different tiers in a data center, now it could be across hundreds of machines and opaque networks, opaque data centers all over the world, and often the only time you see how things come together is on the user’s desktop,” said Craig Hyde, senior director of product management at Splunk Inc.

Hyde and Splunk colleagues Arijit Mukherji, distinguished architect; Mike Cohen, head of product management, network monitoring; and Patrick Lin (pictured), vice president of product management, observability, joined John Furrier, host of theCUBE, SiliconANGLE Media’s livestreaming studio, for a special four-part CUBE Conversation on the importance of observability and how Splunk’s history of big data analysis is helping the company address today’s monitoring challenges. (* Disclosure below.)


Keeping watch over microservices and containers

The shift toward containers and microservices has been driven by the need for speed and scalability in the development pipeline. But cloud architecture is nebulous.

“Rather than have an end to your app that you’re watching over on some hosts that you could reboot when there’s a problem, now you have tens, maybe hundreds of services running on top of maybe hundreds, thousands, maybe tens of thousands of containers,” Lin said.

This creates several problems from an observability standpoint: “One is, you need to be tracking this in enough detail and at a high enough resolution in real time so that you know when things are coming in and out,” Lin stated, referring to containers spinning up or down. “But just as important is understanding the dependencies and the relationships between these different services.”

There are plenty of tools out there to accomplish this. Too many, according to Lin.

“I’ve been looking at some of the toolsets that some of our customers have pulled together, and they have the ability to get information about everything, but it’s not woven together in a useful way,” he added. 

An integrated toolset is needed to combat this tool sprawl, and it needs to be a purpose-built, real-time solution, according to Lin.

“It’s hard to retrofit a system,” he said. “You need to start from the very beginning … you need some form of a real-time streaming architecture; something that’s capable of providing that real-time detection and alerting across a very wide range of things in order to handle the scale and the ephemeral nature of cloud environments.”

This is the goal of the Splunk Observability Suite, released as part of the company’s Data-to-Everything Platform during the Splunk .conf20 event in October 2020. The integrated solution provides a single, consistent user experience across metrics, logs and traces, enabling seamless monitoring, troubleshooting and investigation, Lin explained.

“I’d say we have the industry’s most comprehensive and powerful combination of solutions that will help both sorts of IT and developer operations teams tackle these new challenges for monitoring and observability that other tools simply can’t address,” he added.

The “5 Foundational DevOps Practices” report from Splunk, which draws on more than 3,000 participants, reveals what separates successful DevOps teams from those that fail, outlining the importance of true end-to-end visibility and recommendations for achieving it.


Under the hood: Splunk Observability

Today, companies need observability to be able to monitor and manage application performance, infrastructure, logging, real user activity, and digital experience. But tomorrow will bring new challenges.

“Technologies and infrastructures will keep on changing; that’s sort of the rule of nature right now. The question is, how do we best address it in a more future-proofed system?” Mukherji asked.

Speaking with Furrier, Mukherji described how Splunk’s architects approached the technical challenge of creating a comprehensive and integrated observability solution. The main thing companies need to do is establish what they require from an observability solution, according to Mukherji. Observability is not “just a set of parts,” he said, “but it brings direct product benefits, like faster mean time to resolution, understanding what’s going on in your environment,” along with fewer outages and a clearer grasp of root causes.

Full fidelity, meaning the ability to understand every single transaction, is a “fascinating superpower,” according to Mukherji, because it eliminates “the gaps, and if you are able to go back and track any bad transaction, any time, that is hugely liberating.”

The Splunk Observability Suite treats what the company has dubbed NoSample full-fidelity trace ingestion as “a core foundational principle,” Mukherji stated. “For us, it’s not just isolated to application performance management where a user gets your API and you’re able to track what happened. We are taking this upstream up to the user, where the user is taking actions on the browser,” he added, explaining that understanding the whole user transaction end to end, without gaps and without sampling, is extremely powerful.
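To illustrate why sampling leaves gaps, here is a minimal sketch, not Splunk’s implementation, comparing a head-based sampler (which decides whether to keep a trace before knowing how the request turns out) with keeping every trace. The `Trace` type, the 1-in-10 rate, the modulo-on-trace-ID rule (a simplified stand-in for real ratio-based samplers) and the synthetic workload are all assumptions made for the example.

```python
"""Hypothetical comparison of head-based trace sampling vs. full-fidelity ingestion.

Assumptions for illustration only: the Trace type, the 1-in-10 sampling rule
and the synthetic workload. This is not Splunk's implementation.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    trace_id: int
    is_error: bool


def head_sampled(traces, keep_one_in: int = 10):
    """Keep a trace only if its ID lands in the sampled bucket.

    The decision ignores whether the request failed, so a rare bad
    transaction is kept only by luck.
    """
    return [t for t in traces if t.trace_id % keep_one_in == 0]


def full_fidelity(traces):
    """Keep every trace, in the spirit of the NoSample approach."""
    return list(traces)


def error_ids(traces):
    """Return the IDs of failed transactions that were retained."""
    return {t.trace_id for t in traces if t.is_error}


if __name__ == "__main__":
    # Synthetic workload: 1,000 requests, three of which fail.
    failing = {137, 512, 990}
    workload = [Trace(i, i in failing) for i in range(1000)]

    print("errors kept with 1-in-10 sampling:", sorted(error_ids(head_sampled(workload))))
    print("errors kept with full fidelity:   ", sorted(error_ids(full_fidelity(workload))))
```

With this workload the sampler keeps only trace 990 of the three failures, while full-fidelity ingestion keeps all of them; that is the “gap” Mukherji describes, at the cost of storing ten times as many traces.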

Another major issue Splunk addresses is the inefficiency of tool sprawl.

“If you find yourself using three or four different tools which are all part of some critical workload together … something could be optimized,” Mukherji said.

Integrating tools into one user interface that provides cross-tool data on infrastructure monitoring and incident management, for example, allows engineers to make faster decisions and avert or contain crises.


Achieving network observability for distributed services

The network is a common scapegoat for public cloud problems, thanks to the increasing opacity of network infrastructure in the cloud. While the network is sometimes to blame, just as often the issue has another cause.

“You need to understand where these problems are occurring to have the right level of visibility in your systems,” said Cohen during a CUBE Conversation that gets into the nitty-gritty of observability at the network level.

Rather than being the culprit behind outages, the network is “an untapped resource” for site reliability engineers struggling to understand the complex environments created by distributed systems, according to Cohen. Next-level network performance monitoring technologies, such as the extended Berkeley Packet Filter (eBPF) and OS-level monitoring, provide visibility into how processes and containers communicate.

“Network is a powerful new data set that we can combine with other aspects of what people have already been doing in observability,” Cohen stated.

eBPF, which is built into the Linux kernel, makes it possible to visualize and optimize a service architecture. This is a huge step toward clarifying the complexities of distributed systems.

“It gives you an interesting touchpoint to observe the behavior of every processing container automatically,” Cohen said. “You can see with very little overhead what they’re doing and correlate that with data from systems like Kubernetes to understand how distributed systems behave [and] to see how things connect to two other things.”

The Splunk Observability Suite takes this to another level, automatically building a complete service map of the system in seconds without developer input, according to Cohen.

“Without forcing anyone to change their code, they can get visibility across an entire system automatically,” he said.
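As a rough illustration of how observed network traffic can become a service map without code changes, the sketch below aggregates connection records into service-to-service edges using an IP-to-service lookup. Everything here is an assumption for the example: the `Flow` record stands in for what a kernel-level (for instance eBPF-based) collector might report, `ip_to_service` stands in for metadata an orchestrator such as Kubernetes could supply, and the function is hypothetical rather than Splunk’s implementation.

```python
"""Hypothetical sketch: turning observed network flows into a service dependency map.

Assumptions for illustration only: the Flow record format and the IP-to-service
lookup (e.g. derived from Kubernetes pod metadata). Not Splunk's implementation.
"""
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    bytes_sent: int


def build_service_map(flows, ip_to_service):
    """Aggregate flow records into caller -> callee service edges.

    Returns {(caller, callee): {"flows": count, "bytes": total}}.
    IPs with no known owning service are labeled "external".
    """
    edges = defaultdict(lambda: {"flows": 0, "bytes": 0})
    for f in flows:
        caller = ip_to_service.get(f.src_ip, "external")
        callee = ip_to_service.get(f.dst_ip, "external")
        if caller == callee:
            continue  # ignore traffic between replicas of the same service
        edge = edges[(caller, callee)]
        edge["flows"] += 1
        edge["bytes"] += f.bytes_sent
    return dict(edges)


if __name__ == "__main__":
    # Hypothetical pod IPs mapped to the services that own them.
    ip_to_service = {
        "10.0.0.11": "frontend",
        "10.0.0.21": "checkout",
        "10.0.0.22": "checkout",
        "10.0.0.31": "payments",
    }
    flows = [
        Flow("10.0.0.11", "10.0.0.21", 1200),
        Flow("10.0.0.11", "10.0.0.22", 800),
        Flow("10.0.0.21", "10.0.0.31", 300),
        Flow("10.0.0.31", "203.0.113.7", 150),
    ]
    for (caller, callee), stats in sorted(build_service_map(flows, ip_to_service).items()):
        print(f"{caller} -> {callee}: {stats}")
```

The key design point is the join between low-level connection data and orchestrator metadata: individual pod IPs come and go as containers spin up and down, but grouping them by owning service yields a map that stays meaningful.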

This visibility enables not only proactive problem identification and resolution, but also the ability to optimize the system and lower costs, turning the network “from a liability to a strength in these distributed environments,” Cohen stated.

Gartner’s “Innovation Insight for Observability” report outlines the importance of true end-to-end visibility and recommendations for achieving it. The report’s findings underscore the importance of an open-source solution and approach, applying pragmatic observability to digital business, and increasing application uptime by design.


User experience-driven, end-to-end observability

The digital transformation genie is out of the bottle, and there’s no putting it back now, according to Hyde. He defines observability in a broader context than “just machine data or network data,” arguing that it is “where you can see everything that’s going on inside the application and the digital user experience.”

Advocating a “work backwards” approach, Hyde recommends using the end-user experience as the yardstick against which everything else is measured.

“Availability on a server or CPU time or transaction time in a database, those are all great, but without the context of what is the goal you’re going after, it’s kind of useless,” he stated.

Splunk’s “hierarchy of monitoring needs” has three layers. The first, Hyde explained, is simple table stakes: checking whether the machine is up and running. The second asks whether the applications on that machine are running, and “how they’re talking to each other; are other components that you’re making API calls to, are they timing out or are they breaking things?” Hyde stated, describing the need to gain visibility at the container and microservices architecture level.

The cherry on top is the third layer, which addresses how the entire stack of technology is serving the end user. “What is the experience?” Hyde asked.

Splunk’s end vision of unlocking the power in data hasn’t changed since the company’s start back in the early 2000s, according to Furrier. It has just evolved to deal with the increased complexity of cloud services and cloud-native architectures.

Splunk is fully committed to going “not only broad to get everything under one roof, but also deep so that you can make all of the information that you collect actionable and useful,” Hyde said. “It’s an 800-pound gorilla in machine data and taking in data at scale.”

Be sure to check out more of SiliconANGLE’s and theCUBE’s CUBE Conversations.

(* Disclosure: Splunk Inc. sponsored this CUBE Conversation. Neither Splunk nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
