UPDATED 17:00 EDT / MAY 12 2022

A closer look at solving the Kubernetes complexity gap

The coming decade is set to be the season of the supercloud, according to enterprise technology analysts. Who will optimize it?

Businesses seem to be settling on hybrid models embracing on-premises data centers, multiple public clouds, and any number of edge devices. These are complex environments requiring constant management and fine-tuning. That can be problematic because sometimes that fine-tuning doesn’t happen. Systems designed to purr like a Maserati are choking like a farm truck with water in its diesel.

Optimizing cloud usage is the top concern across all organizations, according to the Flexera “2022 State of Cloud” report. The report also found the top challenges to cloud operations are security, lack of expertise and managing spend.

These three challenges are all impacted by poorly optimized cloud resources. And poor optimization doesn’t just affect application performance. Symptoms can include dangerous misconfigurations and costly over-provisioning of resources. The first creates opportunities for cybercriminals; the second wipes out the pay-as-you-go price savings of cloud and increases a company’s carbon footprint.

How do companies cross this cloud optimization gap?

The release of Optimize Live by StormForge, a flagship platform provider at Gramlabs Inc., looks to address these issues. Integrated with StormForge’s existing Optimize Pro Kubernetes solution, Optimize Live extends the platform to cover both pre-production and production environments. The goal is to give companies proactive, continuous insights into their Kubernetes environments, and StormForge claims a first-mover advantage in the optimization market.

“StormForge is one of a number of newer startups disrupting the traditional platform services space for their role in easing configuration complexities within the infrastructure modernization and DevOps trend,” Charlotte Dunlap, principal analyst for application platforms, enterprise technology and services at GlobalData PLC, told theCUBE.

In this article, theCUBE examines the impact of StormForge’s optimization portfolio and outlines the benefits of combining production and pre-production optimization. (* Disclosure below.)

The value of Day 2 Kubernetes application optimization

As companies move from the Day 1 excitement of cloud adoption into Day 2 operations, the dark side of cloud becomes apparent, characterized by the inability to effectively configure cloud resources. This leads to poor application performance, security issues and wasted cloud spend.

There are several reasons why cloud optimization is so hard. First, there’s just too much data. From two zettabytes of data generated in 2010 to 64.2 zettabytes in 2020, and an estimated 181 zettabytes in 2025, projections show the amount of data being created, copied, captured and consumed worldwide is on a steep growth curve with no sign of slowing. And why would it? As companies adopt cloud technology, they turn to a data-driven business model reliant on data insights. More data means better insights — if the company can manage that data properly. StormForge Optimize Live delivers this by leveraging the data already collected by a company’s existing observability solutions, providing AI-powered, actionable insights.

The second obstacle companies encounter when attempting to optimize their cloud-native environments is the skills gap. Kubernetes is complicated, and there is an acute shortage of engineers trained in the technology. As of writing, Indeed.com shows approximately 58,000 open positions for K8s-trained engineers. Alongside hiring new employees, companies are working to re-skill existing software engineers. According to a D2iQ Inc. survey, 98% of organizations have either already invested in K8s training or are planning to do so.

But hiring and training can only go so far. D2iQ found that 38% of K8s developers and architects admit to feeling burned out, while 51% say working on cloud-native applications makes them want to quit their job.

The answer to reducing workload and increasing job satisfaction is to automate those processes that cause K8s headaches. StormForge solves this by introducing the “right kind of automation,” according to Chief Executive Officer Matt Provo. Rather than a hands-off scenario that takes developers out of the optimization process, StormForge “empowers developers,” Provo told theCUBE during the “Solving the Kubernetes Complexity Gap by Optimizing with Machine Learning” event.

The third, and most important, factor companies are encountering is complexity. This is due to containerization, which by its nature chops up applications and sends them into the opaque and ever-changing cloud-native environment. Once again, machine learning is the key to overcoming this problem.

Manually optimizing a K8s environment at scale is practically impossible. With one K8s cluster it could be done, but configuring thousands, if not hundreds of thousands, of clusters that are constantly spinning up and down is beyond human capability. This makes optimization a guessing game. Because paying to overprovision is preferable to dealing with downtime, engineers are more likely to allocate excess cloud resources. In fact, 48% of a company’s total cloud spend is wasted, according to StormForge’s “Cloud Waste Survey Findings.” This adds up to a total of $17.6 billion spent on unnecessary or idle cloud resources, according to 2020 research.
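To make the idea of over-provisioning concrete, here is a minimal sketch of how the idle share of requested resources could be calculated. The workload names and numbers are hypothetical and chosen only for illustration; they are not taken from StormForge’s survey or from any real cluster.

```python
def wasted_fraction(requested: float, used: float) -> float:
    """Share of a requested resource that sits idle, from 0.0 to 1.0."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return max(requested - used, 0.0) / requested


# Hypothetical workloads: (name, CPU cores requested, average CPU cores used).
workloads = [
    ("checkout", 4.0, 1.0),
    ("search", 8.0, 6.0),
    ("reports", 2.0, 0.5),
]

# Fleet-wide waste is total idle capacity divided by total requested capacity.
total_requested = sum(req for _, req, _ in workloads)
total_idle = sum(max(req - used, 0.0) for _, req, used in workloads)
fleet_waste = total_idle / total_requested

for name, req, used in workloads:
    print(f"{name}: {wasted_fraction(req, used):.0%} of requested CPU is idle")
print(f"fleet: {fleet_waste:.0%} of requested CPU is idle")
```

Multiplying the idle share by the price of the requested capacity gives a rough estimate of wasted spend, assuming billing follows what is requested rather than what is used.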

Optimize Live’s ML engines allow companies to optimize based on specific parameters, taking the guesswork out of K8s configuration and other resource settings. According to the company’s statistics, customers saw an average 54% reduction in cloud costs and a 45% improvement in system performance.

“[StormForge is] providing customers with the ability to bridge pre-production and post-production issues through the use of ML, which recommends real-time coding and configuration changes to infrastructure resources for improving application performance,” Dunlap stated in an intelligence report on the Optimize Live release.

Closing the data-to-value gap

StormForge, previously known as Carbon Relay, has a reputation for expertise in machine learning and AI. The company’s workforce is primarily data scientists, machine learning experts, and DevOps engineers who “focus on building real AI at the core that is connected to solving the right kind of actual business problems,” according to Provo. This emphasis on “real ML/AI” versus “marketing hype” is the differentiator for the company’s solutions, and frustration with “buzzword-y AI” was one of the reasons Provo formed the company in the first place.

Both Optimize Pro and Optimize Live use ML to provide application insights within a Kubernetes environment. Optimize Pro scans K8s clusters within a pre-production environment, detecting configurable parameters and optimizing them based on the company’s custom optimization objectives for cost, latency, throughput, error rate and duration. This is a five-step process that starts with the StormForge controller running a base set of parameters. Then the controller runs a performance test, applying a realistic load to the system. Outcomes are measured against the custom goals, and results are analyzed. This is repeated to allow the ML to build a complete picture and establish the ideal set of parameters to meet the company’s goals.
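The experiment loop described above can be sketched in a few lines. This is an illustration only, not StormForge’s implementation: plain random search stands in for the machine learning, `run_load_test` is a made-up stand-in for a real performance test, and every name, formula and number is an assumption.

```python
import random


def run_load_test(cpu_millicores: int, memory_mib: int) -> dict:
    """Hypothetical stand-in for a load test; returns invented cost and latency."""
    cost = cpu_millicores * 0.02 + memory_mib * 0.005
    latency_ms = 50 + 40000 / cpu_millicores + 20000 / memory_mib
    return {"cost": cost, "latency_ms": latency_ms}


def score(result: dict, max_latency_ms: float) -> float:
    """Lower is better: cost, plus a large penalty if the latency goal is missed."""
    penalty = 1000.0 if result["latency_ms"] > max_latency_ms else 0.0
    return result["cost"] + penalty


def optimize(trials: int = 50, max_latency_ms: float = 150.0, seed: int = 0):
    rng = random.Random(seed)
    # Step 1: start from a baseline set of parameters.
    best_params = {"cpu_millicores": 2000, "memory_mib": 4096}
    best_score = score(run_load_test(**best_params), max_latency_ms)
    # Step 5: repeat the trial many times to build up a picture of the search space.
    for _ in range(trials):
        candidate = {
            "cpu_millicores": rng.randrange(250, 4001, 250),
            "memory_mib": rng.randrange(256, 8193, 256),
        }
        result = run_load_test(**candidate)        # Step 2: apply load.
        candidate_score = score(result, max_latency_ms)  # Step 3: measure against goals.
        if candidate_score < best_score:           # Step 4: analyze and keep the best.
            best_params, best_score = candidate, candidate_score
    return best_params, best_score


best_params, best_score = optimize()
print(best_params, round(best_score, 2))
```

A real system would replace the random sampling with a model that learns from earlier trials, so that fewer load tests are needed to find a good configuration.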

“Resource optimization — sometimes referred to as ‘cloud rightsizing’ — is a premise that has been around for quite some time, with varying levels of traction and adoption,” said James Sanders, research analyst for 451 Research, part of S&P Global Market Intelligence. “From a product side, doing this for virtual machines can be achieved with some light automation, monitoring and simple heuristics. Rightsizing for Kubernetes clusters is quite a bit more involved, as is often the case with Kubernetes, so an ML-informed approach could aid in addressing that complexity.”

But for DevOps teams, responsibility doesn’t end when an app is pushed out into production. So StormForge developed Optimize Live, building onto Optimize Pro’s proven pre-production optimization to offer optimization across both production and pre-production environments. The major difference between the two solutions is that while Optimize Pro uses the experimentation approach to optimization in a non-production environment, as described above, Optimize Live operates in production environments using observability data pulled from a customer’s existing observability solutions.

This may seem a strange decision, as in many instances the observability solutions StormForge is integrating with could also be considered its competitors. But Provo doesn’t see it that way; he told theCUBE that he considers StormForge akin to the “Intel Inside” of the observability market.

“We don’t want organizations or users to have to switch from tools and investments that they’ve already made,” he said. “We were never going to catch up to Datadog or Dynatrace or Splunk or AppDynamics … and we’re totally fine with that.”

This is a smart choice, according to Sanders. “Trying to displace incumbent vendors is a challenge — working alongside and integrating with the observability tools already in place is an easier route to market,” he told theCUBE.

Intelligent observability

By analyzing observability data in a production environment, Optimize Live continuously adjusts CPU and memory settings to optimize application performance and cloud usage. The solution’s simple configuration means users don’t need in-depth K8s training, and companies experience fast time-to-value from their investment.
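As a rough illustration of observation-based tuning, the sketch below derives a CPU request and limit from usage samples of the kind an observability tool might report. It uses a simple percentile heuristic rather than machine learning, so it should not be read as how Optimize Live works; the sample values, the 90th-percentile choice and the 20% headroom are all assumptions.

```python
import math


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(samples, request_pct: int = 90, headroom: float = 1.2) -> dict:
    """Suggest a request near typical usage and a limit above the observed peak."""
    return {
        "request": percentile(samples, request_pct),
        "limit": max(samples) * headroom,
    }


# Hypothetical CPU usage in millicores, including one short spike.
cpu_usage_millicores = [120, 135, 150, 110, 400, 140, 130, 125, 145, 160]

rec = recommend(cpu_usage_millicores)
print(rec)
```

Setting the request near typical usage while keeping the limit above the observed peak is one common way to cut idle capacity without throttling a workload during spikes.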

“StormForge Optimize Live merges advanced machine learning with observability tools to provide DevOps teams with real-time configuration recommendations to strengthen operational efficiency. The platform recommends real-time coding and configuration changes to infrastructure resources for improving application performance,” Dunlap told theCUBE.

Other differences between the solutions are that while Optimize Pro is best used for complex, mission-critical Kubernetes applications and optimizes for a wide range of scenarios using load testing, Optimize Live can be used for all Kubernetes applications and optimizes based on actual utilization and observability data. A full comparison can be seen in the table below from StormForge.

| | Optimize Pro | Optimize Live |
| --- | --- | --- |
| Optimization approach | Experimentation-based using machine learning | Observation-based using machine learning |
| Best used for | Complex, mission-critical Kubernetes applications | All Kubernetes applications |
| Environment | Non-prod | Prod |
| Data input | Optimizes for wide range of scenarios using load testing | Optimizes based on actual utilization and observability data |
| Goals and parameters | Optimizes for any goal by tuning any parameter | Recommends CPU and memory requests & limits for improved efficiency |
| Adoption of recommendations | Presents set of recommended configurations, user chooses based on business trade-offs | Recommendations at chosen frequency can be automatically implemented or manually approved |
| Intangibles | Provides deep application insights to drive architectural improvements | Simple configuration, fast time-to-value |

StormForge’s platform does not provide any intelligence technology to manage telemetry data, something that many observability competitors consider basic functionality. However, thanks to its open-source stance, the platform can be integrated with other technologies from members of the cloud-native community. In addition, while StormForge is known for optimization, it has a low profile within the wider observability arena, positioning the newcomer to define its value proposition within a highly fragmented and fast-evolving market.

Watch StormForge’s announcement of Optimize Live during the “Solving the Kubernetes Complexity Gap by Optimizing With Machine Learning” event, previously broadcast on theCUBE.

(* Disclosure: TheCUBE is a paid media partner for the “Solving the Kubernetes Complexity Gap by Optimizing With Machine Learning” event. Neither StormForge, the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Image: Getty Images
