UPDATED 15:36 EST / FEBRUARY 23 2022

How StormForge employs ‘real ML’ to optimize Kubernetes in production

In the past decade, cloud-native applications have gone from fringe to fundamental technology for business operations. This shift from legacy infrastructure has accelerated thanks to edge computing and demand for real-time data insights, not to mention the sprawl of remote operations prompted by the pandemic.

Yet the rapid adoption of cloud-native technology has created complications for businesses. The added intricacy of managing hybrid clouds, coupled with a shortage of skilled engineers, exposes operational inefficiencies. And as industry standards build on the open-source container orchestration platform Kubernetes, those complexities multiply. One Kubernetes specialist thinks machine learning can help.

“Once [companies] start to operationalize the use of Kubernetes and move workloads from pre-production into production, they run into a pretty significant complexity wall,” said Matt Provo, founder and chief executive officer of StormForge, the flagship intelligent optimization platform of Gramlabs Inc.

Provo spoke with Dave Vellante, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during StormForge’s “Solving the Kubernetes Complexity Gap by Optimizing With Machine Learning” event. In a separate session, Vellante spoke with Charley Dublin, vice president of product management at Acquia Inc., on how StormForge’s platform helps Acquia manage Kubernetes complexity. (* Disclosure below.)

Machine learning: more than a buzzword

Research into enterprise Kubernetes use estimates that 89% of organizations now run the container orchestration tool in production or pre-production environments, with 77% naming it as central to their digital transformation strategy. However, the research also found that almost all (94%) of the organizations using Kubernetes are experiencing challenges. High on the list of problems is that Kubernetes is notoriously hard to fine-tune. With mission-critical workloads at stake, engineers err on the side of caution and run up large bills provisioning cloud resources they don’t need.

Machine learning is the answer to scaling that cloud-native complexity wall, according to StormForge. The company announced the release of Optimize Live during the event, marking a significant shift in its platform’s capabilities.

Optimize Live gathers observability data that companies are already collecting to enable intelligent optimization within production environments, according to Provo. This meets the needs of the increasing number of enterprises deploying Kubernetes in production and adds to StormForge’s existing solution, Optimize Pro, which improves efficiency within pre-production environments.

“My vision has been for us to be able to close the loop between data coming out of pre-production and the associated optimizations and data coming out of a production environment and our ability to optimize that,” he said.

Provo emphasized that machine learning, as integrated in Optimize Live, is “real machine learning, not machine learning as a marketing tag.” StormForge started out as a lab focused on building artificial intelligence that solves real business problems, and several of its staff members hold applied math Ph.D.s and work on machine learning.

The actual use-case application to solving Kubernetes complexity came after the team had been working together for nearly four years, according to Provo.

“We were trying to connect a fantastic team with differentiated technology to the right market timing. And when we saw all of these pain points around how fast the adoption of containers and Kubernetes have taken place, this was the perfect use case,” he said.

“World-class” is the word Dublin uses to describe StormForge’s machine-learning talent.

“I’ve run machine-learning teams, data-science teams and would put them in the top 1% of any team that I’ve worked with in terms of their expertise,” he said.

Building on cooperation and interaction

Rather than creating a product to replace existing technology, StormForge chose to focus on using machine intelligence to solve business problems and to partner or integrate with companies that could theoretically be seen as competitors.

“We don’t want organizations or users to have to switch from tools and investments that they’ve already made,” Provo said, describing StormForge as the “Intel Inside” for the application performance monitoring market. “We were never going to catch up to Datadog or Dynatrace or Splunk or AppDynamics … and we’re totally fine with that.”

The company’s biggest competitor for Optimize Live is an out-of-the-box tool shipped with Kubernetes, called the Vertical Pod Autoscaler, according to Provo. But that competition isn’t exactly tough. Fewer than 1% of Kubernetes users take advantage of this tool because it is both challenging to configure and incompatible with much of the ecosystem of tools in a Kubernetes environment, Provo added.

Without such tooling, developers are flying blind when they set resource parameters such as CPU and memory allocations.

“They have to decide ‘what are the requests I’m going to allow for this application and what are the limits? So, what are those thresholds that I’m going to be OK with so that I can again try to hit my business objectives and keep in line with my SLAs?’ And it’s often guesswork,” Provo said.
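To make the requests-and-limits decision concrete, here is a minimal Python sketch of a naive rule of thumb that replaces pure guesswork with observed usage. It is not StormForge’s machine-learning method. The function names, the 90th-percentile choice, and the 20% headroom are all assumptions made for illustration.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of integers."""
    ordered = sorted(samples)
    rank = max(1, ceil_div(pct * len(ordered), 100))
    return ordered[rank - 1]


def suggest_resources(cpu_millicores, memory_mib, pct=90, headroom_pct=120):
    """Suggest requests and limits from usage samples (hypothetical heuristic).

    Requests follow typical usage (the chosen percentile); limits follow
    the observed peak plus headroom, so brief spikes are not throttled.
    """
    return {
        "requests": {
            "cpu": f"{percentile(cpu_millicores, pct)}m",
            "memory": f"{percentile(memory_mib, pct)}Mi",
        },
        "limits": {
            "cpu": f"{ceil_div(max(cpu_millicores) * headroom_pct, 100)}m",
            "memory": f"{ceil_div(max(memory_mib) * headroom_pct, 100)}Mi",
        },
    }


# Made-up usage samples for a single container.
cpu = [100, 110, 120, 130, 140, 150, 160, 170, 180, 500]
memory = [200, 210, 220, 230, 240, 250, 260, 270, 280, 300]
settings = suggest_resources(cpu, memory)
print(settings)
```

A static rule like this still has to be tuned per workload, which is the kind of manual effort the article says engineers struggle with at scale.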

Optimize Live removes the uncertainty by adding what Provo refers to as an “observation phase,” where engineers can run checks and balances between pre-production and production environments to make sure that actual application performance and deployment levels are in line with service-level objectives and agreements, as well as business objectives. But StormForge is wary of making its solution too hands-off.

Optimize Live continuously and consistently observes the data flowing through Kubernetes tools and serves recommendations back to the user, who can then allow automatic patch and deploy or decide to manually deploy into the environment themselves.
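The choice between automatic and manual deployment can be pictured as a simple routing step. The Python sketch below is purely illustrative: the `Recommendation` class, the `handle` function and the status strings are hypothetical and do not represent StormForge’s API.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """A hypothetical resource recommendation for one workload."""
    workload: str
    cpu_request: str
    memory_request: str


def handle(rec, auto_deploy, apply_patch, review_queue):
    """Route one recommendation: patch it now, or queue it for a person."""
    if auto_deploy:
        apply_patch(rec)  # the user opted in to automatic patch and deploy
        return "applied"
    review_queue.append(rec)  # the user deploys manually after review
    return "pending-review"


applied = []  # stands in for a real deployment step
queue = []    # recommendations awaiting manual deployment
rec = Recommendation("web", "180m", "280Mi")

first = handle(rec, True, applied.append, queue)
second = handle(rec, False, applied.append, queue)
print(first, second)
```

Either way the engineer stays in the loop: automation is something the user opts into, not a default that bypasses review.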

“The right kind of automation that empowers developers into the process ultimately does not automate them out of it,” Provo said.

Use case for real-time policy decisions

From the customer’s point of view, StormForge’s ability to automate Kubernetes management removes the need to choose between performance and functionality by prioritizing both.

“We leverage StormForge to enable us to right-size applications for performance, provide us cost benefits, allocate what you need when you need it for our customers,” Dublin said.

Acquia is a leader in Drupal hosting, ranking just behind Adobe Systems Inc. on the 2021 Gartner Magic Quadrant for digital experience platforms. The company is a large consumer of Amazon Web Services Inc. services and is currently undergoing a major re-platforming away from “legacy” AWS toward Kubernetes and containers, according to Dublin.

“We support customers leveraging Drupal in every industry … and so what you have is a very wide range of applications and consumer and consumption models,” Dublin explained. “StormForge’s capability in conjunction with Kubernetes and containers really puts us in a position where customers are able to get the performance that they want when they need it on demand.”

Initially, Acquia attempted to handle elements of its cloud-native application management internally. But as the numbers ballooned from tens to hundreds to tens of thousands of applications, a solution that incorporated machine learning and integrated with AWS became essential, according to Dublin. That solution was StormForge.

The company has been involved in beta-testing for Optimize Live and has come to rely on its recommendations for decision-making and platform management.

“Optimize Live allows us in real time to make policy decisions across our fleet on what’s the right tradeoff between performance, cost, other parameters,” Dublin said. “Without StormForge, we’d have to do massive data aggregation. We’d have to have machine learning and additional infrastructure to manage to derive this information. That is not our core business. We don’t want to be doing that.”

Alongside machine learning for continuous optimization of production environments, Optimize Live builds on the foundation of the StormForge Optimize Pro platform to analyze observability data and recommend CPU, memory and replica resource settings; learn the customer’s specific environment and offer custom recommendations; and provide configurable policies for increased flexibility. Together, Optimize Pro and Optimize Live create the first platform that combines pre-production and production optimization.

“Kubernetes is storming the castle, and there’s no stopping it,” Vellante said. “StormForge and a host of companies are stepping up to help customers take advantage of this wave by delivering technologies that help predict and manage customer experiences and accelerate innovation.”

You can watch the entire StormForge “Solving the Kubernetes Complexity Gap by Optimizing With Machine Learning” event on theCUBE’s dedicated event channel. (* Disclosure: TheCUBE is a paid media partner for the “Solving the Kubernetes Complexity Gap by Optimizing With Machine Learning” event. Neither StormForge, the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
