Opsani trains AI to optimize cloud infrastructure configuration and eliminate cost overruns

Instead of periodic hardware purchases that can be budgeted out over years, cloud usage is billed on an on-demand basis. Managed well, this saves the company money. But finding the sweet spot for system configuration is a matter of trial and error.

The story usually goes like this: Software engineers don’t have a crystal ball to predict usage. Nor are they used to worrying about budget. So, they focus on keeping the systems stable and over-provision to avoid service tickets and midnight calls.

But then in comes the chief financial officer, angrily waving a bill for cloud costs and demanding usage cuts. Now under-provisioned, the system suffers downtime and lag, which degrades the user experience and, in turn, leads to unhappy customers and lost revenue.

Searching for a better solution than trial and error is AWS ecosystem partner Datagrid Systems Inc. The company, which is doing business as Opsani, has created a platform that automates system configuration using machine learning and artificial intelligence to establish optimal provisioning levels.

“What we want to do is give engineers the tools to consume precisely the right amount of resources for the service-level objectives that they have,” said Amir Sharif (pictured), vice president of product and marketing at Opsani.

Sharif spoke with John Furrier, host of theCUBE, SiliconANGLE Media’s livestreaming studio, in advance of the AWS Startup Showcase: New Breakthroughs in DevOps, Analytics, and Cloud Management Tools event. They discussed how Opsani’s platform eliminates cloud cost overspend through continuous cloud optimization as-a-service. (* Disclosure below.)

Opsani reduces unexpected cloud cost overruns

When companies owned in-house data centers, rapid scale was impossible. On the plus side, costs were upfront and budgets predictable. Cloud flipped that on its head, as previously structured capital expenditure costs switched to variable operating expenses. APIs allowed instant scale-up as needed, but this freedom came without tools to predict and manage cloud expenditure.

Opsani’s platform addresses this by training machine learning on the specific parameters required for optimal cost and performance. The platform is aimed at senior-level executives who are responsible for both keeping costs down and keeping product quality high, according to Sharif.

“By giving the product owner the autonomous optimization tools that Opsani has, we allow him or her to deliver the right experience to the customer with the right sufficient resources and address both the performance and the cost side of equation simultaneously,” he said.

The platform also addresses the optimization issues that have come from agile development models where continuous integration/continuous delivery is the norm.

“We have a combination of GitOps where you can just pull down repositories, libraries, open-source projects from left and right, and using glue code, developers can deliver functionality really quick,” Sharif said, explaining how containerization, microservices and APIs have completely changed the software delivery cycle.

But one element that gets missed in the rush is quality control. Instead, refinements occur piecemeal over time as users complain about issues they encounter and the developers fix them.

“It typically goes through a 12-month cycle of maturation [to] get that system stability and the right performance,” Sharif stated.

Opsani platform to expand

The Opsani platform has already reduced the time required to balance cloud usage costs against system stability, and the company is about to announce a new feature that will shorten that timeline considerably, according to Sharif. The announcement is planned for KubeCon + CloudNativeCon 2021, scheduled for October 13 to 16 in Los Angeles.

The company’s new product will give developers the ability to install Opsani in a Kubernetes environment in around 20 minutes, with results showing within two days.

“Because of CI/CD, you don’t have the luxury of waiting,” Sharif said, explaining how Opsani’s AI/ML continuous cloud optimization as-a-service can be part and parcel of the CI/CD pipeline that optimizes the code, giving ideal configuration from the start.

Opsani measures performance against SLO metrics

Opsani optimizes application performance by establishing each user’s required service-level objective metric against which performance can be measured.

“Given that you want a transaction rate of X and latency rate of Y, here’s how you configure your cloud infrastructure so the application delivers according to those SLOs with the least possible resources consumed,” Sharif explained.

In a Kubernetes environment, the metrics source would typically be Prometheus, Sharif added, as it is the metrics database for Kubernetes workloads. In this case, Opsani’s focus would be on the three RED metrics: R for the rate of transactions, E for the error rate, and D for delay, or latency.
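To make the idea of measuring against an SLO concrete, here is a minimal Python sketch of how rate, error rate and latency for one observation window could be checked against target values. It is an illustration only and not Opsani's or Prometheus' code: the names `RedMetrics`, `Slo`, `summarize` and `meets_slo`, the sample format and the thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RedMetrics:
    rate: float        # transactions per second
    error_rate: float  # fraction of failed transactions, 0.0 to 1.0
    latency: float     # mean delay per transaction, in seconds


@dataclass
class Slo:
    min_rate: float        # lowest acceptable transaction rate
    max_error_rate: float  # highest acceptable failure fraction
    max_latency: float     # highest acceptable mean delay, in seconds


def summarize(samples: List[Tuple[float, bool]], window_seconds: float) -> RedMetrics:
    """Reduce (duration_seconds, failed) samples from one window to RED metrics."""
    if not samples:
        return RedMetrics(rate=0.0, error_rate=0.0, latency=0.0)
    total = len(samples)
    failures = sum(1 for _, failed in samples if failed)
    mean_latency = sum(duration for duration, _ in samples) / total
    return RedMetrics(total / window_seconds, failures / total, mean_latency)


def meets_slo(metrics: RedMetrics, slo: Slo) -> bool:
    """True only if all three RED metrics are within the SLO."""
    return (
        metrics.rate >= slo.min_rate
        and metrics.error_rate <= slo.max_error_rate
        and metrics.latency <= slo.max_latency
    )


# Hypothetical window: four transactions observed over two seconds.
window = [(0.1, False), (0.2, False), (0.3, True), (0.2, False)]
observed = summarize(window, window_seconds=2.0)
print(observed, meets_slo(observed, Slo(1.0, 0.30, 0.25)))
```

In a real deployment these numbers would come from queries against the metrics database rather than from an in-memory list.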

Opsani would inject a small open-source container, known as Servo, into the application workspace. Servo interacts with Prometheus to get the metrics and talks to Opsani’s backend to tell the machine learning engine what’s happening. The ML engine then performs the analysis and comes back with a new configuration, which Servo implements in a canary instance.

“So the canary instance is where we run our experiments, and we compare it against the mainline which the application is doing. After roughly 20 iterations or so, the ML engine learns what part of the problem space to focus on in order to deliver the optimal results,” Sharif said.

When the ML engine returns a set of candidate configurations, they are tested inside the canary instance. Once the engine has established the optimal solution, it gives the recommendation back to the application team to implement. Once teams have established trust in Opsani, they can choose to bypass this final step and auto-promote into the mainline.
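The canary workflow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Opsani's implementation: it swaps the ML engine for a plain exhaustive search over a small grid, replaces the real canary measurement with a toy function, and uses hypothetical names (`Config`, `Measurement`, `cheapest_passing`, `simulated_canary`) and made-up pricing weights.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Config:
    cpu_cores: float
    memory_gib: float

    def cost(self) -> float:
        # Hypothetical pricing weights, for illustration only.
        return 30.0 * self.cpu_cores + 4.0 * self.memory_gib


@dataclass(frozen=True)
class Measurement:
    latency_ms: float
    error_rate: float


def cheapest_passing(
    candidates: Iterable[Config],
    run_canary: Callable[[Config], Measurement],
    max_latency_ms: float,
    max_error_rate: float,
) -> Optional[Config]:
    """Try each candidate in the canary and keep the cheapest one that meets the SLO."""
    best: Optional[Config] = None
    for config in candidates:
        result = run_canary(config)
        ok = result.latency_ms <= max_latency_ms and result.error_rate <= max_error_rate
        if ok and (best is None or config.cost() < best.cost()):
            best = config
    return best


def simulated_canary(config: Config) -> Measurement:
    # Toy model: latency falls as CPU rises; errors appear when memory is short.
    latency = 400.0 / config.cpu_cores
    errors = 0.05 if config.memory_gib < 2 else 0.001
    return Measurement(latency, errors)


candidates = [Config(c, m) for c, m in product([1, 2, 4, 8], [1, 2, 4])]
recommended = cheapest_passing(
    candidates, simulated_canary, max_latency_ms=150.0, max_error_rate=0.01
)
print(recommended)  # the recommendation handed back to the application team
```

An exhaustive search like this only works for a handful of settings. With many tunable parameters the grid grows too large to test, which is where a learned search strategy that narrows the problem space after a limited number of iterations becomes useful.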

“Our goal is, for our customers, to deliver the best user experience in terms of performance, reliability so that they delight their customers in return — and do so without breaking the bank,” he said. “So, deliver excellent products, do it at the most efficient way possible, and deliver good financial results for your stakeholders.”

Watch the complete video interview below, and be sure to check out SiliconANGLE’s and theCUBE’s coverage of the AWS Startup Showcase: New Breakthroughs in DevOps, Analytics, and Cloud Management Tools event on September 22. (* Disclosure: Datagrid Systems Inc. (dba Opsani) sponsored this segment of theCUBE. Neither Opsani nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
