UPDATED 19:20 EST / JANUARY 21 2021


Facing down the data onslaught with stateful architecture

The age of big data taught us that there is a timeline to data management: Store the data, analyze it, then model predictions.

Unfortunately, that time-consuming process just doesn’t cut it with the staggering amounts of data currently being generated. Petabytes of data flow from edge devices daily, and the amount is growing rapidly thanks to the demand for ever more connections.

“The data onslaught is very real,” said Simon Crosby (pictured), chief technology officer at Swim.ai Inc. “Companies are facing more and more real-time data from products, from their infrastructure, from their partners. They need to make decisions rapidly, and the problem is that traditional ways of processing that data are too slow.”

The solution is to adopt a process of data analysis on the fly, according to Crosby. “You need to analyze [data] as you receive it and react immediately to be able to generate reasonable insights or predictions that can drive commerce and decisions in the real world,” he said.

Crosby spoke with Stu Miniman, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during theCUBE on Cloud event. They discussed the future of data analysis and how architectures are evolving for real-time processing in-memory.

Getting faster at the edge

The data onslaught bombarding organizations is mostly thanks to the proliferation of new products with built-in CPUs, otherwise known as edge devices. According to Crosby, “the right way to think about edge is where can you reasonably process the data. Edge as a place doesn’t make as much sense as edge as an opportunity to decrypt and analyze data in the clear.” The edge, for Crosby, is often the cloud.

Cloud computing has taken advantage of two major abstractions: “REST, which is stateless computing, and databases,” Crosby said. “REST means any old server can do the job for me. Then the database is just an API call away.”

There’s just one problem: With CPU speeds clocked in gigahertz and network latency measured in milliseconds, connecting to a data store means a (relatively) interminable wait. “You’re going a million times slower than your CPU,” Crosby said. “That’s terrible. It’s absolutely tragic.”
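The "million times" figure follows from simple unit arithmetic. The short Python sketch below works it through, assuming a 1 GHz clock and a 1 millisecond network round trip; both values are illustrative picks within the ranges Crosby names, not measurements from the interview.

```python
def cycles_per_round_trip(clock_hz: float, round_trip_seconds: float) -> float:
    """Number of CPU clock cycles that elapse during one network round trip."""
    return clock_hz * round_trip_seconds


# Assumed values: 1 GHz clock (one cycle per nanosecond), 1 ms round trip.
CLOCK_HZ = 1e9
ROUND_TRIP_SECONDS = 1e-3

ratio = cycles_per_round_trip(CLOCK_HZ, ROUND_TRIP_SECONDS)
print(f"{ratio:,.0f} CPU cycles pass while waiting on one round trip")
```

A faster clock or a slower network only widens the gap, so the ratio is best read as an order-of-magnitude estimate.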

Dumping cloud for an in-memory model with stateful computation solves that. Instead of having to connect externally going back and forth to store or retrieve data, compute is done as data arrives. “You get a million times speed up,” he explained. “You also end up with this tremendous cost reduction because you don’t end up with as many instances having to compute.”
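The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration, not Swim.ai's implementation: `ExternalStore`, `stateless_handler` and `StatefulCounter` are hypothetical names, and the "database" is an in-process stand-in that only counts how many simulated network round trips each approach would make.

```python
class ExternalStore:
    """Stand-in for a remote database; counts simulated network round trips."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.rows.get(key, 0)

    def put(self, key, value):
        self.round_trips += 1
        self.rows[key] = value


def stateless_handler(store, key, value):
    """Stateless model: every event reads and writes the external store."""
    total = store.get(key) + value
    store.put(key, total)
    return total


class StatefulCounter:
    """Stateful model: the running total stays in process memory."""

    def __init__(self):
        self.total = 0

    def on_event(self, value):
        self.total += value
        return self.total


events = [3, 1, 4, 1, 5]  # made-up readings from one device

store = ExternalStore()
for value in events:
    stateless_total = stateless_handler(store, "light-1", value)

counter = StatefulCounter()
for value in events:
    stateful_total = counter.on_event(value)

print(stateless_total, stateful_total, store.round_trips)
```

Both paths arrive at the same total, but the stateless path pays two round trips per event while the stateful path pays none.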

Let data build the model

A real-life example comes from traffic light data in Palo Alto, California. The city generates about 4 terabytes of data a day from just a few hundred lights. Although that can theoretically be handled with a serverless compute service, such as Amazon Web Services Inc.’s Lambda, “the problem is that the end-to-end per event latency is about 100 milliseconds,” Crosby said.

And with upwards of 30,000 events a second, “that’s just too much.” Solving the problem with stateless architecture would be “extraordinarily expensive,” Crosby said, estimating costs of “more than $5,000 a month.”
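A rough way to see why those two numbers compound is Little's law, which says the average number of requests in flight equals the arrival rate multiplied by the time each one spends in the system. The sketch below applies it to the figures Crosby quotes; treating 30,000 events a second as a sustained rate is an assumption, and no pricing is modeled.

```python
import math

EVENTS_PER_SECOND = 30_000   # "upwards of 30,000 events a second"
LATENCY_SECONDS = 0.100      # "about 100 milliseconds" end to end per event


def average_in_flight(arrival_rate: float, latency_seconds: float) -> float:
    """Little's law: L = lambda * W."""
    return arrival_rate * latency_seconds


in_flight = average_in_flight(EVENTS_PER_SECOND, LATENCY_SECONDS)
print(f"about {math.ceil(in_flight):,} invocations in flight at any moment")
```

Under these assumptions, roughly 3,000 function invocations would have to be running at every instant just to keep pace with the stream.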

Beyond the Palo Alto scenario, the volumes of raw data generated are “staggering,” according to Crosby. A similar traffic monitoring system in Las Vegas generates about 60 terabytes a day, and just one mobile provider can deal with real-time data from hundreds of millions of mobile devices.

“There is simply no way you can ever store that and analyze it later,” he said. “So cloud is fabulous for things that need to scale wide, but a stateful model is required for dealing with things which update you rapidly or regularly about their changes in state.”

One obstacle in the proliferation of edge computing is the lack of skilled data scientists and engineers to train the algorithms and deploy them at the edge. To eliminate this, Crosby offers an alternative worldview where uncomplicated algorithms are deployed at scale to stateful representatives.

“The way this edge world gets smarter is that relatively simple models of things will learn for themselves and create their own futures based on what they can see and then react,” Crosby said.
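One way to picture a "relatively simple" model that learns for itself is an online estimator that updates with each observation and keeps no history. The Python sketch below is an assumption about what such a model might look like, not a description of Swim.ai's algorithms; the `OnlineMean` class and the signal-phase durations are hypothetical.

```python
class OnlineMean:
    """Running mean updated one observation at a time; stores no history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, observation: float) -> float:
        """Fold one new observation into the running mean."""
        self.count += 1
        self.mean += (observation - self.mean) / self.count
        return self.mean

    def predict(self) -> float:
        """Use the current mean as the forecast for the next observation."""
        return self.mean


# Hypothetical green-phase durations, in seconds, seen at one traffic light.
model = OnlineMean()
for duration in [30.0, 32.0, 28.0, 30.0]:
    model.update(duration)

prediction = model.predict()
print(f"predicted next green phase: {prediction:.1f} seconds")
```

Because the model holds only a count and a mean, one instance per device costs a few bytes of memory and can be updated as each event arrives.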

Another benefit is that developers don’t need specialized, cloud-native skillsets before they can get to work. Instead of worrying about database locations, developers can write simple object-oriented programs that relate basic objects in the world to each other, according to Crosby.

“Then we let data build the model by essentially creating these little concurrent objects for each thing, and they will then link to each other and solve the problem,” he said. “If you adopt a stateful computing architecture … you get to go a million times faster. The applications always have an answer. They analyze, learn and predict on the fly, and they go a million times faster. They use 10% of the infrastructure of a store-then-analyze approach. And it’s the way of the future.”
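The idea of letting data build the model can be sketched as follows: an object is created the first time an event names a new thing, and objects then link to their neighbors and answer questions from in-memory state. This is a single-threaded Python illustration under stated assumptions. The `Intersection` class, the `route` function and the event fields are hypothetical, the concurrency Crosby describes is left out, and none of it is Swim.ai's API.

```python
class Intersection:
    """In-memory stand-in for one real-world traffic light."""

    def __init__(self, name):
        self.name = name
        self.state = None
        self.neighbors = []

    def link(self, other):
        """Create a two-way link between neighboring intersections."""
        self.neighbors.append(other)
        other.neighbors.append(self)

    def on_event(self, state):
        """Update local state as soon as an event arrives."""
        self.state = state

    def red_neighbors(self):
        """Answer a question from linked in-memory state, with no database."""
        return [n.name for n in self.neighbors if n.state == "red"]


agents = {}


def route(event):
    """Create the object on first sight of an id, then deliver the event."""
    agent = agents.get(event["id"])
    if agent is None:
        agent = agents[event["id"]] = Intersection(event["id"])
    agent.on_event(event["state"])
    return agent


stream = [
    {"id": "A", "state": "green"},
    {"id": "B", "state": "red"},
    {"id": "C", "state": "red"},
    {"id": "B", "state": "green"},
]
for event in stream:
    route(event)

agents["A"].link(agents["B"])
agents["A"].link(agents["C"])
print(agents["A"].red_neighbors())
```

Nothing in the sketch declares in advance how many intersections exist; the set of objects grows as the stream reveals them.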

Photo: SiliconANGLE
