UPDATED 18:30 EDT / FEBRUARY 22 2018

BIG DATA

The software convergence race to a single AI throat to choke

Big data analytics projects are flopping at a rate of 85 percent, according to some analysts. The culprit might be the heterogeneous heap of software technologies perplexing companies desperate for a simple, streamlined solution. Can we expect a valiant vendor to step forward and boil the best of the bunch down to a winning formula?

“The thing that’s going to determine the rate of change and the degree of convergence is going to be how we deal with data,” said Peter Burris (@plburris, pictured, left), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, and chief research officer and general manager of Wikibon Inc. (SiliconANGLE Media Inc. is the parent company of Wikibon Inc.). 

The everlasting challenge of lugging data from one location to another within the allotted latency budget is still with us in the big data age, Burris explained. It is first necessary to route the quickest path for data from ingestion through training (for machine learning and artificial intelligence models) and out to end points and applications. Converged software in the form of analytics pipelines and the like can then emerge around this path.
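The path Burris describes can be pictured as a minimal pipeline. This is a toy sketch, not any vendor's product: every function name and the word-count "model" here are hypothetical stand-ins for ingestion, batch training and endpoint deployment.

```python
# Hypothetical sketch of the data path: ingest -> batch train -> deploy
# to an end point. All names and logic are illustrative, not a real API.

def ingest(raw_records):
    """Normalize raw records into clean rows (toy example)."""
    return [r.strip().lower() for r in raw_records]

def train(rows):
    """Stand-in for a batch training job: here, just a word-frequency count."""
    model = {}
    for row in rows:
        for word in row.split():
            model[word] = model.get(word, 0) + 1
    return model

def deploy_to_endpoint(model):
    """Package the trained artifact for an inference end point."""
    return {"model": model, "version": 1}

pipeline_output = deploy_to_endpoint(train(ingest(["Sensor OK ", "sensor FAIL"])))
```

Converged analytics pipelines would, in effect, automate the hand-offs between these three stages.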

Burris read the progress bar on big data analytics and AI software convergence in a discussion with fellow Wikibon analysts George Gilbert (@ggilbert41, pictured, center) and David Floyer (@dfloyer, pictured, right) at theCUBE’s studio in Palo Alto, California. 

Balancing the latency budget

Predictive analytics models are only as good as the data that feed them. Funneling in relevant data scattered both within and beyond the enterprise in a timely manner can be difficult. “You will want to — in the best possible way — combine that data one way or another,” Floyer said.

Combining data points for the purpose of training machine learning or AI models has a radically different latency budget than does combining them for inferences at end points, according to Floyer. Collating data and training models is still a batch process and can be done in one central location — be it the cloud or an on-premises data center. It could take days, weeks or months and requires mechanisms that pull together spread-out data, he added. 

On the other hand, on-the-spot inference — whether in an application or an “internet of things” connected device — can’t wait on data from all corners of the universe. For this reason, inference engines at end points must be packed with predictive models or prefab information that boosts their intelligence, Gilbert pointed out.

“No matter what kind of answers you’re looking for, some of the attributes are going to be pre-computed,” Gilbert said. “And you’re not going to calculate everything in real time.” 
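Gilbert's point can be sketched in a few lines: the edge engine looks up attributes that the batch pipeline computed earlier, and does only the minimal math live. The device names, risk scores and threshold below are invented for illustration.

```python
# Illustrative sketch (assumed, not from the article): an edge inference
# engine combines pre-computed attributes with a small live calculation.

PRECOMPUTED = {
    # attribute table built offline by the batch training pipeline
    "device_risk": {"sensor-a": 0.1, "sensor-b": 0.9},
}

def infer(device_id, live_reading):
    """Fast lookup of a pre-computed attribute, plus one live multiply."""
    risk = PRECOMPUTED["device_risk"].get(device_id, 0.5)  # pre-computed
    return "alert" if risk * live_reading > 0.5 else "ok"  # computed live

result = infer("sensor-b", 0.7)  # high pre-computed risk, high live reading
```

The lookup is what keeps the end point within its latency budget; only the final combination happens in real time.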

If all of this sounds like a heavy load of complexity, that is because it is — at least for the time being. Certain distribution (“distro”) companies have stated that more than 50 percent of their development budget goes toward integrating the necessary pieces, according to Burris. “That’s a nonstarter for a lot of enterprises,” he said.

The UniGrid complexity compressor

Thankfully, software vendors and cloud infrastructure providers are at work corralling the sprawl. Flash storage lifted long-standing constraints on swiftly accessing data, but there remains room for further improvement. The Wikibon research team has described an emerging architecture that it labeled the “UniGrid”; it will go far toward dissolving the remaining constraints, Floyer explained.

Why UniGrid? “Well, ‘grid’ because these future systems will be organized as grids that effectively share all computing resources in the system,” Burris wrote in a SiliconANGLE article last June. As for the “uni”? The architectures will be “universal to business people,” “uniform to developers,” and “unitary to system administrators,” he wrote. 

UniGrid architecture offloads the networking and storage from the processor, while allowing any processor to access any data much more fluidly, Floyer explained. The hyperscale cloud providers are already putting a great deal of effort into that. “That type of architecture gives us the ability to converge the traditional systems of record — there are a lot of them, obviously — and the systems of engagement and the real-time analytics for the first time.”

The goal of this type of architecture and of data pipelines is to get as much information into the analysis while staying within the latency budget, Gilbert explained. “The more data that goes into making the inference, the better the inference,” he said.
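One way to picture that tradeoff is a scheduler that admits as many data sources as the budget allows. This greedy sketch is a hypothetical illustration of the idea, not a Wikibon or vendor algorithm; the source names and latencies are made up.

```python
# Toy illustration (assumed) of balancing data volume against a latency
# budget: include the cheapest sources first until the budget is spent.

def sources_within_budget(sources, budget_ms):
    """Greedily pick data sources whose summed fetch latency fits the budget."""
    chosen, spent = [], 0
    for name, latency in sorted(sources.items(), key=lambda kv: kv[1]):
        if spent + latency <= budget_ms:
            chosen.append(name)
            spent += latency
    return chosen

picked = sources_within_budget(
    {"local_cache": 2, "regional_store": 20, "central_dc": 120}, budget_ms=50
)
```

A batch training job could afford the distant data center; a real-time inference could not, which is exactly the latency-budget split Floyer draws.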

Obviously, developers and systems administrators aren’t octopuses, so the fewer hands they need to pull it all together, the better. System admins would benefit the most, since they have two or three times as many constructs to deal with, Gilbert stated.

“If you’re dealing with one product, it’s a huge burden off the admin, and we know they struggled with [Apache Hadoop big data framework],” he said.

The single product or platform grail still eludes us for now. So the smart money for enterprises that want to apply big data, machine learning and AI analytics is on products that compress and automate as many steps as possible, according to Floyer. And woe to the enterprise that thinks it can develop its own in-house ML, he warned. It would be much wiser to outsource that job to vendors.

“In the same way that there’s a lot of inference engines, which will be out at the IoT level — those will have very rapid analytics given to them, again, not by yourself, but by companies that specialize in facial recognition or specialize in data warehousing,” Floyer concluded. 

