UPDATED 20:30 EST / AUGUST 01 2019

BIG DATA

Cataloging is the data platform of the future, says IDC analyst

Algorithms that read customers’ minds and predict a market’s future are the sirens luring today’s companies to big data. But before this is possible, there’s some unglamorous work to be done excavating the raw resources necessary. Finding out where data is and preparing it for prime time is a critical first step. Pre-existing silos and multicloud environments can give companies a lot of disparate spaces to scavenge through.

The most sensible place to start may be with the available data about all that data — or metadata, according to Stewart Bond (pictured), research director at IDC Research Inc. “That’s why we’ve seen such a jump in the number of vendors that are providing data cataloging solutions,” he said.

Bond spoke with Dave Vellante (@dvellante) and Paul Gillin (@pgillin), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the MIT CDOIQ Symposium in Cambridge, Massachusetts. They discussed data cataloging as the best hope for handling big data in multicloud (see the full interview with transcript here).

Spider legs go farthest in multicloud

All types of data initiatives — from monetization to artificial intelligence to governance — require some way to find, label and organize massive data sets. Companies are realizing that poorly cleansed or inaccurately labelled data are resulting in inaccurate insights. And vendors are rushing to the rescue. The number of vendors offering cataloging solutions has increased about 240% in the last year and a half, according to Bond’s research.

Selecting data to train models to make accurate predictions is challenging even when it’s all in one place. Vendors are trying in a number of ways to manage all of the dispersed data that enterprises want to analyze. There is software for data integration, data intelligence, data profiling and more. It is the “spidering” of data cataloging that has the most promise, Bond explained.

Multicloud has flung data all over the place. Effective software must have spider legs that can reach out and quickly gather intelligence about it. Data cataloging may do this with machine learning, human annotation, Google-like search features, etc.
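To make the "spidering" idea concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's product or Bond's method: the source names, the `CatalogEntry` fields and the `crawl` and `search` helpers are all hypothetical, and small in-memory dictionaries stand in for real silos and cloud buckets. The crawler visits each source, records metadata rather than the data itself, accepts a human-added tag, and answers keyword searches.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one dataset; the data itself is never copied."""
    source: str
    dataset: str
    columns: list
    row_count: int
    tags: set = field(default_factory=set)  # human annotations


def crawl(sources):
    """Visit every source and record metadata for each dataset found."""
    entries = []
    for source_name, datasets in sources.items():
        for dataset_name, rows in datasets.items():
            # Collect the union of field names seen across all rows.
            columns = sorted({key for row in rows for key in row})
            entries.append(
                CatalogEntry(source_name, dataset_name, columns, len(rows))
            )
    return entries


def search(catalog, keyword):
    """Return entries whose dataset name, columns, or tags contain keyword."""
    keyword = keyword.lower()
    return [
        entry for entry in catalog
        if keyword in entry.dataset.lower()
        or any(keyword in column.lower() for column in entry.columns)
        or any(keyword in tag.lower() for tag in entry.tags)
    ]


# Hypothetical sources standing in for an on-premises silo and two clouds.
SOURCES = {
    "on_prem_warehouse": {
        "customers": [{"customer_id": 1, "email": "a@example.com"}],
    },
    "cloud_a_bucket": {
        "orders": [
            {"order_id": 10, "customer_id": 1, "total": 25.0},
            {"order_id": 11, "customer_id": 1, "total": 40.0},
        ],
    },
    "cloud_b_bucket": {
        "clickstream": [{"session": "s1", "url": "/home"}],
    },
}

catalog = crawl(SOURCES)
catalog[0].tags.add("pii")  # a person flags the customers dataset as sensitive
hits = search(catalog, "customer_id")
```

In this sketch, searching for `customer_id` surfaces datasets in two different locations without moving any rows, which is the kind of cross-source lookup a catalog is meant to provide. A real catalog would add connectors, profiling and machine-learned labels on top of this basic shape.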

“I think that’s going to be the data platform of the future,” Bond stated.

Informatica Corp. currently leads in this market, according to Bond. Hyperscaler clouds Amazon Web Services Inc. and Google Cloud Platform have recently brought out data-intelligence offerings, he added.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the MIT CDOIQ Symposium.

Photo: SiliconANGLE
