UPDATED 16:48 EST / APRIL 16 2018

BIG DATA

AI powers the catalogs of next-generation big data

Data’s relevance doesn’t always jump out at you. It takes work to distill useful insights from enterprise data lakes that are increasingly too large, diverse and dynamic to be explored through entirely manual methods.

Discoverability and visibility are what unlock data’s value. More enterprises are embracing big-data catalogs to harness insights that would otherwise stay dormant and overlooked. Recognizing this growing demand, more data management solution providers are building sophisticated catalogs into their solution portfolios, as discussed in Wikibon’s recent big-data market study.

Artificial intelligence is a key force driving the evolution of big-data catalogs into enterprisewide platforms for collaborative curation. Increasingly, providers are integrating AI into their offerings to help users discover, refine, explore, analyze and apply complex data sets more rapidly and intelligently to diverse applications.

Among data management vendors, Informatica LLC has set the pace in weaving AI-infused metadata-management capabilities into its solution portfolio. In the breadth and sophistication of its AI capabilities, Informatica stands apart from other data catalog solution providers such as Alation Inc., Cloudera Inc., Hortonworks Inc. and Microsoft Corp.

The company briefed Wikibon last summer on its roadmap to integrate AI as an enabling capability across its entire product line, with its Enterprise Data Catalog at the center. At that time, Informatica had already incorporated AI — which it brands as “CLAIRE” — into its catalog to automate data clustering, tagging, and domain/entity recognition. The AI-powered catalog intelligently scans data assets from across the enterprise and automatically adds business context metadata. In its data integration offerings, Informatica had already integrated such CLAIRE AI technologies as genetic algorithms (to identify complex data sub-structures), natural language processing algorithms (to drive semantics-based modifications to data models) and machine learning algorithms (to parse clickstream, log, system, JSON and other “internet of things” data).
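To make the idea of automated tagging and domain recognition concrete, here is a minimal, illustrative sketch in Python. It is not Informatica's method and says nothing about how CLAIRE works internally: it swaps in simple regular-expression rules where a real catalog would likely use learned models. The pattern table `DOMAIN_PATTERNS`, the functions `infer_domain` and `tag_columns`, and the 80% match threshold are all hypothetical names and values chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical domain patterns (assumption): a real catalog would rely on
# many more rules and on learned models rather than three regexes.
DOMAIN_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def infer_domain(values, threshold=0.8):
    """Return the domain tag matched by at least `threshold` of the
    non-empty sample values, or None if no domain is dominant."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    hits = Counter()
    for value in cleaned:
        for name, pattern in DOMAIN_PATTERNS.items():
            if pattern.match(value):
                hits[name] += 1
    if not hits:
        return None
    name, count = hits.most_common(1)[0]
    return name if count / len(cleaned) >= threshold else None


def tag_columns(table, threshold=0.8):
    """Map each column name to an inferred domain tag (or None)."""
    return {col: infer_domain(vals, threshold) for col, vals in table.items()}


if __name__ == "__main__":
    sample = {
        "contact": ["a@example.com", "b@example.org", ""],
        "signup": ["2018-04-16", "2018-01-02"],
        "notes": ["hello", "world"],
    }
    print(tag_columns(sample))
```

The threshold lets a column with a few dirty values still receive a tag, while a column with mixed content stays untagged and is left for a human steward to review.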

At Informatica World 2017, CEO Anil Chakravarthy spoke to theCUBE about how CLAIRE figures into its product roadmap going forward. “When we built CLAIRE,” he said, “we did not invent the artificial intelligence or the machine learning. A lot of that is already available. So we took a lot of the best algorithms in machine learning and applied them to metadata and data management. That’s the secret sauce. It’s not the building the AI itself, it’s the use of the AI for data management.”

Chakravarthy emphasized that CLAIRE is “not a product. It’s … a cloud-scale, AI-powered real time engine that powers other products.” He added that CLAIRE will be embedded in Informatica products so that customers won’t have to deploy it explicitly. “So it means once you have any product like our enterprise data catalog or data governance solutions, you’re starting to use CLAIRE and then you can use CLAIRE for other use cases as well.”

In a new product announcement today, Informatica rolled out new features that infuse CLAIRE’s AI smarts more deeply into the catalog at the heart of its solution portfolio. The company’s core announcements were twofold: It has introduced enhanced AI algorithms for improved curation and classification of structured and unstructured data, and it now provides an integrated metadata-driven intelligent API.

These new features support self-service discovery of the catalogued data that is best for the task at hand, such as training a machine learning model or curating customer datasets. They also enable users, such as data scientists and stewards, to apply the catalogued data via a single click to whatever application environment they’re working within. In addition, Informatica now provides single-click deployment of the catalog to Amazon Web Services and Microsoft Azure, so all of these features are available within those public clouds.

Over the next several years, Wikibon expects to see big-data catalogs become ubiquitous in enterprise data environments, with AI, intelligent metadata, recommendation engines and automated task-specific guidance as essential features. These capabilities will help organizations to manage their growing information assets across more complex hybrid clouds.

Image: stux/Pixabay
