UPDATED 00:01 EST / JUNE 28 2019


IBM and Trifacta collaborate on new data prep tool for AI models

IBM Corp. is trying to address the cumbersome and time-consuming process of preparing data for use in artificial intelligence and machine learning model training with a new data preparation tool it developed in tandem with Trifacta Inc.

The companies point out that data preparation is an essential step in building machine learning and predictive models, because the data needs to be highly accurate or the resulting models will be ineffective. The problem is that data scientists can spend up to 80% of their time on this task.

That’s an awful lot of time that could be better spent elsewhere, which is why IBM and Trifacta today are announcing InfoSphere Advanced Data Preparation, a new tool they say helps speed up the process.

With InfoSphere Advanced Data Preparation, data scientists can transform raw datasets into a format suitable for training machine learning models, while continuing to work with their existing data lakes and data warehouses.

The tool has been designed for “formatting, structuring and enriching the datasets for analytic processing and standard reporting,” the companies said. It helps users visualize the data preparation process so they can continually track the quality of their data and make sure no errors creep in while it’s being formatted. The process is also fully automated, which means regular employees as well as data scientists can prepare and enrich data for analytics purposes.

Trifacta Chief Executive Officer Adam Wilson said the company worked with IBM to create the tool after seeing numerous organizations struggle with their AI initiatives due to poor data quality and inefficient preparation processes.

“This collaboration will empower organizations to accelerate data preparation for self-service analytics in a governed and centrally managed environment,” Wilson said.

Constellation Research Inc. analyst Doug Henschen told SiliconANGLE that the collaboration was something of a coup for Trifacta, as well as a time saver for IBM, because it allows IBM to bring “state-of-the-art self-service data prep capabilities to market” more quickly than it could have on its own.

“I think IBM is smart to focus on the development, deployment, monitoring and ongoing management aspects of the modeling lifecycle and developing automation where possible,” Henschen said. “Why be a ‘me-too’ on prep challenges that Trifacta has addressed quite well? Trifacta also has a significant partnership with Google for its cloud platform, so this is a second endorsement of its capabilities by a notable partner.”

Along with the new data prep tool, IBM also announced updates to its Cloud Pak for Data service, which is used to integrate, govern and manage data across various public and private clouds. The updates include the availability of Watson Knowledge Catalog Professional on Cloud Pak for Data, which is said to improve “data findability for analysis” and provide more governance tools.

There’s also DataStage Edition for IBM Cloud Pak for Data, which is meant to reduce latency in data transformation jobs, and Watson Discovery for IBM Cloud Pak for Data, which is an AI search tool for discovering data across various clouds.

Image: xresch/Pixabay
