UPDATED 09:00 EDT / MARCH 16 2018

BIG DATA

IBM calls its new machine learning platform ‘the reinvention of the database’

IBM Corp. today unveiled a new data science and machine learning platform that one executive called “the most significant announcement we’ve made about data in years.”

Featuring an in-memory database, a real-time processing engine and the ability to ingest and analyze massive amounts of data, Cloud Private for Data constitutes an integrated data science, data engineering and application development platform. It’s intended for building event-driven applications that can handle “torrents of data from things like internet of things sensors, online commerce, mobile devices, and more,” the company said in a press release.

With a combination of features that enables organizations to ingest, transform and analyze streaming data on a single software stack, the engine sits atop Cloud Private, which is IBM’s version of the Kubernetes container orchestration platform. Software containers abstract applications away from the underlying hardware, enabling them to run on any computing platform. Cloud Private for Data is intended for “anybody who wants to refine data, build models and deploy them into production in a single experience,” said Rob Thomas (pictured), general manager of analytics at IBM.

“Moving to a single architecture changes the cost of managing multiple product stacks,” Thomas said. “You don’t need a separate database and data governance team.” The integrated features, combined with Kubernetes and a microservices architecture, he said, reduce the cost of managing multiple stacks by about 80 percent. A microservices architecture creates applications from collections of small services that are called and run as needed rather than being part of a more complex integrated application.

A core feature of the platform is an intelligent data catalog enabled by machine learning technology that automates the often arduous process of creating meta-tags. That’s combined with a real-time ingestion engine based on Apache Spark and the Apache Parquet column-oriented data store.

Faster cleanup

IBM also said it has automated much of the data cleansing process that is estimated to consume up to 80 percent of the average data scientist’s time. “Our view is that you should leave your data where it is because we can access and federate it on public cloud and local storage,” Thomas said. The Cloud Private for Data platform also includes elements of IBM’s Data Science Experience, Information Analyzer, Information Governance Catalog, DataStage, Db2 and Db2 Warehouse.

The combination can ingest up to 1 million rows per second and 250 billion events per day for both transaction and analytical processing. IBM has also automated provisioning with a microservices-based approach that removes much of the need for manual configuration. “We say whatever infrastructure you’ve got is what we’ll work with,” Thomas said, adding that the full stack “provisions in minutes.”
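
For a sense of scale, the two ingestion figures can be put on a common footing with some back-of-the-envelope arithmetic. The sketch below is only illustrative: it assumes a constant rate sustained over a full 24-hour day, which IBM did not specify, and the function names are hypothetical.

```python
# Convert the quoted ingestion figures between per-second and per-day rates.
# Assumption: a constant rate sustained for a full 24-hour day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(rate_per_second: int) -> int:
    """Total items in one day at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


def per_second(total_per_day: int) -> float:
    """Average per-second rate needed to reach a daily total."""
    return total_per_day / SECONDS_PER_DAY


rows_per_day = per_day(1_000_000)
events_per_second = per_second(250_000_000_000)

print(f"1 million rows/second is about {rows_per_day / 1e9:.1f} billion rows/day")
print(f"250 billion events/day is about {events_per_second / 1e6:.2f} million events/second")
```

Under that assumption, 1 million rows per second works out to 86.4 billion rows per day, while 250 billion events per day corresponds to an average of roughly 2.9 million events per second, so the two quoted figures presumably describe different measurements or configurations.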

IBM is pitching the platform at the growing number of users looking to build machine learning models but constrained by the inability to process the large amounts of data that such models require. “Our ability to put AI to use is limited by a lack of data, and the biggest issue is collection and preparation,” Thomas said. “If you can do those right, then using natural language processing and AI is much easier.” IBM intends to support all major machine learning libraries.

“My view is that this is the reinvention of the database,” Thomas said. “You’d struggle to find anything comparable.”

To help customers down the road toward AI, IBM has gathered more than 30 data scientists, machine learning engineers, decision optimization engineers, data engineers and data journalists into a no-charge consultancy that can help solve data science problems. IBM expects the team, which is already working with more than 50 customer organizations, to grow to 200 people over the next few years.

Image: SiliconANGLE
