UPDATED 09:00 EDT / SEPTEMBER 25 2017

BIG DATA

With DataPlane, Hortonworks aims to help companies drowning in data lakes

When the term “data lake” was coined in 2011, the notion was that organizations needed a single pool for all their data, so it could be tapped for whatever analysis or application required it, instead of languishing in countless data silos. But now, it’s common for enterprises to have multiple data lakes spread across their own data centers and multiple cloud computing providers, partly defeating the purpose of the whole idea.

That’s the problem big-data company Hortonworks Inc. aims to help solve with a new service it’s announcing today. The Hortonworks DataPlane Service, or DPS, is a cloud offering that aims to corral data in multiple data lakes and other data repositories, whether it came from customer transactions in the U.S., a security cam in Japan or anywhere else.

The company said the service, built upon open-source technologies such as Apache Atlas for managing the security and usability of data, will make it simpler for companies to set up and operate distributed data systems, whether they’re used for data science, analytics or data warehousing — both so-called data in motion and data at rest. That’s becoming especially critical as data from far-flung Internet of Things devices explodes and companies need to do data-intensive work on it such as artificial intelligence and machine learning for applications ranging from voice and image recognition to self-driving cars.

“It’s clear that data is not going to be in one place,” Hortonworks co-founder and Chief Product Officer Arun Murthy (pictured) said in an interview. “This is the first time that anyone has acknowledged that there will be multiple data lakes.”

The DataPlane service, he said, is a way to stitch together all the data in various data lakes and beyond. With DataPlane, companies get consistent data governance across data lakes. That’s important, Sesh Rangarajan, senior director and head of analytics at Liberty Mutual Benefits, said in a statement: “We require a single data fabric to make sense of it all.”

Indeed, as Noel Yuhanna, an analyst at Forrester Research Inc., described it to SiliconANGLE, it’s turning his notion of a “data fabric” that weaves together many data threads into a reality. The DataPlane won’t actually hold all that data itself, instead serving as a sort of catalog or “semantic” layer, but that’s still a significant technical challenge because of all the sources and types of data.

“This prevents inconsistent data and helps integrate data better,” Yuhanna said. Although there are some hand-coded solutions to the issue, he said, “no one else has gone end-to-end with this kind of thing.” He added, however, that Hortonworks competitors such as MapR Technologies Inc., Cloudera Inc. and Amazon Web Services Inc. will likely head in a similar direction before long.

Image: Hortonworks

The DataPlane service, which has been in private beta testing with a number of customers in financial services, healthcare and other industries, is being offered as open source, and Hortonworks will charge for extensions or applications created on top of it. One that it has created at the outset is a Data Lifecycle Manager that manages the way data is handled, maintaining the context so it complies with central security and governance policies.

Murthy said Hortonworks plans to offer other extensions, but it’s also hoping other companies will offer extensions in a sort of App Store arrangement. Hortonworks is talking to a startup to make profiling and scanning of data into a data lake available on the service, for example.

Not least, IBM Corp. is working with Hortonworks to make its Data Science Experience, Big SQL, Big Integrate and Big Quality data services available directly on DataPlane. “IBM believes that we are evolving to a multicloud world, and Hortonworks DPS is key for integrating disparate datasets in a multicloud environment,” Rob Thomas, general manager of IBM Analytics, said in a statement.

Yuhanna said DataPlane positions Hortonworks to become a “different kind of company,” one with a broader role than simply providing a platform for the Hadoop big-data framework and related open-source technologies. The service could also offer other critical services such as data cleansing and data quality improvement, he said.

“This is a big moment for us,” Murthy said of the service, which was two years in the making. “This is a fabric across our major platforms.”

Murthy hinted at the coming product when he spoke with SiliconANGLE Media’s video unit theCUBE at Hortonworks’ DataWorks Summit in March about the need to keep enterprises from drowning in data lakes.

Photo: SiliconANGLE
