UPDATED 09:00 EDT / SEPTEMBER 24 2019

BIG DATA

Cloudera debuts all-open-source integrated cloud data platform

Two months after adopting an all-open-source strategy, Cloudera Inc. today is announcing an integrated data platform made up entirely of open-source elements.

Cloudera Data Platform is being positioned as a one-stop-shopping cloud service for organizations that want to perform analytics across hybrid and multicloud environments with enterprise-grade security and governance.

The package combines a cloud-native data warehouse, machine learning service and data hub, each running as an instance within the self-contained operating environments called containers. Queries are managed by Apache Hive or Apache Impala, the latter of which was developed by Cloudera.

“The knock on Hadoop has always been its operational complexity and the fact that it’s difficult to use,” said Arun Murthy (pictured), Cloudera’s co-founder and chief product officer. “What we’ve invented is an experience that attacks both.”

The focus of the Cloudera Data Platform is on reducing the time needed to install and configure the multiple elements required to create a data warehouse, analytics workbench or machine learning training suite. By using existing components in the cloud, the platform cuts deployment times from weeks to hours, Murthy said. The software works natively on Amazon Web Services Inc. S3 data and supports the Hadoop Distributed File System.

“To date we’ve been offering a bunch of HDFS clusters and customers had to install their own extensions,” he said. “With Cloudera Data Platform these are all native services. You can set up a secure data lake in a couple of hours.”

The platform also leverages Cloudera’s Shared Data Experience, a unified data framework that includes schema, permissions and governance artifacts. It enables multiple users to work from the same data and catalog using the tools that they prefer and to migrate workloads to the cloud.

“We move not just the bits but the data, the metadata, the tables and the security protocols,” Murthy said. “It’s secure end-to-end and it’s fully open.”

The combination of real-time processing and predictive analytics enables applications such as real-time predictive billing, which can alert customers to excessive charges accruing to their mobile phone accounts, for example as a result of leaving data services on while roaming, Murthy said.

Customers using Cloudera’s on-premises software can get a single view of both their local and cloud workloads. Cloudera Data Platform is currently a cloud-only service for workloads running on Amazon infrastructure.

An on-premises option, called CDP Data Center, will be available later this year with annual subscriptions starting at $10,000 per node. A preview version for Microsoft Corp.’s Azure cloud is due in a few months, with support for Google LLC’s cloud likely to come early next year. Pricing information is published here.

Photo: SiliconANGLE
