Cloudera debuts all-open-source integrated cloud data platform
Two months after adopting an all-open-source strategy, Cloudera Inc. today is announcing an integrated data platform made up entirely of open-source elements.
Cloudera Data Platform is being positioned as a one-stop-shopping cloud service for organizations that want to perform analytics across hybrid and multicloud environments with enterprise-grade security and governance.
The package combines a cloud-native data warehouse, machine learning service and data hub, each running as instances within self-contained operating environments called containers. Queries are managed by Apache Hive or Apache Impala, the latter of which was developed by Cloudera.
“The knock on Hadoop has always been its operational complexity and the fact that it’s difficult to use,” said Arun Murthy, Cloudera’s co-founder and chief product officer. “What we’ve invented is an experience that attacks both.”
Cloudera Data Platform focuses on reducing the time needed to install and configure the multiple elements required to create a data warehouse, analytics workbench or machine learning training suite. By using existing components in the cloud, the platform cuts deployment times from weeks to hours, Murthy said. The software works natively on Amazon Web Services Inc. S3 data and supports the Hadoop Distributed File System.
“To date we’ve been offering a bunch of HDFS clusters and customers had to install their own extensions,” he said. “With Cloudera Data Platform these are all native services. You can set up a secure data lake in a couple of hours.”
The platform also leverages Cloudera’s Shared Data Experience, a unified data framework that includes schema, permissions and governance artifacts. It enables multiple users to work from the same data and catalog using the tools that they prefer and to migrate workloads to the cloud.
“We move not just the bits but the data, the metadata, the tables and the security protocols,” Murthy said. “It’s secure end-to-end and it’s fully open.”
The combination of real-time processing and predictive analytics enables applications like real-time predictive billing, which can alert customers to excessive charges accruing to their mobile phone accounts as a result of, for example, leaving data services on while roaming, Murthy said.
Customers using Cloudera’s on-premises software can get a single view of both their local and cloud workloads. Cloudera Data Platform is currently a cloud-only service for workloads running on Amazon infrastructure.
An on-premises option, called CDP Data Center, will be available later this year with annual subscriptions starting at $10,000 per node. A preview version for Microsoft Corp.’s Azure cloud is due in a few months with support for Google LLC’s cloud likely to come early next year. Pricing information is published here.