UPDATED 09:00 EST / JUNE 27 2023

BIG DATA

Cloudera expands Apache Iceberg support to private clouds

Data warehouse software provider Cloudera Inc. is expanding support for Apache Iceberg, making it available in the self-hosted, private cloud version of its flagship Cloudera Data Platform.

Today’s move brings CDP-Private Cloud into line with Cloudera’s CDP-Public Cloud offering, which added support for Apache Iceberg last year. Apache Iceberg for CDP-Private Cloud is available now as a technology preview, and will become generally available later this year.

CDP is Cloudera’s main product offering, encompassing both data management and analytics services. It’s used by companies to collect structured and unstructured business records from internal systems and centralize them in a single, giant “data lakehouse,” where they can be analyzed for insights or used to train artificial intelligence models. In addition, CDP provides various tools for performing the complex data preparation tasks that are required to transform information into different formats.

With CDP-Public Cloud and CDP-Private Cloud, Cloudera gives customers the ability to host its platform on public clouds such as Amazon Web Services, Google Cloud or Microsoft Azure, or alternatively within their own on-premises data centers. As for Iceberg, it’s a key building block of Cloudera’s data lakehouse. It’s an open table format for data lakes that was originally designed by Netflix Inc. to overcome the challenges it came across with older table formats such as Apache Hive’s, which are typically queried through engines such as Impala and Spark.

The biggest problem with those older formats is that they’re tied to primary engines and, oftentimes, single providers. But Cloudera wants to create a truly open data lakehouse that’s free of concerns around vendor lock-in. The open-source and cloud-native Apache Iceberg table format is intended to be the tonic.

It has some impressive capabilities too, with the ability to handle petabyte-scale object storage without any performance degradation, Cloudera said when it first announced support. In addition, Iceberg offers features around in-place table evolution, support for point-in-time queries, concurrent multifunction analytics and improved performance through aggressive partitioning, making it ideal for the very large datasets needed to train AI models.

Ram Venkatesh, Cloudera’s chief technology officer, said the addition of Iceberg makes Cloudera CDP-Private Cloud an ideal option for tasks such as large language model training. “Large enterprises want to get business value from all of their data using AI and data analytics,” he explained. “Our customers can now gain from Iceberg ‘everywhere’ they need it to be.”

IT Market Strategy analyst Merv Adrian said Iceberg is an important addition for Cloudera CDP-Private Cloud customers, since it’s a key enabler of multifunction, multivendor data ecosystems. “It’s a big win for enterprises that need to involve all their data to get the most from AI,” he added.

Photo: Janvanbizar/Pixabay
