UPDATED 12:42 EDT / SEPTEMBER 28 2015

NEWS

Kudu: How Cloudera wants to save Hadoop by killing it

The steep drop in memory prices that is leading Hadoop adopters to abandon disk-oriented MapReduce has finally caught up with the framework's storage component as well, with the introduction of an alternative from none other than Cloudera Inc., Hadoop's leading distributor. The move signals the beginning of the end for the decade-old project in its present form.

Like MapReduce, the Hadoop Distributed File System was created at a time when the most viable way to process large volumes of unstructured records was to store the data on disk and slowly bring small pieces into memory for analysis. The community has worked to adapt the framework as the underlying economics shifted over the years, but to limited effect.

As a result, the Hadoop Distributed File System has become a bottleneck for the growing number of organizations turning to Spark in hopes of using the large amounts of affordable memory now at their disposal to eliminate the overhead of shuffling data to and from disk. That trend is proving detrimental to Hadoop as a whole, with a recent study finding that standalone deployments of Spark are quickly becoming the norm.

The soon-to-launch Kudu is Cloudera's attempt to reverse that trend. It is the product of a development effort spanning more than three years, which began when the company's engineers realized that the changes needed to accommodate the shift in infrastructure were too extensive to implement in the Hadoop Distributed File System or the complementary HBase database.

The result is a columnar store that combines the best qualities of both to provide what is touted as a unified platform for the machine learning and predictive analytics workloads that organizations are running on Spark. Kudu exploits the abundance of memory in modern analytics clusters to make large parts of the data it stores, including the metadata, instantly available for modification.

The cached changes are periodically flushed to disk in a single batch, which incurs less overhead than writing them out as many small operations. Like the Hadoop Distributed File System, Kudu distributes the work across all the machines in a cluster and designates a master node to keep everything coordinated.
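To make the batching idea in the previous paragraph concrete, here is a minimal sketch of a generic in-memory write buffer. It is not Kudu's actual implementation or API. The class `BufferedStore`, its `flush_threshold` parameter and the plain dictionary standing in for on-disk storage are all hypothetical, chosen only to show how buffered changes stay immediately readable and are then written out in one batch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BufferedStore:
    """Hypothetical store that buffers changes in memory and flushes them in batches."""

    flush_threshold: int = 4  # assumed number of buffered keys that triggers a flush
    memory_buffer: Dict[Any, Any] = field(default_factory=dict)
    disk: Dict[Any, Any] = field(default_factory=dict)  # stand-in for on-disk storage
    disk_writes: int = 0  # counts batched write operations, not individual rows

    def upsert(self, key: Any, value: Any) -> None:
        # Changes land in memory first, so they are immediately visible and modifiable.
        self.memory_buffer[key] = value
        if len(self.memory_buffer) >= self.flush_threshold:
            self.flush()

    def get(self, key: Any) -> Any:
        # The newest version of a row lives in the buffer; fall back to the flushed copy.
        if key in self.memory_buffer:
            return self.memory_buffer[key]
        return self.disk.get(key)

    def flush(self) -> None:
        # One batched write replaces one write per change.
        if not self.memory_buffer:
            return
        self.disk.update(self.memory_buffer)
        self.disk_writes += 1
        self.memory_buffer.clear()
```

With `flush_threshold=4`, ten upserts of distinct keys trigger only two batched writes and leave the last two changes in memory, where `get` can still read them. A real system would also need a write-ahead log so buffered changes survive a crash. That concern is omitted here for brevity.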

Users will eventually be able to spread that latter duty among multiple servers for greater reliability, much as data ingestion is handled today. That is one of the many features Cloudera has in the pipeline for Kudu. On one hand, this is encouraging for organizations that may be considering jumping aboard the bandwagon, but on the other, it is also a measure of the technology's present immaturity.

The roadmap for Kudu is long and difficult, not only in the technical sense but from a strategic perspective as well. Cloudera's effort to merge the capabilities of the Hadoop Distributed File System and HBase into a single platform highlights a broader consolidation of the ecosystem that is best exemplified by Spark, which can replace many of the disparate components in current distributions with native add-ons, leaving fewer opportunities for vendors to add value.

That will make it harder for Cloudera and its peers to remain competitive as Spark, and its specialty components in particular, continue to gain steam. As a result, the consolidation occurring in the upstream ecosystem today may very well spill over to the vendors trying to commercialize it tomorrow. "The core message about the Hadoop ecosystem getting hollowed out by Spark is the single biggest trend going on in big data right now," commented Wikibon's George Gilbert.

Photo via Skeeze


About SiliconANGLE Media

SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.
