UPDATED 12:01 EDT / SEPTEMBER 09 2015

Cloudera backs Spark as successor to MapReduce in Hadoop

Hadoop is entering a new chapter in its evolution with the launch of an ambitious community effort from Cloudera Inc. that aims to replace MapReduce as its default data processing engine. The proposed successor is predictably Apache Spark, the speedy in-memory alternative that has been gaining steam among adopters in the last few years.

The Hadoop distributor claims that industry interest has reached the point where the engine is now the single most widely used component in the entire upstream ecosystem, with 200 of its own customers having joined the bandwagon over the past 18 months alone. The new One Platform Initiative represents its response to that shift.

The push will concentrate on bringing the integration between Spark and the other projects in the Hadoop universe up to par with that of MapReduce, which has a considerable head start because the framework was built around it from the outset. Cloudera is already well into the effort, having contributed over 370 patches to the in-memory engine so far.

That adds up to about 43,000 lines of code, a sizable portion of which is designed to help Spark work better with essential components such as the YARN resource manager, which makes it possible to run multiple different analytics workloads on the same Hadoop clusters. The One Platform Initiative will expand upon that integration with support for several other complementary technologies, particularly on the security front.

One of the first items on the agenda is support for Intel Corp.’s Advanced Encryption libraries, which Cloudera plans to follow up with more granular access controls. The ultimate goal is to help Spark live up to the security standards of even the most heavily regulated sectors, especially the banking and medical industries, which are at the forefront of Spark adoption.

At the same time, the company will work to enhance the engine’s core data crunching capabilities by developing new management features that help organizations scale their deployments more effectively and by improving its emerging stream processing component. Both are essential to the continued growth of Spark.

But what gives the One Platform Initiative special urgency is that the engine can work without Hadoop. That means that if Cloudera doesn’t make it not only easy but also appealing to deploy Spark on the framework, it risks losing the growing number of its customers moving away from MapReduce over the long term. That risk is all the greater in view of IBM Corp.’s recent commitment to invest a billion dollars into accelerating the development of the engine.

Photo via AdjencaJA
