UPDATED 06:31 EDT / JUNE 27 2014

Hortonworks lights up Hadoop: Apache Spark declared YARN-ready

Hortonworks said Apache Spark, a new technology that’s quickly gaining interest for in-memory-accelerated machine learning and other forms of high-volume data analysis, is now enabled to plug into Apache YARN, the resource-management layer introduced last year with Apache Hadoop 2.0.

Apache Spark is a high-speed engine for large-scale data processing that was released as version 1.0.0 last May. It’s designed to run much faster than Hadoop’s MapReduce, and is capable of tackling more specialized applications. Spark is now ready to run as a technology preview on Hortonworks Data Platform (HDP), with a production-certified release set for later this year.

Hortonworks is a little late to the Apache Spark game. Back in February, Cloudera added support for Spark, using its Cloudera Manager software for deployment, management and monitoring. MapR followed up with its own Spark deployment last April. Now Hortonworks is joining in, stressing that its version is 100 percent open-source and uses YARN to monitor and manage the components.

In an interview with V3.co.uk, Hortonworks vice president of Corporate Strategy Shaun Connolly said developers using the Scala language were particularly interested in Spark. Spark allows them to perform analysis on Hadoop data for customer segmentation and other advanced techniques like classifying and clustering datasets. Now that Spark is YARN-ready, users can run Spark applications in a Hadoop cluster alongside other workloads, rather than in a separate cluster.

“Since Spark has requirements that are much heavier on memory and CPU, YARN-enabling it will ensure that the resources of a Spark user don’t dominate the cluster when SQL or MapReduce users are running their application,” said Connolly.

To ensure everything runs smoothly, Hortonworks is teaming up with Databricks – a company founded by Apache Spark’s creators – to make sure new apps and tools built on Spark are compatible.

“With the designation of Apache Spark as YARN-ready, enterprises can rest assured that Spark can run simultaneously and effectively with other mission-critical applications,” said Databricks business development executive Arsalan Tavakoli-Shiraji in a statement.

The HDP 2.1 Tech Preview Component of Apache Spark can now be downloaded for free and installed on the current HDP 2.0 distro. Hortonworks says its HDP 2.1 release, which will include Spark, is expected to be ready in the next few months.

photo credit: Striking Photography by Bo Insogna via photopin cc
