UPDATED 08:52 EDT / JUNE 18 2015

AWS adds support for Apache Spark on EMR

The Apache Spark open-source distributed processing engine for Big Data workloads is coming to Amazon Web Services (AWS). The cloud giant has just updated its EMR (Elastic MapReduce) service to handle Spark applications, meaning enterprises can now use the popular processing engine without needing to build their own infrastructure first.

Spark joins Hive, Pig, HBase, Presto, Impala and other applications in the Hadoop ecosystem in getting official support from AWS. Amazon says Spark is a particularly good fit for batch processing, graph databases, streaming and machine learning thanks to its in-memory caching, optimized execution and fast performance. EMR now supports version 1.3.1 of Spark, using Hadoop YARN as the cluster manager.

Of course, some users have been running Spark on EMR for a while, but doing so was a far more difficult proposition without Amazon’s integrated support. Now it’s much more straightforward: IT staff can spin up a cluster from the AWS Management Console in seconds, Amazon says. EMR is capable of running Spark applications written in Java, Python, Scala and SQL, the cloud giant added.

It’s been a busy week all round for Spark, with the Spark Summit taking place in San Francisco. Not only was there a new release from Databricks, but IBM also made a major commitment by devoting 3,500 engineers to the project in addition to launching its own Spark service. Elsewhere, MapR Technologies Inc. announced the launch of specialized analytic workflows for Spark with its own Hadoop distribution, while Mesosphere Inc. said it will partner with Typesafe Inc. to provide support for an instance of Apache Spark that can run atop the Mesosphere Data Center Operating System (DCOS) on the Amazon Web Services cloud.

As far as pricing goes, Amazon says this will be based on the cost of the underlying EC2 instances, with a separate charge added for using the service. Running Spark on EMR with a basic c3.xlarge instance will cost $0.263 per hour on-demand, while the more powerful c3.8xlarge instance is priced at $1.95 per hour. Amazon also offers more expensive instances with greater memory and storage capabilities. In each case, the hourly price has to be multiplied by the number of nodes running to arrive at a total.
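
To make the per-node arithmetic concrete, here is a minimal Python sketch of a cost estimate. It uses only the two hourly figures quoted above, treated here as the combined per-node rate. The function name `estimate_cluster_cost` and the rate table are hypothetical, introduced purely for illustration. The sketch also assumes every node uses the same instance type and that cost scales linearly with hours, so it ignores any billing-increment rounding, storage or data transfer charges.

```python
# Hourly on-demand figures quoted in the article (USD per node, Spark on EMR).
# The table and function below are illustrative, not an official AWS calculator.
HOURLY_RATE_USD = {
    "c3.xlarge": 0.263,
    "c3.8xlarge": 1.95,
}


def estimate_cluster_cost(instance_type: str, node_count: int, hours: float) -> float:
    """Return an estimated cost: per-node hourly rate x nodes x hours.

    Assumes all nodes share one instance type and that cost scales
    linearly with time (no rounding up to billing increments).
    """
    if instance_type not in HOURLY_RATE_USD:
        raise ValueError(f"no rate known for instance type: {instance_type}")
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if hours < 0:
        raise ValueError("hours must not be negative")

    rate = HOURLY_RATE_USD[instance_type]
    # Round to cents for display purposes.
    return round(rate * node_count * hours, 2)
```

Under these assumptions, `estimate_cluster_cost("c3.xlarge", 10, 24)` comes to 63.12, meaning a ten-node c3.xlarge cluster running for a day would cost roughly $63.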
