UPDATED 08:52 EDT / JUNE 18 2015


AWS adds support for Apache Spark on EMR

The Apache Spark open-source distributed processing engine for Big Data workloads is coming to Amazon Web Services (AWS). The cloud giant has just updated its EMR (Elastic MapReduce) service to handle Spark applications, meaning enterprises can now use the popular processing engine without needing to build their own infrastructure first.

Spark joins Hive, Pig, HBase, Presto, Impala and other Hadoop ecosystem applications in getting official support from AWS. Amazon says Spark is a particularly good fit for batch processing, graph databases, streaming and machine learning thanks to its in-memory caching, optimized execution and fast performance. EMR now supports Spark version 1.3.1, using Hadoop YARN as the cluster manager.

Of course, some people have been running Spark on AWS’ EMR for some time, but doing so was considerably harder without Amazon’s integrated support. Now the process is more straightforward: IT staff can spin up a cluster from the AWS Management Console in seconds, Amazon says. EMR can run Spark applications written in Java, Python, Scala and SQL, the cloud giant added.

It’s been a busy week all round for Spark, with the Spark Summit taking place in San Francisco. Not only was there a new release from Databricks, but IBM also made a major commitment by devoting 3,500 engineers to the project in addition to launching its own Spark service. Elsewhere, MapR Technologies Inc. announced the launch of specialized analytic workflows for Spark with its own Hadoop distribution, while Mesosphere Inc. said it will partner with Typesafe Inc. to provide support for an instance of Apache Spark that can run atop the Mesosphere Data Center Operating System (DCOS) on the Amazon Web Services cloud.

As far as pricing goes, Amazon says it will be based on the cost of the underlying EC2 instances, with a separate charge added for using the EMR service. Running Spark on EMR with a basic c3.xlarge instance will cost $0.263 per hour on-demand, while the more powerful c3.8xlarge instance is priced at $1.95 per hour. Amazon also offers more expensive instances with greater memory and storage capabilities. In each case, the per-instance hourly price has to be multiplied by the number of nodes running to arrive at a figure for the whole cluster.
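
To make the pricing arithmetic concrete, here is a minimal Python sketch that multiplies the hourly figures quoted above by node count and running time. It assumes each quoted figure already combines the EC2 and EMR charges for one node, and it ignores billing granularity, storage and data transfer. The function name, the node counts and the durations are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Hourly on-demand figures quoted in the article (USD per node per hour).
# Assumption: each figure already combines the EC2 and EMR charges.
HOURLY_RATE = {
    "c3.xlarge": Decimal("0.263"),
    "c3.8xlarge": Decimal("1.95"),
}


def estimate_cluster_cost(instance_type, nodes, hours):
    """Estimate cluster cost in USD as rate x nodes x whole hours."""
    if nodes < 1 or hours < 0:
        raise ValueError("nodes must be >= 1 and hours must be >= 0")
    rate = HOURLY_RATE[instance_type]
    # Decimal avoids binary floating-point drift in currency arithmetic.
    return (rate * nodes * hours).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical cluster: ten c3.xlarge nodes running for a day.
    print(estimate_cluster_cost("c3.xlarge", nodes=10, hours=24))  # 63.12
    # Hypothetical cluster: four c3.8xlarge nodes running for two hours.
    print(estimate_cluster_cost("c3.8xlarge", nodes=4, hours=2))   # 15.60
```

On these assumptions, a ten-node c3.xlarge cluster running for 24 hours comes to about $63, which shows how quickly node count dominates the bill.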

Image credit: Skeeze via pixabay.com
