UPDATED 23:10 EDT / AUGUST 18 2011

NEWS

Amazon Web Services Offers Spot Market Capabilities for Managing Hadoop Clusters

Amazon Web Services is combining two of its features, Amazon EC2 Spot Instances and Elastic MapReduce, so that customers can launch and manage Hadoop clusters using unused EC2 capacity.

According to Jeff Barr on the Amazon Web Services blog, customers will be able to run long-running jobs, cost-driven workloads, data-critical workloads, and application testing at a discount that has historically ranged between 50% and 66%.

It also brings an element of the financial markets to managing MapReduce.

For example, let’s say you want to run a workload on Amazon EC2 that is not critical to the daily work cycle. You can put a spot bid on the workload. If the spot price rises above the bid, the job is automatically terminated. For more regular workloads, you can choose the time of day when capacity is least expensive. The job can then run with more confidence that it will be completed.
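The bidding behavior described above can be sketched in a few lines of Python. This is an illustration only, not AWS code: the function names and the hourly price series (in cents) are made up for the example, and real spot prices and termination rules are set by AWS.

```python
# Hypothetical hourly spot prices in cents; these numbers are invented
# for illustration and are not real AWS prices.
HOURLY_PRICES = [30, 28, 35, 50, 27, 25, 26, 40]


def hours_until_termination(bid, hourly_prices):
    """Count the hours a job runs before the price first exceeds the bid."""
    for hour, price in enumerate(hourly_prices):
        if price > bid:
            return hour  # the job is terminated at the start of this hour
    return len(hourly_prices)  # the price never exceeded the bid


def cheapest_window(hourly_prices, duration):
    """Return (start_hour, total_cost) of the cheapest contiguous window."""
    best_start, best_cost = 0, float("inf")
    for start in range(len(hourly_prices) - duration + 1):
        cost = sum(hourly_prices[start:start + duration])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


if __name__ == "__main__":
    # A 40-cent bid survives until the price jumps to 50 cents.
    print(hours_until_termination(40, HOURLY_PRICES))
    # A regular three-hour job is cheapest when started at hour 4.
    print(cheapest_window(HOURLY_PRICES, 3))
```

With these made-up prices, a 40-cent bid is interrupted after three hours, while a three-hour job scheduled in the cheapest window costs 78 cents in total.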

According to Barr, the EC2 instances used to run an Elastic MapReduce job flow fall into one of three categories, or instance groups:

Master – The Master instance group contains a single EC2 instance. This instance schedules Hadoop tasks on the Core and Task nodes.

Core – The Core instance group contains one or more EC2 instances. These instances use HDFS to store the data for the job flow. They also run mapper and reducer tasks as specified in the job flow. This group can be expanded in order to accelerate a running job flow.

Task – The Task instance group contains zero or more EC2 instances and runs mapper and reducer tasks. Since these instances don’t store any data, this group can expand or contract during the course of a job flow.
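The sizing rules for the three groups can be summarized in a small Python model. This is a sketch based only on the description above: the class and method names are hypothetical and do not correspond to the Elastic MapReduce API.

```python
from dataclasses import dataclass


@dataclass
class InstanceGroup:
    """Hypothetical model of one instance group in a job flow."""
    role: str               # "master", "core", or "task"
    count: int              # current number of EC2 instances
    stores_hdfs_data: bool  # True only for the Core group in this model

    def can_resize_to(self, new_count: int) -> bool:
        """Apply the resizing rules as the article describes them."""
        if self.role == "master":
            return new_count == 1           # always a single instance
        if self.role == "core":
            return new_count >= self.count  # holds HDFS data: expand only
        if self.role == "task":
            return new_count >= 0           # no data: expand or contract
        raise ValueError(f"unknown role: {self.role}")


master = InstanceGroup("master", 1, stores_hdfs_data=False)
core = InstanceGroup("core", 4, stores_hdfs_data=True)
task = InstanceGroup("task", 10, stores_hdfs_data=False)

if __name__ == "__main__":
    print(core.can_resize_to(6))  # expanding Core is allowed
    print(core.can_resize_to(2))  # shrinking Core is not
    print(task.can_resize_to(0))  # Task can drop to zero instances
```

Because the Task group holds no data in this model, it is the natural place to absorb capacity that may disappear when a bid is exceeded.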


Use cases provide the context for how the combined services work together. According to Barr, there are two in particular to think about:

  1. Batch-processing workloads that are not particularly time-sensitive, such as image and video processing, data processing for scientific research, financial modeling, and financial analysis.
  2. Data warehouses that have a recurring workload variance at peak times.

Foursquare has started using the service, performing analytics across more than three million daily check-ins, and has reduced analytics costs by more than 50%. The company has also decreased processing time for urgent data analysis, all without requiring additional application development or adding risk to its analytics.

Services Angle

Amazon is showing once again how far ahead it is in the infrastructure market. It is also showing how cloud services can operate much like financial markets. The AWS example for video processing is a case in point. Just think of what a movie studio could save if it could bid on a time to process its 3D animations. The savings could be considerable.

