UPDATED 15:23 EDT / MARCH 23 2012

Foursquare’s Amazon Move Highlights Hadoop Hurdles

In a pair of customer case studies released today, Amazon Web Services hammered on the usual sales pitch for the cloud: lower OpEx, higher ROI, better scalability and so forth. But what really stood out to me was the Foursquare Labs study, which goes into some detail on how the cloud is helping the 10-million-member location-based social network leverage big data analytics, and which highlights some problems with Apache Hadoop as it stands today.

Foursquare’s use of AWS works something like this: The social network uses the open source Apache Flume to funnel “hundreds of millions of application logs” each and every day into the Amazon S3 storage cloud (I suspect a significant fraction of those are generated by a Foursquare-addicted friend of mine who checks in to every single place he goes, but I digress). By analyzing those logs with Amazon Elastic MapReduce, Foursquare gains insight into the uptake of new features, customer usage and long-term trends, and supports its machine learning and exploratory analysis.

Amazon Elastic MapReduce is AWS’s solution for deploying and managing a Hadoop cluster in the cloud. Elastic MapReduce was a natural choice for Foursquare given its use of AWS cloud storage, but the option to maintain its own Hadoop cluster was always on the table.

“Using Amazon Elastic MapReduce to analyze data stored in Amazon S3 rather than maintaining our own Hadoop cluster was the clear choice. Hadoop clusters can be difficult to manage, leading to weeks spent debugging minor issues. Amazon Elastic MapReduce gets rid of this wasted time without requiring dedicated support personnel. Additionally, if you want to update your application or need a modified configuration, you can simply terminate the cluster and start a new one,” said Foursquare engineer Matthew Rathbone in that case study.

The study goes into a lot more depth on exactly how Foursquare turns its use of Elastic MapReduce into cost savings. Essentially, Foursquare runs Elastic MapReduce on Heavy Utilization Reserved Amazon EC2 Instances under a one-year contract that cost the company over $1 million but still saved it 35 percent compared with the pay-as-you-go price. Combined with AWS’s recent price drops for Elastic MapReduce, that is saving Foursquare over 53 percent on its analytics costs compared with self-hosting, without any loss in scalability or functionality as big data becomes increasingly crucial to its business model.

“Amazon Elastic MapReduce had already significantly reduced the time, effort, and cost of using Hadoop to generate customer insights […] We have decreased the processing time for urgent data-analysis, all without requiring additional application development or adding risk to our analytics,” Rathbone said.

Services Angle

In a roundabout way, this case study validates a move made this week by Think Big Analytics: If and when potential customers hear about Foursquare’s success using Elastic MapReduce for analytics, they’re going to look to follow suit. But not every CIO has the team or the expertise to handle an Amazon Elastic MapReduce deployment in-house. Foursquare, as a web-based service to begin with, started with an edge in this instance. Enter service providers like Think Big Analytics, which is extending its consultative expertise in both big data and Amazon Web Services to the enterprise.

On the other hand, not every business is ready to move its big data crunching to the public cloud, and Hadoop still has the configuration and management problems that Rathbone mentioned. That means that for the business that needs a customized or otherwise fine-tuned Hadoop deployment for analytics, there’s room for a service provider to step in and consult on the deployment. In other words, Foursquare is an exception, but its use case suggests there’s going to be plenty of demand for service providers as they shift into new roles as consultants.

