UPDATED 11:30 EDT / NOVEMBER 08 2011

“MySQL Isn’t Going Anywhere Soon, but HBase is a Great Addition to the Tool Belt”

Facebook’s been pretty active lately, launching new features around messaging and timelines.  Perhaps you’ve noticed?  Well, none of these enhanced features would be possible without developments in big data, and Facebook is always looking for new ways to solve its own problems.  At Hadoop World in NYC today, Facebook engineer Jonathan Gray discusses the company’s use of HBase, the Hadoop-based database that is a growing platform for realtime operations and analytics.  It’s the tool behind many of Facebook’s user operations, from Inbox search to advertising analytics.  HBase has proven a worthy alternative to MapReduce and MySQL for Facebook as it pushes to innovate in the social space.

Some of the key differentiators that led Facebook to HBase include sorted and column-oriented data organization, high write throughput (a big plus for HBase), horizontal scalability and auto-failover.  But there’s quite an evolution to consider here, with several platforms besides HBase still serving their purpose for many companies, including Facebook.  Considering MySQL isn’t necessarily broken, why is Facebook trying to fix it?
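To make "sorted and column-oriented" a little more concrete, here is a toy in-memory sketch of that data model, assuming a hypothetical `SortedColumnTable` class and made-up row keys such as `user42#msg0001`. It is not the HBase client API and not Facebook's code; it only illustrates why keeping rows sorted by key lets all of one user's data be read back with a single contiguous range scan.

```python
from bisect import bisect_left, insort


class SortedColumnTable:
    """Hypothetical toy model of an HBase-style table (not the real HBase API):
    rows are kept sorted by row key, and each row is a sparse map of
    'family:qualifier' column names to values."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        # Insert the key into the sorted list only the first time we see the row.
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        # Missing rows or columns simply return None (sparse storage).
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= key < stop_key, in key order."""
        i = bisect_left(self._keys, start_key)
        while i < len(self._keys) and self._keys[i] < stop_key:
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


# Usage: rows for one user sit next to each other because keys share a prefix.
table = SortedColumnTable()
table.put("user42#msg0002", "m:body", "see you there")
table.put("user7#msg0001", "m:body", "hi")
table.put("user42#msg0001", "m:body", "lunch?")
# '$' sorts immediately after '#', so this range covers exactly the "user42#" prefix.
for key, cols in table.scan("user42#", "user42$"):
    print(key, cols["m:body"])
# prints:
# user42#msg0001 lunch?
# user42#msg0002 see you there
```

The sorted Python list makes each insert O(n), which is fine for a sketch; real HBase instead buffers writes in memory (the MemStore) and flushes them to sorted files on HDFS, which is a large part of why its write throughput is high.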

One key benefit of HBase is that it comes ready with several of the perks you’ll find with MySQL, making it a great supplement for Facebook’s existing processes as it goes through a big data transition.  HBase’s use of HDFS means Facebook gets all the benefits of HDFS as a storage system, for free.  This enables a range of processes to take place behind the scenes at Facebook, with lower latency, lower costs, and in real time.

Facebook is putting HBase to work in two major applications, the first being the new Facebook Messages system.  As Facebook looks to unify your messages across all platforms (Messages, IM/chat, e-mail and SMS), there’s a growing and potentially redundant index associated with every single user on the site.  “At the time it was Facebook’s largest engineering effort, with 15+ engineers for one year, using twenty different infrastructure technologies including Hadoop, HBase, Haystack and ZooKeeper,” says Gray.  It was a product that had to scale, and work right out of the gate.  Facebook already had over a billion accounts, with users actively using the site.  With 15+ billion messages sent every month, this was a huge undertaking for Facebook, and a great case study for HBase all the same.

Another important use case for HBase is Facebook’s realtime analytics around ad demographics, domain URLs and other areas where a deep look into user activity is important.  With its Puma initiative, Facebook performs data conversions in a streaming style instead of the MapReduce batch style, significantly lowering latency.  It has massive throughput, managing billions of unique URLs and handling one million counter increments per second.  A project that was once all Java has become broadly applicable to Facebook’s needs, running the same kind of processing in a streaming fashion instead.  Puma aggregates session logs and other user actions (commenting, Liking, buying credits, playing a Zynga game, etc.), and with HBase underneath, it is able to handle the ever-growing number of user rows.
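The difference between streaming and batch aggregation can be sketched in a few lines.  The example below is an illustration under stated assumptions, not Puma's actual design: it assumes a hypothetical `Event` record with a timestamp, an action and a URL, hourly buckets, and a hypothetical `StreamingAggregator` that bumps an in-memory counter as each event arrives.  In an HBase-backed system each bump would correspond to a counter increment on a row; `batch_aggregate` computes the same totals in one pass over a complete log, the way a periodic batch job would.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical log record for one user action."""
    timestamp: int   # seconds since epoch
    action: str      # e.g. "like", "comment"
    url: str


def bucket(ts: int, width: int = 3600) -> int:
    """Round a timestamp down to the start of its time bucket (hourly by default)."""
    return ts - ts % width


class StreamingAggregator:
    """Hypothetical streaming roll-up: every event updates its counter on arrival,
    so totals are current without waiting for a batch job over the whole log."""

    def __init__(self):
        self.counters = Counter()

    def consume(self, event: Event) -> None:
        # One increment per event, keyed by (url, action, hour bucket).
        # In an HBase-backed store this would be one atomic counter increment.
        self.counters[(event.url, event.action, bucket(event.timestamp))] += 1

    def count(self, url: str, action: str, ts: int) -> int:
        # Counter returns 0 for unseen keys without inserting them.
        return self.counters[(url, action, bucket(ts))]


def batch_aggregate(log: Iterable[Event]) -> Counter:
    """Batch-style equivalent: one pass over the complete log after the fact."""
    return Counter((e.url, e.action, bucket(e.timestamp)) for e in log)


# Usage: both approaches end up with the same totals; streaming has them sooner.
log = [Event(7200, "like", "example.com/a"),
       Event(7300, "like", "example.com/a"),
       Event(10900, "like", "example.com/a"),
       Event(7250, "comment", "example.com/b")]
agg = StreamingAggregator()
for e in log:
    agg.consume(e)
print(agg.count("example.com/a", "like", 7999))  # prints 2 (both likes in the 7200 bucket)
print(agg.counters == batch_aggregate(log))      # prints True
```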

This is also primed for Ad Insights, letting marketers drill down into how many unique users saw an ad over time.  “Hopefully you’ll be able to move that window over time,” Gray notes.  He also hopes to see this dynamically configured by product teams, so Puma can become more specialized for various Facebook applications, enabling end users to express how they want to roll up all that data.
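Counting unique viewers over a movable time window is trickier than summing counters, because the same user can appear in several buckets.  One simple way to sketch it, assuming a hypothetical `UniqueViewers` class that keeps an exact set of user IDs per ad per hour, is to union the sets for the buckets inside the window.  This is not how Facebook's Ad Insights is implemented; exact sets grow with the audience, which is why large systems often turn to approximate distinct-count sketches such as HyperLogLog instead.

```python
from collections import defaultdict


class UniqueViewers:
    """Hypothetical sketch: exact unique-viewer counts over a sliding window,
    built from per-bucket sets of user IDs for each ad."""

    def __init__(self, bucket_seconds: int = 3600):
        self.bucket_seconds = bucket_seconds
        self.seen = defaultdict(set)  # (ad_id, bucket_start) -> {user_id, ...}

    def record(self, ad_id: str, user_id: str, ts: int) -> None:
        # File the impression under the bucket containing ts.
        start = ts - ts % self.bucket_seconds
        self.seen[(ad_id, start)].add(user_id)

    def unique_users(self, ad_id: str, end_ts: int, window_buckets: int) -> int:
        """Distinct users who saw ad_id in the window_buckets buckets ending
        with the bucket that contains end_ts."""
        last = end_ts - end_ts % self.bucket_seconds
        users = set()
        for i in range(window_buckets):
            # .get avoids creating empty entries in the defaultdict.
            users |= self.seen.get((ad_id, last - i * self.bucket_seconds), set())
        return len(users)


# Usage: moving or widening the window changes the distinct count.
views = UniqueViewers()
views.record("ad1", "alice", 0)
views.record("ad1", "bob", 100)
views.record("ad1", "alice", 3700)
views.record("ad1", "carol", 7300)
print(views.unique_users("ad1", 7300, 2))  # prints 2 (alice, carol)
print(views.unique_users("ad1", 7300, 3))  # prints 3 (alice, bob, carol)
```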

That’s not to say there aren’t challenges in integrating a new product into an existing system, especially when it’s replacing parts of a system that’s been around for more than two decades.  HBase itself is still new, still in its pre-1.0 phase.  Some of Facebook’s other areas of growth, like the Spotify integration, could be very write-intensive, and that isn’t necessarily a solvable problem for HBase just yet.  Bottlenecks are still something Facebook has to deal with when it comes to generated data, and given its network’s demanding data requirements (zero data loss, low latency, read-modify-write operations), there’s only so much HBase can manage in certain cases where MySQL already has a ready solution.

Yet Facebook sees a promising future with HBase, facing the challenge of big data and software engineering head on.  “MySQL isn’t going anywhere soon,” Gray says, “but HBase is a great addition to the tool belt.”
