UPDATED 15:31 EDT / JUNE 09 2014

Rethinking infrastructure: Facebook adds availability with HBase fork

Having reached a tipping point in the volume of unstructured information years before the new reality of data management dawned upon the rest of the industry, the world’s largest Internet companies had no choice but to come up with their own answers to the challenges at hand. Their pioneering efforts laid the groundwork for the Hadoop ecosystem, which constitutes the foundation of modern analytics.

Facebook is carrying on the tradition with HydraBase, a newly unveiled internally-developed version of the free Apache HBase database hardened to address the availability expectations of more than one billion users. The project builds on the company’s existing experience with open-source data processing technologies.

The social networking giant is credited as the creator of not one but two widely used projects in the Hadoop ecosystem: the Hive data warehouse and Cassandra, a distributed platform for storing large amounts of information across commodity servers that was originally developed to power its Inbox Search feature.

Facebook was also an early adopter of HBase, which shares characteristics with both solutions. Known for its consistency, the columnar database runs directly atop the Hadoop Distributed File System (HDFS), like Hive, while enabling seamless horizontal scalability that makes it particularly well suited for storing large datasets containing billions of rows.

Behind the scenes, HBase splits database tables into logical chunks managed by so-called “region servers” that are responsible for making information available to client processes. Each region can only be served by one instance at any given time, which avoids data conflicts within rows but has the downside of creating a single point of failure.

In practice, that means that when a region server fails, the information stored on that machine has to be migrated to another in a process that can take several minutes to complete and carries the risk of data loss. For Facebook, which relies on HBase to power a myriad of internal and external services, that has proven increasingly unacceptable. HydraBase is the company’s response.

Rethinking server management

The project addresses both issues in one fell swoop by assigning multiple servers to each region so that if the main one fails, the others can quickly take over. Instances can be spread across various parts of a data center or across different facilities for even greater fault tolerance.
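To make the difference concrete, here is a minimal toy sketch of the two assignment models. It is not HydraBase or HBase code: the Region class, the server names and the "first healthy server wins" rule are all assumptions made up for illustration, and the sketch ignores data replication and how the servers would agree on who takes over.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Region:
    """Toy model of a region and the servers allowed to serve it (hypothetical)."""
    name: str
    servers: List[str]                      # candidates, in failover order
    failed: Set[str] = field(default_factory=set)

    def mark_failed(self, server: str) -> None:
        self.failed.add(server)

    def active_server(self) -> str:
        """Return the first healthy server, or raise if none is left."""
        for server in self.servers:
            if server not in self.failed:
                return server
        raise RuntimeError(f"region {self.name!r} is unavailable")


# Model described for stock HBase: one server per region.
single = Region("region-1", ["rs-a"])
# HydraBase-style model: several servers assigned to the same region.
replicated = Region("region-1", ["rs-a", "rs-b", "rs-c"])

for region in (single, replicated):
    region.mark_failed("rs-a")              # simulate the main server failing
    try:
        print(region.active_server())
    except RuntimeError as err:
        print(err)
```

With a single server, losing rs-a leaves the region unavailable until it is reassigned; with three servers, the same failure simply hands the region to rs-b.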

Facebook claims that the system can add another 9 to the reliability of HBase for a total of five, or 99.999 percent, which would translate into a little over five minutes of downtime per year. The social networking giant hopes to further reduce that in the future with planned features like the ability to make data accessible via every member of a region server group and the option to keep in-memory copies in each to speed failover.
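The downtime figure follows directly from the availability percentage. As a quick check, the short calculation below converts a number of nines into a yearly downtime budget; the function name and the use of a 365.25-day year are choices made for this example, not anything taken from Facebook.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime budget implied by an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (4, 5):
    availability = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({availability:.3f}%): "
          f"{downtime_minutes_per_year(n):.2f} min/year")
```

Going from four nines to five shrinks the budget from roughly 52.6 minutes to roughly 5.26 minutes per year, a tenfold reduction.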

HydraBase is set to begin rolling out to Facebook’s production clusters as soon as it exits the initial development stage, although no timeframe was provided. Judging by the company’s track record, there’s a good chance that it will decide to open-source the project later on, but that has not yet been confirmed either.

photo credit: opensourceway via photopin cc
