UPDATED 11:54 EDT / MARCH 08 2012


HA Name Node Project for Hadoop is No Laughing Matter

Hadoop’s critics are quick to point out the open source framework’s lack of enterprise-readiness. At the top of the list of complaints is Hadoop’s single-point-of-failure issue.

In a nutshell, in any given Hadoop cluster, a single Name Node is responsible for tracking which slave nodes are available, where in the cluster certain data resides, and which nodes have failed. Problem is, if the Name Node fails, the whole cluster goes down and it can take hours to get back up and running again.

That’s not a huge issue in small proof-of-concept (PoC) environments doing “experimental” analytics, but it’s a major problem if you’re relying on Hadoop to support user-facing applications. Enterprises can’t afford to have applications unavailable for even a few minutes, let alone a few hours.

Now, most Hadoop clusters also include a secondary Name Node, but despite the name, it isn’t really a hot backup: it only periodically copies metadata from the primary Name Node, so it cannot step in with current state when the primary fails. In essence, the primary Name Node is Hadoop’s Achilles heel and one of the top obstacles preventing widespread enterprise adoption.
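To see why a periodic copy falls short of a hot backup, consider the rough Python sketch below. It is only an illustration: the function name, the 60-minute checkpoint interval, and the edit times are assumptions made up for this example, not Hadoop code or Hadoop defaults. It lists the metadata edits made after the last periodic checkpoint, which are the ones a periodically updated copy would not yet contain when the primary fails.

```python
def stale_edits(edit_times, checkpoint_interval, failure_time):
    """Return the edits a periodic checkpoint copy has not yet captured.

    All arguments are in minutes since the cluster started (hypothetical
    units for this illustration). Checkpoints are assumed to happen at
    exact multiples of checkpoint_interval.
    """
    # Time of the most recent checkpoint at or before the failure.
    last_checkpoint = (failure_time // checkpoint_interval) * checkpoint_interval
    # Edits strictly after that checkpoint exist only on the failed primary.
    return [t for t in edit_times if last_checkpoint < t <= failure_time]


# Hypothetical timeline: edits at these minutes, failure at minute 115.
missing = stale_edits([5, 30, 61, 75, 110], checkpoint_interval=60, failure_time=115)
print(missing)  # [61, 75, 110]
```

In this made-up timeline, three of the five edits happened after the last checkpoint, so the copy is nearly an hour behind at the moment of failure.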

The Hadoop community has been working on this issue for years, but it looks like some real progress has finally been made. The HA Name Node project was launched in August 2011, with the goal of delivering two fully functioning Name Nodes – an active Name Node and a passive Name Node – to provide hot backup capabilities for Hadoop.

HA Name Node Architecture (Image courtesy of Aaron Myers, Cloudera)

Aaron Myers, an engineer at Cloudera and one of the primary committers to the HA Name Node project, explained the approach in a recent blog post:

The goal of the HA Name Node project is to add support for deploying two Name Nodes in an active/passive configuration. This is a common configuration for highly-available distributed systems, and HDFS’s architecture lends itself well to this design. Even in a non-HA configuration, HDFS already requires both a Name Node and another node with similar hardware specs which performs checkpointing operations for the Name Node. The design of the HA Name Node is such that the passive Name Node is capable of performing this checkpointing role, thus requiring no additional Hadoop server machines beyond what HDFS already requires … The goal of the HA Name Node is to provide a hot standby Name Node that can take over serving the role of the active Name Node with no downtime. To provide this capability, it is critical that the standby Name Node has the most complete and up-to-date file system state possible in memory.

Myers writes that significant progress has been made, and his post goes into the technical details of making HA Name Node a practical reality. The important point is that the HA Name Node is now available as part of Cloudera’s CDH4 beta distribution, released last month. Writes Myers, “I’ve personally run hundreds of MR jobs over a running HA cluster, doing failovers back and forth between two HA Name Nodes, without any job failures.”

But the hard work is not finished. As Myers points out, HA Name Node, as currently constituted, has a number of limitations. Most notably, it does not support automatic failover, meaning an administrator must manually initiate the failover to the passive Name Node should the active Name Node fail.

HA Name Node also relies on an HA filer to hold the HDFS edit logs, and deploying HA requires an administrator to “manually synchronize the on-disk metadata of the two Name Nodes,” writes Myers.
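The active/passive arrangement described above can be pictured with a small toy model in Python. This is a sketch under stated assumptions, not the project's implementation: the classes `SharedEditLog` and `NameNode`, the function `manual_failover`, and the file paths are all hypothetical names invented for illustration. The model shows a standby replaying a shared edit log so that its in-memory state matches the active node, and a failover that happens only when an operator explicitly triggers it.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEditLog:
    """Hypothetical stand-in for the shared storage holding the edit log."""
    entries: list = field(default_factory=list)

    def append(self, op, path):
        self.entries.append((op, path))


@dataclass
class NameNode:
    """Toy Name Node that keeps the file system namespace in memory."""
    name: str
    log: SharedEditLog
    active: bool = False
    namespace: set = field(default_factory=set)
    applied: int = 0  # number of log entries already replayed

    def _apply(self, op, path):
        if op == "create":
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)

    def write(self, op, path):
        # Only the active node accepts changes and records them in the log.
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and rejects writes")
        self.log.append(op, path)
        self._apply(op, path)
        self.applied = len(self.log.entries)

    def catch_up(self):
        # The standby replays any log entries it has not seen yet.
        for op, path in self.log.entries[self.applied:]:
            self._apply(op, path)
        self.applied = len(self.log.entries)


def manual_failover(old_active, standby):
    """Operator-initiated switch; nothing in this model triggers it automatically."""
    old_active.active = False
    standby.catch_up()  # make sure the standby has every logged edit
    standby.active = True


log = SharedEditLog()
nn1 = NameNode("nn1", log, active=True)
nn2 = NameNode("nn2", log)

nn1.write("create", "/data/a")
nn1.write("create", "/data/b")
nn1.write("delete", "/data/a")

manual_failover(nn1, nn2)
nn2.write("create", "/data/c")
print(sorted(nn2.namespace))  # ['/data/b', '/data/c']
```

The model deliberately leaves out everything that makes the real problem hard, such as detecting a failed node, preventing two nodes from acting as active at once, and keeping the shared storage itself available.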

ServicesAngle

The progress Myers and his fellow committers have made on the HA Name Node project is impressive and an important step towards making Hadoop safe for the enterprise. From a services perspective, manual intervention is still required to take advantage of the hot failover capabilities, but that’s a significant improvement over the time and effort currently required to get a Hadoop cluster back up and humming in the event of a Name Node failure.

