UPDATED 09:43 EST / OCTOBER 30 2013

WANdisco fights data failures like no one else | #BigDataNYC

In their ongoing coverage of SiliconANGLE’s own Big Data NYC conference, Dave Vellante and Jeff Kelly speak with Brett Rudenstein, Senior Product Manager of Big Data for WANdisco. The three discuss WANdisco’s unique patented technology that allows Hadoop to run across a wide area network despite severe network failures.

Rudenstein demonstrates an active-active environment spanning two data centers about 3,000 miles apart. He shows how one can drill down on a WAN link to get an exposed view of each data center. Rudenstein explains, “…across a wide area network we stand up a single HDFS.” He then runs MapReduce jobs to illustrate data replicating between the two sites.

In his typical direct fashion, Kelly asks, “What does this allow an enterprise to do that it couldn’t do before?” Rudenstein explains that the new advantage is in disaster recovery: failover time is virtually zero because the other site is already active. In his demonstration, both sites are active, allowing users to ingest data and run jobs at both. He contrasts this with the problems typically encountered with DistCp and parallel data ingest methods, noting that a file is typically skipped rather than copied when it has the same size and name as its predecessor. Rudenstein further explains, “After a couple of months you see clusters diverge. And, then it’s a constant manual effort to figure out the differences between the data centers.”

Vellante asks, “How does a sys-admin get behind this?” Rudenstein says that, without this technology, the administrator’s task is to run checksums against each cluster individually to make sure the files line up; when they don’t, the administrator has to ask the users which files are correct.
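The manual reconciliation Rudenstein describes can be illustrated with a minimal sketch. It assumes each cluster is represented by a plain Python dictionary mapping file paths to contents; the function names and sample paths are hypothetical and are not part of DistCp, HDFS, or WANdisco's product. It shows how a skip rule based only on name and size can miss a changed file that a checksum comparison would catch.

```python
import hashlib


def needs_copy_by_size(src, dst, name):
    """Naive skip rule: copy only if the file is missing or its size differs."""
    return name not in dst or len(dst[name]) != len(src[name])


def checksum(data):
    """Content checksum used to compare files across clusters."""
    return hashlib.sha256(data).hexdigest()


def divergent_files(cluster_a, cluster_b):
    """Return sorted paths that are missing on one side or differ in content."""
    names = set(cluster_a) | set(cluster_b)
    return sorted(
        n for n in names
        if n not in cluster_a
        or n not in cluster_b
        or checksum(cluster_a[n]) != checksum(cluster_b[n])
    )


# Hypothetical sample data: /logs/day2 has the same name and size on both
# sites, but different content.
site_a = {"/logs/day1": b"alpha", "/logs/day2": b"bravo"}
site_b = {"/logs/day1": b"alpha", "/logs/day2": b"delta"}

skipped = not needs_copy_by_size(site_a, site_b, "/logs/day2")
diverged = divergent_files(site_a, site_b)
print(skipped)   # the size-based rule would skip the file
print(diverged)  # the checksum comparison flags it
```

Even with the divergent file identified, the sketch cannot say which copy is correct, which is why the administrator still has to ask the users.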

Continuing with the demonstration, Rudenstein simulates the failure of a local NameNode by rebooting it. Despite the failure, the job continues uninterrupted. As the NameNode comes back online, the software begins its “self-healing” process. Rudenstein says, “It will learn from the other namenodes that it’s behind in the global sequence…it will get caught up and then become an active participant in the cluster.” He throws one more failure at it, causing a complete WAN separation. At the end of the demonstration, Rudenstein shows how the technology resumes replicating the data despite the failures.
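The idea of a node discovering that it is "behind in the global sequence" and catching up can be sketched generically. The following is an illustration only, assuming a simple ordered log of agreed operations shared by all nodes; the `Node` class and its methods are hypothetical and do not represent WANdisco's patented implementation.

```python
class Node:
    """Toy replica that applies operations in a single global order."""

    def __init__(self, name):
        self.name = name
        self.log = []    # agreed operations; list index is the sequence number
        self.state = {}  # path -> value, standing in for namespace metadata

    def apply(self, op):
        """Apply one (path, value) operation and record it in the log."""
        path, value = op
        self.state[path] = value
        self.log.append(op)

    @property
    def last_seq(self):
        """Sequence number of the most recently applied operation."""
        return len(self.log) - 1

    def catch_up(self, peer):
        """Replay every operation the peer applied that this node missed."""
        missed = peer.log[len(self.log):]
        for op in missed:
            self.apply(op)
        return len(missed)


site_a = Node("site-a")
site_b = Node("site-b")

# Both nodes are active and apply the same operations in the same order.
for op in [("/f1", "v1"), ("/f2", "v1")]:
    site_a.apply(op)
    site_b.apply(op)

# site_b goes down; site_a keeps accepting operations.
site_a.apply(("/f1", "v2"))
site_a.apply(("/f3", "v1"))

# On restart, site_b sees it is behind and replays what it missed.
replayed = site_b.catch_up(site_a)
print(replayed, site_b.state == site_a.state)
```

Because every node applies the same operations in the same order, a recovering node only needs to learn how far behind it is and replay the gap before rejoining as an active participant.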

Vellante notes that the demonstration was not trivial and asks whether any other technology can provide resiliency like this. Rudenstein responds that most technologies are limited because latency is a big challenge. WANdisco’s technology, he says, is time-independent and WAN-independent, does not rely on hardware, and is a shared-nothing architecture.

