UPDATED 09:43 EDT / OCTOBER 30 2013

WANdisco fights data failures like no one else | #BigDataNYC

by Kathryn Buford

In their ongoing coverage of SiliconANGLE’s own Big Data NYC conference, Dave Vellante and Jeff Kelly speak with Brett Rudenstein, Senior Product Manager of Big Data for WANdisco. The three discuss WANdisco’s patented technology, which allows Hadoop to run across a wide area network (WAN) even through severe network failures.

Rudenstein demonstrates an active-active environment spanning two data centers about 3,000 miles apart. He shows how to drill down on the WAN link for a detailed view into each data center. Rudenstein explains, “…across a wide area network we stand up a single HDFS.” The demonstration runs MapReduce jobs to illustrate data replication between the two sites.

In his typical direct fashion, Kelly asks, “What does this allow an enterprise to do that it couldn’t do before?” Rudenstein explains that the chief advantage is in disaster recovery: failover time is virtually zero because the other site is already active. In his demonstration, both sites are active, so users can ingest data into, and run jobs on, either site. He contrasts this with the problems typically encountered with DistCp and parallel data ingest, noting that a common problem is that a file is not copied if a file with the same name and size already exists at the destination. Rudenstein further explains, “After a couple of months you see clusters diverge. And, then it’s a constant manual effort to figure out the differences between the data centers.”
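To make the divergence problem concrete, here is a minimal Python sketch of a copy step that decides whether to skip a file by comparing only its name and size, the behavior Rudenstein describes. The `FileEntry` type and the `needs_copy` and `sync` functions are hypothetical illustrations, not DistCp’s actual code; the point is that an edit which leaves a file’s length unchanged goes undetected.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    """Hypothetical stand-in for one file's name and contents on a cluster."""
    name: str
    data: bytes

    @property
    def size(self) -> int:
        return len(self.data)


def needs_copy(src: FileEntry, dst: Optional[FileEntry]) -> bool:
    """Copy only if the destination is missing or differs by name or size.

    File contents are never compared, which is the weakness described above.
    """
    if dst is None:
        return True
    return src.name != dst.name or src.size != dst.size


def sync(src_cluster: dict[str, FileEntry], dst_cluster: dict[str, FileEntry]) -> list[str]:
    """Copy files from src to dst using the name-and-size heuristic; return copied names."""
    copied = []
    for name, entry in src_cluster.items():
        if needs_copy(entry, dst_cluster.get(name)):
            dst_cluster[name] = FileEntry(entry.name, entry.data)
            copied.append(name)
    return copied


if __name__ == "__main__":
    east = {"sales.csv": FileEntry("sales.csv", b"region,total\nNY,100\n")}
    west: dict[str, FileEntry] = {}
    print(sync(east, west))  # ['sales.csv']

    # The file is edited in place; the new contents happen to be the same length.
    east["sales.csv"] = FileEntry("sales.csv", b"region,total\nNY,900\n")
    print(sync(east, west))  # []: the change is skipped
    print(east["sales.csv"].data == west["sales.csv"].data)  # False: the clusters diverged
```

Each skipped edit is silent, which is how small differences can accumulate over the months Rudenstein mentions.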

Vellante asks, “How does a sys-admin get behind this?” Rudenstein says the administrator’s task would be to run checksums against each cluster individually to make sure the files line up; when they don’t, the administrator has to ask users which files are correct.
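The manual audit Rudenstein describes amounts to checksumming every file on each cluster and comparing the results. Below is a minimal Python sketch, assuming each cluster’s contents are available as a hypothetical `path -> bytes` mapping; on real clusters the checksums would be computed on each cluster rather than by pulling the data to one machine, and SHA-256 is an arbitrary choice for illustration.

```python
from __future__ import annotations

import hashlib


def checksums(cluster: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in cluster.items()}


def diff_clusters(a: dict[str, bytes], b: dict[str, bytes]) -> dict[str, list[str]]:
    """Report paths present on only one side and paths whose contents differ."""
    sums_a, sums_b = checksums(a), checksums(b)
    return {
        "only_in_a": sorted(sums_a.keys() - sums_b.keys()),
        "only_in_b": sorted(sums_b.keys() - sums_a.keys()),
        "mismatched": sorted(
            p for p in sums_a.keys() & sums_b.keys() if sums_a[p] != sums_b[p]
        ),
    }


if __name__ == "__main__":
    east = {"/data/sales.csv": b"NY,900\n", "/data/users.csv": b"alice\n"}
    west = {"/data/sales.csv": b"NY,100\n", "/logs/app.log": b"ok\n"}
    print(diff_clusters(east, west))
    # {'only_in_a': ['/data/users.csv'], 'only_in_b': ['/logs/app.log'], 'mismatched': ['/data/sales.csv']}
```

The sketch only locates the differences. As Rudenstein points out, deciding which copy of a mismatched file is correct still requires asking the users.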

Continuing with the demonstration, Rudenstein shows what happens when a local NameNode fails by rebooting it. Despite the failure, the job continues uninterrupted. As the NameNode comes back online, it begins a “self-healing” process. Rudenstein says, “It will learn from the other namenodes that it’s behind in the global sequence…it will get caught up and then become an active participant in the cluster.” He then throws one more failure at the system, causing a complete WAN separation. At the end of the demonstration, Rudenstein shows how the technology resumes replicating the data despite the failures.
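Rudenstein’s description of a recovering NameNode, which learns that it is behind in a global sequence and replays what it missed, matches the general pattern of a replicated operation log. The Python sketch below illustrates only that general pattern under simplifying assumptions (a single most-advanced peer, no concurrent proposals); WANdisco’s actual protocol is not described in the demonstration, and the `Replica` class and `broadcast` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that applies namespace operations in global sequence order."""
    name: str
    log: list[str] = field(default_factory=list)  # log[i] holds the operation with sequence number i + 1
    online: bool = True

    @property
    def last_seq(self) -> int:
        return len(self.log)

    def apply(self, seq: int, op: str) -> None:
        # Operations must be applied strictly in order; a gap means this replica is behind.
        if seq != self.last_seq + 1:
            raise ValueError(f"{self.name} expected seq {self.last_seq + 1}, got {seq}")
        self.log.append(op)

    def catch_up(self, peers: list[Replica]) -> int:
        """Replay missing operations from the most advanced peer; return how many were replayed."""
        leader = max(peers, key=lambda p: p.last_seq)
        missing = leader.log[self.last_seq:]
        for seq, op in enumerate(missing, start=self.last_seq + 1):
            self.apply(seq, op)
        return len(missing)


def broadcast(replicas: list[Replica], op: str) -> None:
    """Assign the next global sequence number and deliver the operation to online replicas."""
    seq = max(r.last_seq for r in replicas) + 1
    for r in replicas:
        if r.online:
            r.apply(seq, op)


if __name__ == "__main__":
    east, west = Replica("east"), Replica("west")
    broadcast([east, west], "mkdir /data")
    west.online = False                       # simulate the NameNode reboot
    broadcast([east, west], "create /data/a")
    broadcast([east, west], "create /data/b")
    west.online = True
    print(west.catch_up([east]))              # 2
    print(west.log == east.log)               # True
```

Because every operation carries a position in one global order, a returning replica only needs its last applied sequence number to know exactly what it missed.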

Vellante notes that the demonstration was not trivial and asks whether any other technology can provide this kind of resiliency. Rudenstein responds that most technologies are limited because latency is a major challenge. WANdisco’s technology, he says, is time-independent and WAN-independent and does not depend on special hardware; it is also a shared-nothing architecture.
