WANdisco’s new Fusion tool syncs Hadoop clusters
WANdisco Plc. has just announced the release of its new WANdisco Fusion tool, designed to distribute large datasets across multiple Hadoop clusters while keeping them in sync and up to date.
WANdisco Fusion uses active replication technologies to deliver up-to-date data from one Hadoop cluster to another, regardless of where those clusters are physically located. According to Randy DeFauw, WANdisco’s director of product marketing, the new technology should enable enterprises to roll out Hadoop production servers globally.
“The fundamental ability to use the same data from everywhere, as if everyone was running in the same cluster in the same place, this solves a lot of the key challenges the enterprise Hadoop architects were worrying about,” DeFauw told SDTimes.
If this sounds a little similar to WANdisco’s earlier “NonStop Hadoop” product, well, that’s the intention. NonStop Hadoop was built to provide extremely fast and reliable data replication for enterprise customers such as banks, which require high availability and the best disaster recovery capabilities. The software is extremely powerful, but it is also fairly invasive and has some limitations, DeFauw admitted in an interview with Datanami. For one thing, NonStop Hadoop was installed on the NameNode, which made it quite tricky to get up and running.
“Any tweaks made to the underlying Hadoop cluster or NameNode configuration could throw replication, which necessitated a deep level of certification work between WANdisco and the Hadoop distributors,” notes Datanami. “Because of this work, WANdisco focused its certification work with the major open source players who used HDFS: Cloudera and Hortonworks.”
Rather than being installed on the NameNode, WANdisco Fusion runs on a server adjacent to the Hadoop cluster it’s working on, which makes it far less invasive. This effectively makes WANdisco Fusion an evolution of NonStop Hadoop. “It’s still active-active replication, but we’re sitting at a much higher level in the Hadoop stack,” DeFauw told Datanami. “Instead of working deeply at the NameNode level, it actually works as a proxy application to the Hadoop file system.”
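To make the proxy idea concrete, here is a minimal Python sketch of a write path that sits in front of several file systems rather than inside any one of them. It is an illustration only, not WANdisco’s implementation: the names `InMemoryFS` and `ReplicatingProxy` are hypothetical, plain dictionaries stand in for HDFS clusters, and the coordination needed to order concurrent writes across sites is assumed away by running everything in one process.

```python
class InMemoryFS:
    """Hypothetical stand-in for one cluster's file system."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class ReplicatingProxy:
    """Hypothetical proxy: clients write through it, not to a single cluster."""

    def __init__(self, clusters):
        self.clusters = list(clusters)

    def write(self, path, data):
        # Apply the same write to every cluster, in the same order.
        for cluster in self.clusters:
            cluster.write(path, data)

    def read(self, path, cluster_index=0):
        # Any cluster can serve the read because all of them hold the data.
        return self.clusters[cluster_index].read(path)


on_prem = InMemoryFS("on-prem")
cloud = InMemoryFS("cloud")
proxy = ReplicatingProxy([on_prem, cloud])
proxy.write("/data/events.log", b"first record")
```

Because the proxy only needs each back end to accept file-system writes, nothing in this sketch depends on how a given cluster’s NameNode is configured, which is the property DeFauw highlights.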
WANdisco Fusion provides other benefits too: it can transfer data to AWS so that enterprises can draw on additional cloud processing power when it’s required. In addition, WANdisco Fusion can sync different Hadoop distributions.
“The new architecture also means it has the ability to replicate between different types of Hadoop distributions,” DeFauw told SDTimes. “You can not only replicate between two Hortonworks clusters, you can replicate between Hortonworks and Cloudera and EMC’s Isilon storage systems.”
Finally, WANdisco Fusion can also sync HBase servers, though this requires more technical knowledge than simple HDFS syncing, DeFauw noted.
“[With HBase] The coordination happens for the writes, and each region server maintains its own write log,” DeFauw told SDTimes. “When it comes time to flush the memstore onto disk and write an HFile, every region server can have its own HFile. It writes to its local server, but which region server should write to HDFS? We have a coordinated flush, where we choose a specific server that will write the file on the underlying file system.”
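The coordinated flush DeFauw describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not WANdisco’s code: `RegionServer` and `coordinated_flush` are hypothetical names, a dictionary stands in for the underlying file system, and the rule for choosing the writer (lowest server name) is a placeholder, since the article does not say how the server is selected.

```python
class RegionServer:
    """Hypothetical region server with its own write log and memstore."""

    def __init__(self, name):
        self.name = name
        self.write_log = []
        self.memstore = {}

    def apply_write(self, key, value):
        # Every server records the coordinated write in its own log.
        self.write_log.append((key, value))
        self.memstore[key] = value


def coordinated_flush(servers, file_system, hfile_path):
    # Placeholder selection rule (an assumption): the lowest name wins,
    # so every participant arrives at the same choice.
    writer = min(servers, key=lambda server: server.name)
    # Only the chosen server writes the file to the shared file system.
    file_system[hfile_path] = sorted(writer.memstore.items())
    # All servers drop their in-memory copy once the file is written.
    for server in servers:
        server.memstore.clear()
    return writer.name


servers = [RegionServer("rs-b"), RegionServer("rs-a")]
for key, value in [("row1", "x"), ("row2", "y")]:
    for server in servers:
        server.apply_write(key, value)

hdfs = {}  # stand-in for the underlying file system
chosen = coordinated_flush(servers, hdfs, "/hbase/table/region/hfile-0001")
```

The point of the sketch is that a single file lands on the underlying file system even though every region server held the same data in memory.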