This week Yahoo and Hortonworks, a spin-off from the internet portal operator, hosted the fifth annual Hadoop Summit. The press floodgates opened even before the vendors had a chance to settle into their booths.
The day before Hadoop Summit kicked off, Hortonworks announced its highly anticipated first product: the Hortonworks Data Platform. It does a few things very differently from the distributions offered by the likes of Cloudera and MapR.
The Data Platform was built for the channel rather than the end user, and it’s based on the first, more trusted release of Hadoop. The distro leverages several other lower-profile Apache projects as well, and there’s an entirely separate edition that was developed for VMware environments. The pitch is that, in the event of a drive failure, a cluster running the latter edition can recover and get back to business automatically. This addresses Hadoop’s single point of failure, something that Facebook has been looking into as well.
While we’re on the subject, the social media kingpin open-sourced AvatarNode this week. The company runs one of the largest big data deployments in the world, and its engineering team came up with a pretty nifty way to avoid potential issues: a double NameNode setup that enables manual failover, nicknamed after a very popular film.
Hortonworks may have made its big launch before the event, but that doesn’t mean there weren’t other updates revealed during the actual conference. VMware, for one, introduced a new configuration and management tool it calls Serengeti.
The project glues together code from Infochimps, Cloudera, MapR and others with the goal of making it easier to deploy Hadoop in the cloud. At the same time, the virtualization giant also updated Spring with extended library support, together with security and other features.