UPDATED 12:13 EDT / NOVEMBER 03 2011


3+ Alternatives to Apache Hadoop

Next week the SiliconAngle team is heading to the HadoopWorld event in New York City. We’ll be broadcasting theCube live and covering all the latest developments in the Apache Hadoop ecosystem. But it’s important to remember that Hadoop isn’t the only game in town. As we ramp up our coverage of Hadoop in advance of the event, here are some other big data projects to keep in mind.

Update: I just wrote about another alternative: Spark.

HPCC Systems

The most obvious and direct competitor to Hadoop is HPCC Systems, an open source spin-off from LexisNexis Risk Solutions. Like Hadoop, HPCC is a system for building clusters of commodity servers to analyze large data sets. HPCC has been used internally at LexisNexis for many years, and scientific researchers outside the company have also used it for data analysis. It’s a mature and robust system with its own stack of tools, including a high-level programming language called ECL and data warehousing tools.

What it doesn’t have yet is a developer ecosystem on par with Hadoop’s. Hadoop talent is hard enough to find; people with experience in HPCC, a technology that until recently was tightly controlled, will be even harder to find. On the other hand, the HPCC stack may provide a more turnkey solution, and ECL should be easier to get started with. HPCC also has a great open source licensing model. Considering all these factors, HPCC could gain steam.

Twitter/BackType Storm

Storm, developed at BackType before it was acquired by Twitter, is billed as the “Hadoop of realtime processing.” Storm is engineered to analyze near-real-time streaming data sources, such as the Twitter firehose. Historically, Hadoop has been best suited to analyzing large, static data sets rather than rapidly updated streams of data. Hadoop runs jobs with a defined end point; Storm runs jobs that are continuous because new data is constantly being added.
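
To make the batch-versus-stream distinction concrete, here is a minimal Python sketch. It does not use Storm’s or Hadoop’s actual APIs: the function names are hypothetical, and a plain Python iterable stands in for a real data feed. The batch function reads a finite data set and returns one final answer, while the streaming function keeps running state and emits an updated result every time a new record arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(records: Iterable[str]) -> Counter:
    """Batch model (hypothetical helper): process a fixed, finite data set,
    then return a single final result. The job has a defined end point."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts  # the job ends here


def streaming_word_count(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming model (hypothetical helper): keep running state and emit an
    updated result after every new event. If `events` never ends (e.g. a live
    feed), neither does this loop."""
    counts = Counter()
    for line in events:
        counts.update(line.split())
        yield counts.copy()  # a fresh snapshot after each event


# Illustrative input standing in for a stream of messages.
tweets = ["hadoop storm", "storm hpcc", "storm"]

print(batch_word_count(tweets)["storm"])  # prints 3, once, at the end

for snapshot in streaming_word_count(tweets):
    print(snapshot["storm"])  # prints 1, then 2, then 3 as events arrive
```

With a truly unbounded source, the streaming loop would never terminate and would keep producing fresh results, which is the kind of continuous workload Storm is built for.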

Storm has a compelling value proposition, but as Forrester analyst James G. Kobielus told SiliconAngle/ServicesAngle editor Alex Williams, tools being built on top of Hadoop, such as HStreaming, are making streaming data processing more feasible. Storm will face competition there, as well as from complex event processing (CEP) companies like Progress Software.

Microsoft Azure Table, Project Daytona and LINQ

Microsoft actually has three potential Hadoop alternatives: Azure Table, LINQ to HPC and Azure Project Daytona (a Microsoft Research project).

Azure Table Storage is a BigTable/HBase-like service offered in Microsoft’s cloud. It’s focused on providing an alternative to Hadoop’s data store component, not the full analytics system.

LINQ to HPC (formerly called Dryad) is a beta feature for building clusters that run Windows HPC Server. It enables users to build clusters of commodity servers and use Microsoft’s LINQ query language to perform analytics on large unstructured data sets, much like MapReduce/Hadoop. It has been used at Microsoft to power data mining for years.

While LINQ to HPC uses Microsoft’s own LINQ language to handle analysis, Project Daytona, like Hadoop, is based on MapReduce. Notably, it can be run as a service, with many ready-made algorithms, and it can even be accessed through Excel, giving less technical users a way into the system. It’s still a Microsoft Research project, and the company is currently targeting scientists who want an easy-to-use analytics system.

On the one hand, Microsoft is pushing out these competing products and services; on the other, it has also rolled out a Hadoop connector for Microsoft SQL Server and is working with Hortonworks on its own Hadoop distribution.

Services Angle

There’s quite a bit of infighting in the Hadoop community among the various companies competing to commercialize Hadoop. That creates an opening for the competition to step in.

HPCC Systems is currently working to bring its platform to Amazon Web Services, where it will run alongside Amazon Elastic MapReduce, the Hadoop service from AWS. Microsoft is already pushing some of its own hosted Hadoop alternatives. Watch for these sorts of services to be the next battleground for big data analytics.
