UPDATED 11:11 EDT / NOVEMBER 04 2011

5 Big Data Tools Built On Hadoop

Yesterday I looked at several of the alternatives to Apache Hadoop that are coming from companies like HPCC Systems, Twitter and Microsoft. These projects differentiate themselves from Hadoop by providing a more robust set of integrated tools and/or more accessible ways of performing analysis. But Hadoop has a large ecosystem, with many projects being built upon Hadoop. These projects plug many of the same holes that Hadoop alternatives try to fill.

Apache Mahout

Apache Mahout is a Java library of machine learning and data mining algorithms, many of which (but not all) are designed to run on Hadoop. The algorithms are designed to be highly scalable – a requirement for doing data mining on big data sets distributed across Hadoop clusters. The algorithms are categorized into four main use cases: recommendation mining, clustering, classification and frequent itemset mining.

GoldenOrb

GoldenOrb is an open source graph database built on Hadoop and based on Google’s Pregel paper. It’s a fitting extension to Hadoop, since Hadoop is based on Google’s MapReduce, BigTable and Google Filesystem papers. The project is sponsored by Ravel.

A graph database is designed to explore the network of relationships between items in a database, such as the relationships between people in a social network. GoldenOrb is in early development now, but could eventually be used for social graph analysis, data mining, fraud detection and more.

Datameer Analytics Solution

Datameer Analytics Solution is a business intelligence and data visualization application built on Apache Hadoop. It’s one of several products that attempt to make Hadoop more accessible to non-developers (see also Karmasphere). Datameer provides wizards for setting up data integrations and a spreadsheet-style interface for working with data and creating visualizations. It supports multiple Hadoop distributions, including those from Cloudera and MapR.

WibiData

I wrote about WibiData from Odiago yesterday. It’s a data management and analytics product from a new startup launched by a co-founder of Cloudera.

HStreaming

One of Hadoop’s noted weaknesses is its lack of support for real-time analytics. Hadoop is engineered to run finite batch jobs, not never-ending jobs on ever-changing data. HStreaming is one of a few projects that address this. HStreaming offers an on-premise Enterprise Edition and a Cloud Edition that runs on Amazon Web Services.

Services Angle

Doing big data analysis with Hadoop doesn’t end with the . The ecosystem of tools that build upon or extend Hadoop (such as Hive) and make it more accessible is Hadoop’s greatest strength, and something projects like HPCC Systems and Spark can’t yet match. Database, enterprise data warehouse and business intelligence companies are all tripping over themselves to integrate with Hadoop, with even Microsoft and Oracle jumping in.

Next week the SiliconAngle team will be at the HadoopWorld event in New York City. It’s completely sold out, but we’ll be covering the action live on our online show theCube.

