UPDATED 11:39 EST / NOVEMBER 16 2010

A Tutorial for Hadoop and Map Reduce in Java

Hadoop has become an extremely big name here at SiliconANGLE, being one of the premier open source cloud-storage and -computing projects. If you’re a Java developer and you haven’t had a chance to take it for a test drive, there’s a very easy tutorial by Carlo Scarioni covering Hadoop basics.

Hadoop is an open source project for processing large datasets in parallel on low-end commodity machines.
Hadoop is built on two main parts: a special file system called the Hadoop Distributed File System (HDFS) and the Map Reduce framework.
HDFS is a file system optimized for distributed processing of very large datasets on commodity hardware.
The Map Reduce framework processes the data in two main phases: the Map phase and the Reduce phase.
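To make the two phases concrete, here is a minimal sketch in plain Java that counts words. It runs in memory on a single machine and does not use the Hadoop API at all; the class name `MapReducePhasesSketch` and the methods `map`, `shuffle`, `reduce`, and `run` are made up for illustration. In a real Hadoop job the framework would do the grouping step and spread the work across the cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-machine illustration of the Map and Reduce phases.
// This is NOT Hadoop code; it only mimics the data flow.
public class MapReducePhasesSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Grouping step between the phases: collect all values that share a key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map over every line, groups by key, then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(run(Arrays.asList("big data big cluster", "data node")));
    }
}
```

Because each call to `map` looks at only one line and each call to `reduce` looks at only one key, both phases can in principle be split across many machines, which is the property Hadoop exploits.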

The tutorial shows a developer where to download the source files from Apache and how to unpack the helper executables, and it provides a small sample of Java code.

The code implements a dictionary translation by taking a series of compiled dictionaries (English-Spanish, English-Italian, English-French) and outputting a single dictionary that lists each English word followed by every translation. Under normal circumstances, the code would start with an English word and then search every file for each instance. Hadoop speeds this up by distributing the files and the processing.
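The merge described above can be sketched in plain Java without a Hadoop cluster. This is not Carlo's code and it does not use the Hadoop API; it is a single-machine illustration under stated assumptions: each dictionary line is taken to be an English word, a tab, and one translation, and the merged translations are joined with `|`. The class name `DictionaryMergeSketch`, the `merge` method, and the sample words are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory version of the dictionary merge.
// Assumed input format (not from the article): "english<TAB>translation".
public class DictionaryMergeSketch {

    static Map<String, String> merge(List<List<String>> dictionaries) {
        // Map-like step: emit (englishWord, translation) for every line,
        // grouping translations under their English word as we go.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (List<String> dictionary : dictionaries) {
            for (String line : dictionary) {
                String[] parts = line.split("\t", 2);
                if (parts.length < 2) {
                    continue; // skip lines that do not match the assumed format
                }
                grouped.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                       .add(parts[1].trim());
            }
        }

        // Reduce-like step: join all translations of one English word.
        Map<String, String> merged = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            merged.put(entry.getKey(), String.join("|", entry.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Tiny made-up dictionaries standing in for the real files.
        List<String> spanish = Arrays.asList("dog\tperro", "cat\tgato");
        List<String> italian = Arrays.asList("dog\tcane", "cat\tgatto");
        List<String> french = Arrays.asList("dog\tchien");

        merge(Arrays.asList(spanish, italian, french))
                .forEach((word, translations) -> System.out.println(word + "\t" + translations));
    }
}
```

Each dictionary is read once and every line is handled independently, rather than searching every file for each English word. That independence is what would let a framework like Hadoop split the files and the work across machines.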

The code uses a cloud-storage mechanism to speed up the hash mapping of the various dictionaries, but it does not use cloud-processing to accelerate itself. Since this is only a basic tutorial series, Carlo mentions that he’ll hit that up later.

So, if you know Java and want to play around with Hadoop, here is an excellent place to begin.

Also, it’s a good way to get an understanding of how this framework can give you a jumpstart on the cloud computing revolution.

