Hortonworks’ YARN Aims to Revolutionize Hadoop Data Processing

Hortonworks is publishing a series of blog posts on its website that explain both the basics and the finer details of Apache Hadoop YARN. Readers who are curious about YARN, or who want to understand its significance to Hadoop, will find the posts useful. They should also benefit those who have not yet heard of YARN but are looking for alternatives to the classic Apache Hadoop MapReduce framework.

The first post gives background on MapReduce: what it is meant to accomplish, how people have used it, and how YARN aims to improve on it. According to Hortonworks, Apache Hadoop MapReduce needs an overhaul, most notably in the JobTracker. YARN addresses the problems users have faced by splitting the JobTracker's two main responsibilities, resource management and job scheduling/monitoring, into separate components.

Some argue that Apache Hadoop is not ready for enterprise deployment because MapReduce is its only way of processing data. MapReduce is well suited to batch processing, but for many users, non-batch workloads such as real-time data processing and graph processing need more than classic MapReduce can provide.

YARN, as Hortonworks describes it in the second post, is a system for managing distributed applications. Its core components are a cluster-wide ResourceManager and a NodeManager on each node. Each application also gets its own ApplicationMaster, which negotiates resources from the ResourceManager and works with the NodeManagers to launch and monitor the application's tasks.
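To make this division of labor concrete, here is a toy Python model. It is not Hadoop code: the class names mirror YARN's components, but the methods (allocate, launch, run), the memory-only resource model, and the first-fit placement are simplifying assumptions chosen for illustration. The point is that the ResourceManager only hands out capacity, the NodeManagers only run containers, and the per-application ApplicationMaster does the job-specific coordination that the old JobTracker used to handle for every job.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a per-node agent that runs containers (hypothetical API)."""
    name: str
    capacity_mb: int
    used_mb: int = 0
    running: list = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb

    def launch(self, container_id: str, memory_mb: int) -> None:
        # Start a container and account for the memory it occupies on this node.
        if memory_mb > self.free_mb():
            raise RuntimeError(f"{self.name} lacks memory for {container_id}")
        self.used_mb += memory_mb
        self.running.append(container_id)


class ResourceManager:
    """Toy cluster-wide arbiter: it tracks capacity and grants containers, nothing more."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, memory_mb: int):
        # Naive first-fit placement (an assumption; not how YARN's schedulers work).
        for node in self.nodes:
            if node.free_mb() >= memory_mb:
                self._next_id += 1
                return node, f"container_{self._next_id}"
        return None


class ApplicationMaster:
    """Toy per-application coordinator: requests containers, then launches tasks in them."""

    def __init__(self, app_name: str, rm: ResourceManager):
        self.app_name = app_name
        self.rm = rm
        self.placements = {}

    def run(self, tasks, memory_mb: int):
        for task in tasks:
            grant = self.rm.allocate(memory_mb)
            if grant is None:
                raise RuntimeError(f"no capacity left for {task}")
            node, container_id = grant
            node.launch(container_id, memory_mb)  # NodeManager starts the container
            self.placements[task] = node.name
        return self.placements


nodes = [NodeManager("node1", 4096), NodeManager("node2", 4096)]
rm = ResourceManager(nodes)
am = ApplicationMaster("wordcount", rm)
placements = am.run(["map-0", "map-1", "map-2"], memory_mb=2048)
print(placements)  # {'map-0': 'node1', 'map-1': 'node1', 'map-2': 'node2'}
```

Because scheduling logic lives in the ApplicationMaster rather than in a single central tracker, a different framework (for example, a graph-processing engine) could supply its own ApplicationMaster while reusing the same ResourceManager and NodeManagers.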

In the third post, Hortonworks takes a closer look at the ResourceManager, with diagrams that break down the component's role and how it interacts with the rest of the system. Much of the post is highly technical, but it will help readers trying to understand the technology underlying YARN.

Hortonworks has made significant strides in the big data arena and is one of the top contributors to Apache Hadoop. It is one of several companies, including Cloudera, vying for a top seat in the Hadoop development sphere. YARN, which has been called the Next-Generation MapReduce, could help Hortonworks reach a larger segment of the big data market, including those reluctant to try Hadoop in the past.