UPDATED 00:15 EDT / JUNE 25 2013

NEWS

Expect a Ripping Good YARN at Hadoop Summit

Hadoop Summit 2013 kicks off tomorrow, and YARN is expected to be a major topic of conversation. Three years in the making, YARN is essentially a new operating system for Hadoop that will allow the open source Big Data framework to break free from the shackles of MapReduce.

Perhaps that was a bit harsh towards MapReduce. As anyone following Big Data knows, Hadoop was originally developed at Yahoo! to search and index the web. It is an extremely powerful framework, without which it would be a lot harder to find what you’re looking for online today. But Hadoop essentially was and still is a “one application platform” supported by a single computing paradigm – you guessed it – MapReduce.

MapReduce is the main mechanism for manipulating data in HDFS. This is great if you’re trying to process and analyze huge volumes of data – think years’ worth of log files or other semi-structured data – but less than ideal for other types of data analysis.
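For readers who have not seen the paradigm, the following is a minimal single-machine sketch of the map, shuffle and reduce steps applied to a few log lines. It is an illustration only, not Hadoop's API: the log format, the sample data and every function name here are assumptions made up for the example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (status_code, 1) pair per log line (hypothetical 'path status' format)."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            yield parts[1], 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def count_status_codes(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


# Made-up sample data standing in for a real log file.
sample_log = [
    "/index.html 200",
    "/missing 404",
    "/index.html 200",
    "/about 200",
]

status_counts = count_status_codes(sample_log)
```

Counting and summing over large batches fits this shape well. Workloads that need to iterate repeatedly or respond interactively are harder to squeeze into it, which is the limitation the article describes.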

To evolve Hadoop into a more versatile Big Data platform, Arun Murthy, then of Yahoo!, set about re-architecting Hadoop three years ago. The result, making its debut this summer, is Apache YARN. Murthy, who went on to co-found Hortonworks, describes YARN this way:

“When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets. And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster – this is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.”

So what are some of the other types of applications Murthy is referring to? Among them are machine learning, graph analysis, streaming analysis and interactive query capabilities. Once YARN is fully operational, developers will be able to manipulate data stored in HDFS with these types of applications via the YARN ‘operating system.’

Now you may be thinking, can’t Hadoop already support these types of applications? Yes and no. Hive was developed by Facebook to serve as a SQL-style data warehouse layer on top of HDFS, but under the covers it still processes data via MapReduce. It also consumes a lot of resources, potentially impacting other jobs running (or at least trying to run) at the same time. Other Hadoop-related sub-projects for analyzing data operate in a similar way.

Which brings us to why YARN is so important. YARN acts as a true Hadoop resource manager, allowing multiple applications – MapReduce, SQL, streaming analysis, etc. – to run on a single cluster of machines simultaneously while maintaining high performance levels. With YARN, Hadoop is a true multi-application platform that can serve an entire enterprise.
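To make the idea of a cluster-wide resource manager concrete, here is a toy sketch of one component arbitrating memory requests from several applications. It is not YARN's actual API or scheduling policy: the class `ToyResourceManager`, its methods, the application names and the memory figures are all hypothetical, chosen only to show requests being granted or refused against a shared pool.

```python
class ToyResourceManager:
    """Hypothetical arbiter that tracks one shared pool of cluster memory."""

    def __init__(self, total_memory_mb):
        self.total_memory_mb = total_memory_mb
        self.allocated = {}  # application name -> MB currently granted

    def free_memory_mb(self):
        """Memory not yet granted to any application."""
        return self.total_memory_mb - sum(self.allocated.values())

    def request(self, app, memory_mb):
        """Grant the request only if the shared pool can cover it."""
        if memory_mb <= 0 or memory_mb > self.free_memory_mb():
            return False
        self.allocated[app] = self.allocated.get(app, 0) + memory_mb
        return True

    def release(self, app, memory_mb):
        """Return memory to the pool when an application finishes work."""
        remaining = max(0, self.allocated.get(app, 0) - memory_mb)
        if remaining:
            self.allocated[app] = remaining
        else:
            self.allocated.pop(app, None)


# Three made-up applications competing for one 8 GB cluster.
rm = ToyResourceManager(total_memory_mb=8192)
granted_mapreduce = rm.request("mapreduce-etl", 4096)
granted_sql = rm.request("sql-query", 2048)
granted_streaming = rm.request("stream-analysis", 4096)  # more than is left
```

In this sketch the streaming request is refused until the batch job releases its share, so no single application can silently starve the others. A real scheduler would add queues, priorities and per-node placement, none of which is modeled here.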

This means Hadoop can be used as the foundation of an enterprise data management architecture, storing all of an enterprise’s data and serving as a shared data service. With YARN, the marketing team can run SQL-style applications while the data science team churns through petabytes of data, all on a single Hadoop deployment.

There’s still a way to go before YARN is ready for production deployment, but it will certainly be the topic of many conversations tomorrow in San Jose. theCUBE kicks off live coverage of Hadoop Summit 2013 at 10:30am PT on Wednesday, and rest assured we’ll be covering the developments as they relate to YARN. Murthy himself joins us at 11:20am PT to provide us with the latest details. Catch all the action at SiliconANGLE.com.
