There is a lot of pressure on open source to deliver a viable platform for the enterprise. As we learned at the recently concluded Hadoop Summit (#hadoopsummit), Hortonworks believes that Hadoop + YARN will be a very big piece of solving that puzzle. As Arun Murthy, Founder and Architect at Hortonworks, said, “YARN allows you to interact with data in ways that were never possible before.”
YARN turns Hadoop from a single-application system into a multi-application operating system. With YARN, the Hadoop community has the ability to run SQL in Hadoop. By turning Apache Hadoop 2.0 into a multi-application data system, YARN enables the Hadoop community to address a generation of new requirements in Hadoop.
The big vision for MapR and Hadoop is one platform for Big Data.
Apache Hadoop is an open-source software framework. Hadoop + YARN is a key to Big Data platforms in the enterprise space for open source. Open source, in general, is the single most important avenue of success for Hadoop. There are many changes taking place in the datacenter as companies like IBM, HP, and Dell shift their strategies toward hyperscale efficiency and services.
The industry is rallying around open source. Here is a collection of those recent rally cries as they apply to Big Data platforms in the enterprise:
Hortonworks’ YARN Aims to Revolutionize Hadoop Data Processing
YARN is a system for managing distributed applications. Its components include a ResourceManager, which arbitrates resources across the cluster, and a NodeManager on each machine. Each application also gets an ApplicationMaster, which negotiates resources from the ResourceManager and works with the NodeManagers to run its tasks.
YARN acts as a true Hadoop resource manager, allowing multiple applications – MapReduce, SQL, streaming analysis, etc. – to run on a single cluster of machines simultaneously while maintaining high performance levels. With YARN, Hadoop is a true multi-application platform that can serve an entire enterprise. This means Hadoop can be used as the foundation of an enterprise data management architecture, storing all of an enterprise’s data and serving as a shared data service. With YARN, the marketing team can run SQL-style applications while the Data Science team churns through petabytes of data, all on a single Hadoop deployment.
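The division of labor described above can be pictured with a small toy model. This is only a sketch in plain Python, not the YARN API: the class names mirror the component names in the text, while the slot counts, method names, and the `demo` helper are assumptions made up for illustration. It shows two applications, each with its own ApplicationMaster, drawing containers from one shared cluster.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for the per-machine agent that hosts containers."""
    name: str
    free_slots: int  # hypothetical unit of capacity, not a real YARN setting
    running: list = field(default_factory=list)

    def launch(self, app_name: str) -> str:
        # Start one container for the given application on this node.
        self.free_slots -= 1
        container_id = f"{self.name}/{app_name}-{len(self.running)}"
        self.running.append(container_id)
        return container_id


class ResourceManager:
    """Toy cluster-wide arbiter that hands out free slots across all nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_name: str, wanted: int) -> list:
        granted = []
        for node in self.nodes:
            # Fill this node until it is full or the request is satisfied.
            while node.free_slots > 0 and len(granted) < wanted:
                granted.append(node.launch(app_name))
        return granted


class ApplicationMaster:
    """Toy per-application coordinator that requests containers."""

    def __init__(self, app_name: str, rm: ResourceManager):
        self.app_name = app_name
        self.rm = rm
        self.containers = []

    def request(self, wanted: int) -> int:
        # Ask the ResourceManager and remember whatever was granted.
        self.containers += self.rm.allocate(self.app_name, wanted)
        return len(self.containers)


def demo():
    """Two applications compete for a four-slot cluster."""
    rm = ResourceManager([NodeManager("node1", 2), NodeManager("node2", 2)])
    mapreduce = ApplicationMaster("mapreduce", rm)
    sql = ApplicationMaster("sql", rm)
    return mapreduce.request(3), sql.request(3)
```

Calling `demo()` returns `(3, 1)`: the MapReduce job receives the three containers it asked for, and the SQL job receives only the one slot that remains. A real scheduler would apply queues and fairness policies rather than this first-come, first-served loop.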
Big Data Is Not Something That’s Only On a Certain Platform, says IBM’s Anjul Bhambhri #hadoopsummit
Anjul Bhambhri, Vice President, Big Data and Streams, IBM, discussed the company’s latest release, BigInsights, along with its overall Big Data strategy with theCUBE co-hosts Jeff Kelly and Dave Vellante, live at the Hadoop Summit 2013. “We have brought SQL to Hadoop, we are now providing wire SQL update capabilities.” Another feature is bringing the processing of unstructured data to Hadoop. From a performance standpoint, “enterprise customers will be wanting, as they are using SQL, the same performance as using SQL to structured data,” she explained. “We are doing that by not moving data out of Hadoop on any other database on the side. We are able to run SQL inside Hadoop.”
The MapR Vision for Hadoop: One Platform for Big Data | #hadoopsummit
MapR focuses on making Hadoop enterprise-ready. Apache Hadoop is an open source framework designed to run applications on large clusters of commodity hardware and to solve problems involving large volumes and a wide variety of data. MapR provides a complete distribution for Hadoop, without being affiliated with the Apache Software Foundation (ASF). In this interview, Tomer Shiran, Vice President of Product Management at MapR, discusses Enterprise Data Architecture, Deployment and Operations, and the future of Apache Hadoop.
Hadoop + YARN : Key to Big Data Platforms in Enterprise | #hadoopsummit
Arun Murthy, Founder and Architect, Hortonworks, discussed YARN and Hadoop as a viable solution for enterprises. On #theCUBE, Murthy stated that Hadoop and YARN are going to be “a big, big piece of the puzzle,” as “YARN allows you to interact with data in ways that were never possible before.” Talking about Hortonworks’ customer pool, Murthy said it was spread across different industries and levels of savviness. In many cases, enterprises have seven or eight implementations of Hadoop, and Hortonworks is then called in to help them understand the security, compliance, and auditing aspects and how to run their applications on it.
To Succeed with Hadoop: Find Specific Problem Areas + Solve Them | #hadoopsummit
John Furrier and Dave Vellante interviewed Stefan Groschupf, CEO and Founder of Datameer and an early contributor to Hadoop, on #theCUBE at the #hadoopsummit. Talking about the possibility of SQL coming to Hadoop, Groschupf said: “The Hadoop file system actually works like a tape drive. Technically, Hadoop is a sequential optimized file system. To find something, you have to stream all the data, and that offers its performance for analytical workloads.” Adding varied languages on top of a sequentially optimized file system does not make sense, Groschupf thinks. In his view, the only market advantage nowadays is moving faster than your competition, so it basically comes down to agility and flexibility.
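Groschupf's tape drive comparison can be made concrete with a minimal sketch. The code below uses an ordinary in-memory Python list as a stand-in for sequentially stored records; it does not touch HDFS, and the function names and record layout are hypothetical. It counts how many records each operation has to read.

```python
def scan_for_key(records, key):
    """Point lookup by streaming: read records in order until the key appears."""
    reads = 0
    for record_key, value in records:
        reads += 1
        if record_key == key:
            return value, reads
    return None, reads


def sum_values(records):
    """Analytical pass: every record is read once, in storage order."""
    total = 0
    reads = 0
    for _, value in records:
        reads += 1
        total += value
    return total, reads


# 1,000 made-up records of the form ("user<i>", i).
records = [(f"user{i}", i) for i in range(1000)]
```

With this data, `scan_for_key(records, "user999")` has to read all 1,000 records to return a single value, while `sum_values(records)` reads the same 1,000 records and puts every one of them to use. That asymmetry is the point of the quote: streaming suits whole-dataset analysis far better than finding one item.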
Hadoop 2.0 + YARN Will Enable New Workloads : Hadoop Summit 2013 Review
John Kreisa, VP Strategic Marketing at Hortonworks, joined the guys on #theCUBE as the last interview. “Hadoop 2.0 is going to be driven around the YARN architecture, and YARN is going to enable a bunch of other workloads, opening the platform for a broadening of how it’s used in the Enterprise,” said Kreisa. Co-host John Furrier found three major areas of focus from #hadoopsummit 2013:
1. Engineering – people are looking more and more for engineering talent. Platforms are under massive construction, and there’s a lot of demand for developers who can make the platforms 100% open and robust.
2. Growth – there’s a lot of field sales and consultancy activity going on.
3. Partnership – both in the community of coders and on the commercial side.
Get the whole collection
We’ve created a handy Springpad collection of these 7 stories for easy reference.