UPDATED 09:26 EDT / FEBRUARY 21 2014

Who’s talking about which Big Data frameworks?

Editor’s Note: this article has been updated for accuracy and context.

As Hadoop enables an ecosystem for Big Data to enter the enterprise, a handful of frameworks have bridged the gap between open source and business needs. To find out which Big Data frameworks are getting buzz, we looked to last week’s BigDataSV event, asking industry leaders and analysts what they’re seeing in the market. The three mentioned most were HBase, MapR, and YARN. All three target mission-critical Big Data applications for today’s enterprise. Talking to our #techathletes on theCUBE, we looked at some of the top solutions for deploying Big Data in the enterprise, learning directly from those on the front lines.

HBase

WANdisco on HBase

WANdisco CEO David Richards and Jagane Sundar, CTO and VP of Engineering of Big Data, joined theCUBE to discuss the company’s latest developments and current Big Data trends at #BigDataSV earlier this month. This month WANdisco released a new Non-Stop Hadoop product, a single cluster running HBase that can be deployed across multiple data centers spread over different regions. Commenting on HBase adoption, Richards mentioned a 30 percent deployment among customers. It is mostly used for stock feeds, Twitter streams, and streaming real-time apps.

“We’re seeing great desire for CIOs to do wholesale replacement of their technology. Analyzing the market is difficult, it’s really tough. One of the proxies for Big Data is hard drive manufacturers, who are up 15 percent. Where Big Data adoption in real production environments is concerned, much like Splunk’s success, players such as Hortonworks and Cloudera will take over the market, even if it’s still dominated by the big whales. What we’re seeing, what I expect to see, is that companies that traditionally invest in public companies have to move down the stack and invest in private companies.”

Brett Rudenstein, Senior Product Manager of Big Data for WANdisco, added to Richards’ comments across the street at day 2 of the Strata Conference 2014 in Santa Clara, California. In WANdisco’s opinion, HBase is a slam dunk.

“HBase is effectively storage for Big Data applications; some people call it a key value store, but the fundamental principle behind it is being able to store billions and billions of rows of data and, at the same time, have (near) real-time access to that data. From a database perspective, the reason that it’s often picked is because of the level of scale that it’s able to achieve and also because it is fundamentally a Hadoop database. Because HBase stores its log files into HDFS, the first thing that you need is a hardened HDFS whereby you can withstand failure,” answered Rudenstein.

MapR

MapR can do speed and efficiency, but is more focused on problem solving

MapReduce allows you to process Big Data in a distributed and parallel manner. Jack Norris, MapR CMO, joined John Furrier and Dave Vellante on theCUBE during our coverage of the 2014 Strata Conference. Norris says MapR maintains “a truly focused business model,” providing innovations and advantages that benefit customers’ bottom line. He notes that Cisco has focused on how it can best leverage its data and is now dramatically expanding its use cases and the ways it derives value from data. Norris suggests, “sometimes it’s leveraging new data sources, sometimes it’s leveraging the data sources that they have available.”

“Open source may imply a singular business model, which caused some initial confusion. Still, I believe MapR’s hybrid model has proven its uniqueness and efficiency. It’s easier for folks to get to much faster; there’s been a pretty fast and broad acceptance of that with enterprise customers.”

HP Vertica is not only on board, but it believes that analytics should be embedded in everything. Colin Mahony (VP & GM of HP Vertica) and John Schroeder (CEO of MapR) joined us on theCUBE at #BigDataSV, and the conversation took off almost immediately. A CUBE alum, Mahony spoke about why HP Vertica chose MapR:

“We are really excited about our relationship with MapR; we’re combining two great solutions so that customers who want to take advantage of big data (or any data) can do it seamlessly. What Vertica brings to the table is an incredible MPP SQL analytics platform, but when you think about the big data lake, it just makes sense that you can have a single environment where you can do anything you want against the data. Like with most great partnerships, it’s really customer-driven.”

So what is the state of Big Data as an industry? Schroeder commented:

“If you look specifically at Hadoop, it’s settling down to a couple of platform providers, and we’re the leader there, but I don’t think it’s ready to vertically integrate the stack.”

YARN

YARN matures, set to drive next generation

One perspective on YARN deployment comes from the partnership between Hortonworks and Microsoft, which goes back 18 months. Hortonworks focuses on making Hadoop great, and Microsoft focuses on helping its customers get data out of Hadoop and deliver it to their end users. There was a lot of conversation around YARN last week, so theCUBE host John Furrier asked Eron Kelly, GM Product Marketing – Data Platform at Microsoft, and John Kreisa, VP of Strategic Marketing at Hortonworks, directly about it. We wanted to get both Kelly’s and Kreisa’s take on where YARN stands right now.

John Kreisa said, “YARN is a maturing technology, it’s out in Hadoop 2.0 and now in Hadoop 2.2 that Microsoft is bringing in, and of course Hortonworks Data Platform, really driving the next generation. It allows different technologies to integrate natively and use the resources within the cluster more effectively. Eron talked about the fact we’re seeing 40-50 percent higher performance on things like queries, which is related to the Stinger project, but also overall platform and cluster utilization. We’re seeing big enterprises be able to reduce in some cases the number of nodes they have to use to run the same workload. It’s a very efficient framework within Hadoop.”

Microsoft has been adamant that it’s going to bring Big Data to 1 billion users, and YARN is going to be a big part of that effort. When asked if he wanted to back off that statement, which he made 105 days ago, Kelly said:

“The strategy and vision statement still holds and in fact we’re just really building momentum towards that. With the release of Power BI on Monday, it does make it really, really easy for any user to get access to data on Hadoop and start to do analysis.”

He went on to provide a use case: the City of Barcelona is using Power BI to collect Twitter sentiment from citizens during festivals and correlate it with the availability of resources such as buses running on time. It’s already working, too. Recently, a concert in Barcelona ended at 2:00 a.m. People went to the bus stop to catch a bus home, and the buses weren’t there. They started tweeting about how angry they were, and the City of Barcelona was able to catch that sentiment and decide to reroute buses back to them.

photo credit: milos milosevic via photopin cc
