UPDATED 15:00 EDT / APRIL 14 2016

NEWS

Can Hadoop get past enterprise-grade roadblocks? | #HS16Dublin

The keynote for day two of Hadoop Summit in Dublin, Ireland, was heavy on technical detail, with speakers from Yahoo, BMC Software and Hewlett Packard Enterprise (HPE). Each speaker discussed their company’s work and contributions to the Hadoop ecosystem.

Pushing boundaries through open source

The first speaker was Sumeet Singh, senior director of products for cloud and Big Data platforms at Yahoo!, Inc. He spoke about Yahoo’s many years of using the Hadoop platform and how the company has come to rely on it to push the boundaries of its capabilities.

Outlining the workspace, Singh spoke about a project that collapsed three clusters to make space on the platform and create what he deemed the largest, most modern cluster. As a result, Yahoo! saw its total cost of ownership decrease by 40 percent.

The projects

According to Singh, “What came out of the effort was a framework we call CaffeOnSpark, a phenomenal framework to advance deep learning on existing Hadoop or Spark clusters.” He explained that CaffeOnSpark also turns existing Hadoop and Spark clusters into a powerful platform for deep learning without the need to set up a separate cluster or move data back and forth between clusters.

The platform provides server-to-server direct communication that speeds up learning and offers the ability to fully distribute the learning without scalability issues. CaffeOnSpark also supports incremental learning that occurs on top of saved models. The open-source project received an Apache license last month.
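CaffeOnSpark’s own API was not detailed in the keynote, so the following is only a rough, generic sketch of the idea Singh describes: training runs on the Spark executors that already hold the data, and only small model updates cross the network. It uses plain PySpark and NumPy with made-up data and a toy logistic-regression model rather than CaffeOnSpark or Caffe itself.

```python
# Illustrative sketch only: a generic "train where the data already lives" pattern
# on a Spark cluster. This is NOT CaffeOnSpark's API; it just shows the shape of
# the idea with a toy logistic-regression model.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-training-sketch").getOrCreate()
sc = spark.sparkContext

# Toy (features, label) pairs; in a real deployment this data would already sit
# in HDFS alongside the rest of the Hadoop workload.
rng = np.random.default_rng(0)
data = [(rng.normal(size=3), float(rng.integers(0, 2))) for _ in range(1000)]
rdd = sc.parallelize(data, numSlices=8).cache()

def partition_gradient(weights, rows):
    """Logistic-regression gradient computed locally on one partition."""
    w = np.asarray(weights)
    grad = np.zeros_like(w)
    count = 0
    for x, y in rows:
        pred = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
        grad += (pred - y) * x
        count += 1
    yield grad, count

weights = np.zeros(3)
for step in range(20):
    # Executors work on the partitions they already hold; only small gradient
    # vectors travel over the network, never the dataset itself.
    parts = rdd.mapPartitions(lambda rows: partition_gradient(weights, rows)).collect()
    total = sum(g for g, _ in parts)
    n = sum(c for _, c in parts)
    weights = weights - 0.5 * (total / n)

print("learned weights:", weights)
spark.stop()
```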

Singh ran through a number of open-source projects that included batch compute, queries, Apache HBase and other open-source initiatives in which Yahoo! participates, and he also talked about the advances that the company has made with these projects.

Moving forward, Singh sees four areas of opportunity in the Hadoop ecosystem: large-scale machine learning, deep learning, a quest for speed to drive down latency and more efficient cluster operations. He also noted that scale is something Hadoop needs to improve.

Use-case panel

Herb Cunitz, president at Hortonworks, Inc., next conducted a use-case panel that shared the experiences of enterprise users, including how each customer is gaining value from the Hadoop platform. Some of the projects included connected data platforms collecting data from smart meters, machine learning and sophisticated analytics.

Helping the enterprise achieve the highest level of automation  

Joe Goldberg, solutions marketing consultant at BMC Software, Inc., was on hand to talk about what he called the “backroom stuff,” the behind-the-scenes activities that enable customers to get more from their data. He noted that some of the more popular use-cases for Hadoop and Big Data are things like Extract, Transform and Load (ETL) and enterprise data warehouse modernization.
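As a rough illustration of the kind of ETL and warehouse-offload job Goldberg refers to, here is a minimal PySpark sketch; the paths, schema and column names are hypothetical.

```python
# A minimal ETL sketch on Hadoop: extract raw files from HDFS, transform them,
# and load a partitioned columnar table for downstream warehouse/BI use.
# All paths and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw delimited files landed in HDFS.
raw = (spark.read
       .option("header", True)
       .csv("hdfs:///landing/sales/2016-04-14/*.csv"))

# Transform: normalize types, drop bad rows, derive the columns the warehouse expects.
clean = (raw
         .withColumn("amount", F.col("amount").cast("double"))
         .withColumn("sale_date", F.to_date(F.col("sale_ts")))
         .dropna(subset=["customer_id", "amount"])
         .groupBy("sale_date", "customer_id")
         .agg(F.sum("amount").alias("daily_total")))

# Load: write a partitioned Parquet table that downstream tools can query.
(clean.write
 .mode("overwrite")
 .partitionBy("sale_date")
 .parquet("hdfs:///warehouse/sales_daily"))

spark.stop()
```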

The customer

Goldberg talked about a platform approach to managing batch. “One of the most important characteristics — as Big Data and Hadoop applications are moving toward the enterprise — is that you have the ability to manage all of your batch processing in a consistent way,” he said. “A single way to visualize and manage across that diversity.”

What he hears from customers is that when they move Hadoop and Big Data applications into an enterprise context, a lot of complex, traditional technology already exists, and they want to be able to manage it, see the relationships and understand how all of that processing comes together.

Goldberg said it is necessary to abstract and elevate how you manage batch processing so that you are not looking at the individual technologies but at the work from a business perspective. “You still need that deep technical detail and you need to be able to drill down and see all that information, but you want to stay at a high level from a management perspective,” he said.
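To make the “abstract and elevate” idea concrete, here is a toy Python sketch in which very different batch technologies sit behind one common interface so a flow can be monitored at a business level. Every class, job and flow name here is hypothetical; none of this reflects BMC’s actual product API.

```python
# Toy sketch: heterogeneous batch technologies exposed behind one interface so a
# business-level flow can be run and reported on consistently. All names are made up.
from dataclasses import dataclass
from typing import Protocol

class BatchJob(Protocol):
    name: str
    def run(self) -> bool: ...   # returns True on success

@dataclass
class HiveQueryJob:
    name: str
    hql: str
    def run(self) -> bool:
        # Stand-in for submitting the query to HiveServer2.
        print(f"[hive] {self.name}: {self.hql}")
        return True

@dataclass
class SparkJob:
    name: str
    main_class: str
    def run(self) -> bool:
        # Stand-in for calling spark-submit on the cluster.
        print(f"[spark] {self.name}: {self.main_class}")
        return True

@dataclass
class LegacyEtlJob:
    name: str
    script: str
    def run(self) -> bool:
        # Stand-in for an existing, non-Hadoop ETL tool invocation.
        print(f"[legacy] {self.name}: {self.script}")
        return True

def run_business_flow(flow_name: str, jobs: list) -> None:
    """Run a business-level flow and report one consistent status across technologies."""
    for job in jobs:
        ok = job.run()
        print(f"{flow_name} / {job.name}: {'OK' if ok else 'FAILED'}")
        if not ok:
            break

run_business_flow("nightly-revenue-report", [
    LegacyEtlJob("extract-bookings", "export_bookings.ksh"),
    SparkJob("enrich-bookings", "com.example.EnrichBookings"),
    HiveQueryJob("load-mart", "INSERT OVERWRITE TABLE revenue_mart SELECT ..."),
])
```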

The conversations

“This summit and the conversations around Big Data have been about bringing the Hadoop ecosystem and Big Data applications to the enterprise, so you also need enterprise rigor. Things like version management and auditing and reporting and security,” Goldberg explained.

He remarked that when the industry talks about Hadoop and Big Data applications, it is usually from the data science and developer perspective. As the technology moves toward business users, however, he said it is important to consider how non-technical business users will access it, and self-service becomes a critical component.

The platform

Goldberg commented, “In the Hadoop ecosystem today, you can have an application or a piece of technology that will not be flexible and not allow you to change and integrate and absorb new technologies.” He discussed the need for a platform to be adaptable and extendable.

He provided use-cases for BMC clients in the airline, banking, entertainment and truck manufacturing industries. He noted that in order to accelerate ROI, the enterprise needs to leverage this platform.

Challenges realizing the data lake

After a video of an HP and DreamWorks use-case, Steve Sarsfield, product marketing manager at HPE, took to the stage and said, “Enterprise-grade Hadoop has enterprise-grade problems.”

He went on to discuss the three types of data that HPE looks at: business data, machine data (IoT) and human data (such as facial recognition data), noting that the vision of Hadoop was to put all of this data into the data lake. The problem, according to Sarsfield, is that the data remains in silos. He stated that there is a need to stop doing things in separate clusters and move away from silos.

The problems

Sarsfield laid out four issues for enterprise-grade Hadoop. First, it is hard to get mature analytics capabilities. Second, specialized skills are required, and software needs to be easy for end users. Third, there are architectural limitations when running complex workloads. And finally, there are security challenges.

Solutions

HPE and Hortonworks are partnering on hardware configuration. In the past, clusters were configured symmetrically, with compute and storage tied together on every node. Now, through the collaboration, hardware can be configured asymmetrically through Big Data reference architectures built on HPE’s new Apollo platform, where compute and storage can be separated.

The partnership has also paired the HP Vertica engine with the Hortonworks Hadoop platform to provide advanced analytics capabilities that are ACID compliant and 100 percent ANSI SQL compliant.
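As a small illustration of what ANSI SQL access to that data can look like, the sketch below runs a windowed aggregate through the vertica_python client. The connection details and the sales_daily table are made up; in a Vertica-plus-Hadoop deployment, such a table could be defined over data already stored in HDFS.

```python
# Minimal sketch: run plain ANSI SQL (including a window function) against Vertica
# from Python. Connection details and the table/columns are placeholders.
import vertica_python

conn_info = {
    "host": "vertica.example.com",
    "port": 5433,
    "user": "dbadmin",
    "password": "changeme",
    "database": "analytics",
}

QUERY = """
SELECT customer_id,
       sale_date,
       SUM(amount) OVER (PARTITION BY customer_id ORDER BY sale_date) AS running_total
FROM sales_daily
ORDER BY customer_id, sale_date
LIMIT 20
"""

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()
    cur.execute(QUERY)          # standard ANSI SQL, no Hadoop-specific dialect
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```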

Technology from Hewlett Packard’s acquisition of Voltage Security is used to secure data at rest on Hadoop servers, data moving through a company’s Hadoop network and data in use.

Watch the full keynote below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Hadoop Summit 2016 – Dublin. And make sure to join in on theCUBE’s live coverage of the event on CrowdChat.

Photo by SiliconANGLE
