UPDATED 10:00 EST / DECEMBER 23, 2014

HP thinks it’s got a better way to run Hadoop | #HPdiscover

Steve Tramack, HP

Running Hadoop on converged infrastructure is not a particularly attractive proposition. The massive workloads that the data-crunching framework is designed to handle require an entirely different ratio of compute to storage resources than the typical enterprise application demands. That balance can be all but impossible to address cost-effectively when the two come in a single box that only scales horizontally.

Or at least, it used to be. In a recent appearance on theCUBE from HP’s Discover conference in Barcelona, HP Senior Engineering Manager Steve Tramack said his team has managed to overcome that limitation with a unique architectural approach that lets organizations enjoy the convenience of the converged deployment model without compromising the efficiency of their analytics clusters. HP hopes it will be a game-changer amid the explosion of use cases for Hadoop.

“As organizations start to aggregate and assimilate data, they’re starting to see business value and all of a sudden they’re moving from batch to multiple workloads, and those workloads bring multiple copies of the same data and different requirements,” Tramack told theCUBE host Dave Vellante. To address the growing diversity of applications running on top of Hadoop, HP is redefining the supporting infrastructure.

The company recently unveiled a design for a converged system that takes advantage of new features in the latest version of Hadoop, namely the ability to define groups of nodes within a cluster and distinguish different storage types, which Tramack called an “asymmetric” environment. Instead of running the entire platform on the same infrastructure, each workload and component is deployed on the partition best suited to meet its specific requirements.
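The features Tramack is alluding to surfaced in the Hadoop 2.6 timeframe as HDFS heterogeneous storage types and YARN node labels. A minimal configuration sketch of the idea (the directory paths below are hypothetical examples, not HP’s settings):

    <!-- hdfs-site.xml: tag each DataNode directory with a storage type
         so HDFS can tell flash apart from spinning media -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>[SSD]/grid/ssd0/dn,[DISK]/grid/disk0/dn,[DISK]/grid/disk1/dn</value>
    </property>

    <!-- yarn-site.xml: enable node labels so jobs can be steered to a
         designated group of machines within the same cluster -->
    <property>
      <name>yarn.node-labels.enabled</name>
      <value>true</value>
    </property>

Labels are then registered with “yarn rmadmin -addToClusterNodeLabels” and attached to individual nodes, at which point queues and applications can request containers only on, say, the flash-backed partition.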

In HP’s reference architecture, the Hadoop Distributed File System is distributed across the storage servers of the different nodes, while YARN and the other software handling the manipulation of data are deployed on flash-equipped Moonshot systems, each packing 45 of Intel’s newest data center processors. “We’re using that for file system access, so we’re gaining the benefit of flash in a very cost-effective manner and we’re using spinning media for the primary data storage,” Tramack explained.
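On the HDFS side, that flash/disk split is typically expressed with storage policies. A sketch using current Hadoop releases (the path is a hypothetical example):

    # List the built-in policies (HOT, WARM, COLD, ONE_SSD, ALL_SSD, ...)
    hdfs storagepolicies -listPolicies

    # Keep every replica under this hypothetical hot path on SSD-tagged
    # volumes; everything else stays on the default HOT policy, which
    # writes to spinning disk
    hdfs storagepolicies -setStoragePolicy -path /apps/interactive -policy ALL_SSD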

The concept isn’t new, he added. “These concepts are very similar to what’s in the Cray architecture, with its neat little compute blocks and storage blocks.” Unlike supercomputers, however, converged infrastructure ships one compact module at a time, which lowers the barrier to entry and enables organizations to be much more flexible in how they scale their environments while optimizing hardware use.

Watch the full interview (15:43)

