UPDATED 14:40 EDT / NOVEMBER 12 2014

LinkedIn’s latest open-source project supercharges Hadoop

LinkedIn Inc. is releasing yet another internally developed framework for Hadoop under an open-source license, in a bid to help organizations that can’t afford to hire an army of expensive specialists to fine-tune every detail of their analytic clusters. The project adds to the already formidable pile of community contributions that the web-scale crowd has racked up over the course of its journey to push the boundaries of large-scale data processing.

Hadoop itself was born of that endless pursuit, along with many of the complementary technologies in the surrounding ecosystem, including the most recent addition, an engine called Kylin that eBay Inc. developed to spare internal users long delays when digging for data in its massive deployment. The newly revealed Cubert framework from LinkedIn extends that vision beyond queries to the full gamut of operations in Hadoop, from organizing information for analysis to carrying out the processing.

Cubert implements the lessons that the social networking powerhouse learned when laying the foundation for its XLNT engagement testing platform, which proved too taxing for existing Hadoop sub-projects to handle. After spending several months trying, to little avail, to make the tools they already had work, LinkedIn’s engineers decided to build an entirely new system to bear the brunt of the complex data manipulations in XLNT.

The technology served its purpose, but the developers found themselves having to rewrite large portions of the underlying code in order to accommodate the new use cases that the success of the project drew over time. So they set out to come up with an answer to the requirements of XLNT for the third time, and thus Cubert was born.

Tackling all three levels of the analytic stack

The framework provides an engine for finding simple solutions to complex analytical problems that might normally prove too resource-intensive to solve within an allocated time frame. It cuts across all three levels of the analytic stack.

In the storage layer, Cubert uses a combination of abstractions over the Hadoop Distributed File System to organize data as blocks structured for the most efficient access possible. These partitions are manipulated with operators located one level higher up, at the execution layer, that automate tasks not directly supported in other platforms, such as mapping out relationships between entities and calculating statistical positions. Finally, this functionality is exposed to developers through a simplified syntax dubbed Cubert Script, implemented at the top of the stack, that makes it possible to specify workload execution paths without writing any Java code.
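
To make the layering concrete, here is a rough Python sketch of the general idea, not Cubert's actual API or Cubert Script syntax. It assumes integer keys and uses hypothetical names throughout (build_blocks, rank_within_key, run, member_id, clicks): records are first organized into key-partitioned, sorted blocks, and an operator then computes each record's position within its key group, one block at a time.

```python
from collections import defaultdict


def build_blocks(records, key, num_blocks):
    """Storage-layer idea (illustrative only): place every record with the
    same key in the same block, and keep each block sorted by that key."""
    blocks = [[] for _ in range(num_blocks)]
    for rec in records:
        # Assumes integer keys; a real system would use a proper partitioner.
        blocks[rec[key] % num_blocks].append(rec)
    for block in blocks:
        block.sort(key=lambda r: r[key])
    return blocks


def rank_within_key(block, key, value):
    """Execution-layer idea (illustrative only): an operator that ranks
    records by `value` inside each key group of a single block."""
    groups = defaultdict(list)
    for rec in block:
        groups[rec[key]].append(rec)
    ranked = []
    for recs in groups.values():
        ordered = sorted(recs, key=lambda r: r[value], reverse=True)
        for position, rec in enumerate(ordered, start=1):
            ranked.append({**rec, "rank": position})
    return ranked


def run(records, num_blocks=2):
    """Script-layer idea (illustrative only): a short declarative-looking
    pipeline that chains the storage step and the operator."""
    blocks = build_blocks(records, "member_id", num_blocks)
    return [row for block in blocks
            for row in rank_within_key(block, "member_id", "clicks")]


if __name__ == "__main__":
    sample = [
        {"member_id": 1, "clicks": 5},
        {"member_id": 2, "clicks": 3},
        {"member_id": 1, "clicks": 9},
    ]
    for row in run(sample):
        print(row)
```

Because all records for a given key land in the same block, the operator in this sketch never needs to look outside the block it is handed, which is the kind of locality a block-oriented storage layout is meant to provide.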

That provides a relatively straightforward interface for optimizing data processing, which LinkedIn says can help users accelerate analytics by up to 60 times. At launch, Cubert works only with the default MapReduce execution engine in Hadoop, but the company plans to leverage the extensibility of the framework to add support for the considerably faster Spark further down the road. More analytic functions and increased automation are in the works as well.

photo credit: Camil Tulcan via photopin cc
