UPDATED 14:40 EDT / NOVEMBER 12 2014

LinkedIn’s latest open-source project supercharges Hadoop

LinkedIn Inc. is releasing yet another internally developed framework for Hadoop under an open-source license, in a bid to help organizations get the most out of their analytic clusters without hiring an army of expensive specialists to fine-tune every detail. The project adds to the already formidable pile of community contributions that the web-scale crowd has racked up over the course of its push to extend the boundaries of large-scale data processing.

Hadoop itself was born of that endless pursuit, along with many of the complementary technologies in the surrounding ecosystem, including the most recent addition, an engine called Kylin that eBay Inc. developed to spare internal users long delays when digging for data in its massive deployment. The newly revealed Cubert framework from LinkedIn extends that vision beyond queries to the full gamut of operations in Hadoop, from organizing information for analysis to carrying out the processing.

Cubert implements the lessons that the social networking powerhouse learned while laying the foundation for its XLNT engagement testing platform, whose workloads proved too taxing for existing Hadoop sub-projects to handle. After spending several months trying to make the tools already at their disposal work, to little avail, LinkedIn’s engineers decided to build an entirely new system to bear the brunt of the complex data manipulations in XLNT.

The technology served its purpose, but the developers found themselves having to rewrite large portions of the underlying code in order to accommodate the new use cases that the success of the project drew over time. So they set out to come up with an answer to the requirements of XLNT for the third time, and thus Cubert was born.

Tackling all three levels of the analytic stack


The framework provides an engine for finding simple solutions to complex analytical problems that might normally prove too resource-intensive to solve within an allocated time frame. It cuts across all three levels of the analytic stack.

In the storage layer, Cubert uses a combination of abstractions over the Hadoop Distributed File System to organize data as blocks structured for the most efficient access possible. These partitions are manipulated with operators located one level higher up, at the execution layer, which automate tasks not directly supported in other platforms, such as mapping out relationships between entities and calculating statistical positions. Finally, this functionality is exposed to developers through a simplified syntax dubbed Cubert Script, implemented at the top of the stack, which makes it possible to specify workload execution paths without writing any Java code.
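For context, the kind of hand-written Java that a scripting layer like Cubert Script is meant to spare developers resembles the sketch below: a minimal MapReduce job that sums a metric per key. This is a generic, hypothetical Hadoop example rather than code from Cubert or XLNT; the MetricSumJob class name and the tab-separated input format are assumptions made purely for illustration.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sums a numeric metric per key from tab-separated input lines ("key<TAB>value").
// Illustrates the boilerplate that higher-level scripting layers aim to hide.
public class MetricSumJob {

    public static class SumMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private final Text outKey = new Text();
        private final LongWritable outValue = new LongWritable();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length < 2) {
                return; // skip malformed records
            }
            outKey.set(fields[0]);
            outValue.set(Long.parseLong(fields[1]));
            context.write(outKey, outValue);
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable total = new LongWritable();

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        // Wire up the job: mapper, combiner, reducer, output types and I/O paths.
        Job job = Job.getInstance(new Configuration(), "metric-sum");
        job.setJarByClass(MetricSumJob.class);
        job.setMapperClass(SumMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Even a single aggregation requires dozens of lines of setup in raw Java; Cubert Script’s pitch is that such execution paths can be specified without writing any of it.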

That provides a relatively straightforward interface for optimizing data processing that LinkedIn says can help users accelerate analytics by up to 60 times. At launch, Cubert works only with Hadoop’s default MapReduce execution engine, but the company plans to leverage the extensibility of the framework to add support for the much faster Spark further down the road. More analytic functions and increased automation are in the works as well.

photo credit: Camil Tulcan via photopin cc
