UPDATED 07:10 EDT / FEBRUARY 26 2015

What eBay’s latest open-source project means for Hadoop

The last few months have seen eBay Inc. move from the sidelines of the open-source analytics ecosystem into the heart of the action with the introduction of two projects that push the envelope on large-scale data science. The pivot mirrors a broader shift in the ecosystem, one that the newer of the two projects accelerates.

The newly open-sourced Pulsar is a complex event processing (CEP) platform that the auction powerhouse created to analyze data streams moving too fast for its Hadoop cluster to handle. It’s a mishmash of existing technologies that has been tailored to provide internal teams with the ability to quickly act on changes in users’ activity patterns.

At the core of the framework is a free execution engine from a little-known provider called EsperTech Inc. that handles the processing of incoming events. A copy of the software is deployed on each node and hooked up to Apache ZooKeeper, a coordination service, so that data points sharing a particular identifier are routed to the same node.
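The article does not say exactly how Pulsar decides which node receives a given identifier. One common way to get that "same key, same node" behavior is consistent hashing, sketched below under that assumption. The class name `ConsistentHashRing` and the node names are hypothetical, and the static node list stands in for the live cluster membership a service like ZooKeeper would track.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical sketch: maps event keys to nodes so equal keys share a node.

    The node list stands in for the membership that a coordination service
    such as ZooKeeper would maintain in a real deployment (an assumption).
    """

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out load
        self._ring = []           # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 gives a hash that is stable across processes, unlike hash() on str.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for (h, n) in self._ring if n != node]

    def node_for(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the end of the ring
        return self._ring[idx][1]


# Usage: every event for the same user lands on the same node.
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
for event in [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}]:
    print(event["user_id"], "->", ring.node_for(event["user_id"]))
```

The appeal of this design is that when a node leaves, only the keys it owned are reassigned; every other key keeps its node, so per-key state elsewhere in the cluster stays where it is.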

Pulsar can branch off new streams as many times as required, which allows for highly granular filtering of information. That's essential to support the multi-dimensional analysis that eBay relies on to navigate the billions of events that flow through its systems every second, the same requirement that spurred the development of Kylin, the business intelligence engine it open-sourced a few months prior.
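To make the idea of repeated branching concrete, here is a minimal sketch in which each branch applies its own filter and forwards matching events to its children, so nested branches narrow the data one dimension at a time. This is not Pulsar's API (Pulsar builds on Esper's SQL-like query language); the `Stream` class, its methods, and the event fields are all hypothetical.

```python
class Stream:
    """Hypothetical sketch of a stream that can be forked into filtered branches."""

    def __init__(self, name, predicate=None):
        self.name = name
        self.predicate = predicate or (lambda event: True)
        self.children = []
        # Kept only for illustration; a real system would aggregate, not store.
        self.received = []

    def branch(self, name, predicate):
        """Fork a child stream that only sees events passing `predicate`."""
        child = Stream(name, predicate)
        self.children.append(child)
        return child

    def publish(self, event):
        """Accept the event if it passes this stream's filter, then fan it out."""
        if not self.predicate(event):
            return
        self.received.append(event)
        for child in self.children:
            child.publish(event)


# Usage: each level of branching slices along one more dimension.
root = Stream("all-events")
us = root.branch("us", lambda e: e["country"] == "US")
us_mobile = us.branch("us-mobile", lambda e: e["device"] == "mobile")
us_mobile_bids = us_mobile.branch("us-mobile-bids", lambda e: e["action"] == "bid")

events = [
    {"country": "US", "device": "mobile", "action": "bid"},
    {"country": "US", "device": "desktop", "action": "view"},
    {"country": "DE", "device": "mobile", "action": "bid"},
    {"country": "US", "device": "mobile", "action": "view"},
]
for e in events:
    root.publish(e)

print(len(root.received), len(us.received), len(us_mobile.received), len(us_mobile_bids.received))
# → 4 3 2 1
```

Because each child only ever sees what its parent let through, a branch's predicate needs to test just one extra condition, and the combined filter is the conjunction of every predicate along its path.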

The two projects share many characteristics that reflect the highly specialized needs of the web giant, from their underlying architecture to a common approach to exposing data that draws on business workers' familiarity with legacy relational systems. But they part ways on Hadoop: where the business intelligence engine builds on it, Pulsar does away with it entirely under the hood.

The decision is all the more notable because eBay is not the first to go down that route. Spark Streaming and Storm, the two leading real-time analytics engines in the Apache ecosystem, can both run independently of the data-crunching framework. The LinkedIn-created Samza, which has also been gaining steam, still relies exclusively on YARN, Hadoop's resource manager, for provisioning, but the community is working to remove that dependence.

The shift away from Hadoop comes as the fundamental incompatibilities between its batch-oriented architecture and stream processing become harder to ignore with the rise of more advanced real-time analytics applications. The trend is confined to that use case for now, but with Google and Microsoft both working on full-blown Hadoop alternatives, it is poised to engulf the entire ecosystem.

That increased competition will give organizations more freedom of choice in how they analyze real-time data and other modern workloads, but the added flexibility may come at a cost. The release of Pulsar brings to four the number of open-source event processing engines launched in recent years, a variety that is unlikely to last given the significant overlap among the different options.

That's an issue shared by the entire ecosystem. As organizations continue to refine their requirements and the technological fault lines become more pronounced, eBay's late entry into the stream processing fray will have to compete hard for the community's support. Consolidation is only a matter of time.
