Kafka alternative Apache Pulsar gains top-level project status
Apache Kafka is finally getting some serious competition.
Apache Pulsar, a distributed messaging platform originally developed at Yahoo! Inc. and open-sourced two years ago, was designated a top-level project by the Apache Software Foundation on Tuesday. The foundation bestows that designation when a technology has acquired a sufficient community of developers and users as well as a governance structure that indicates it’s mature enough to be self-sustaining.
Like Kafka, Pulsar is a scalable, low-latency messaging platform that runs on commodity hardware and provides both publish-and-subscribe and queue semantics. Publish-and-subscribe is a favored technique for building streaming data applications because it enables programs to subscribe to specific data streams and filter out a deluge of irrelevant data. Queuing, by contrast, delivers each message to only one of the consumers competing for it.
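The difference between the two delivery models can be sketched in a few lines of Python. This is a toy in-memory model, not the Pulsar client API: the Topic class, the subscription names and the round-robin rule are all assumptions made for illustration. Each named subscription sees every message (publish-and-subscribe), while consumers that share one subscription split the messages among themselves (queuing).

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic; illustrative only, not the Pulsar client API."""

    def __init__(self):
        # subscription name -> list of consumer inboxes (plain lists)
        self.subscriptions = defaultdict(list)
        # subscription name -> count of messages dispatched so far
        self.dispatched = defaultdict(int)

    def subscribe(self, subscription, inbox):
        """Attach a consumer inbox to a named subscription."""
        self.subscriptions[subscription].append(inbox)

    def publish(self, message):
        """Deliver to every subscription, but to one consumer within each."""
        for name, inboxes in self.subscriptions.items():
            # Round-robin among consumers sharing this subscription (assumed policy).
            target = inboxes[self.dispatched[name] % len(inboxes)]
            target.append(message)
            self.dispatched[name] += 1


topic = Topic()
audit, worker_a, worker_b = [], [], []
topic.subscribe("audit", audit)        # sole consumer: sees every message
topic.subscribe("workers", worker_a)   # two consumers share one subscription,
topic.subscribe("workers", worker_b)   # so they act as a work queue

for n in range(4):
    topic.publish(f"m{n}")

print(audit)     # ['m0', 'm1', 'm2', 'm3']
print(worker_a)  # ['m0', 'm2']
print(worker_b)  # ['m1', 'm3']
```

In this sketch the "audit" subscription behaves like a classic subscriber, and the two "workers" consumers behave like a queue in which no message is processed twice.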
Yahoo! created Pulsar as a multitenant messaging system that operates at very high speed. The company said it has run Pulsar in production for more than three years, processing millions of messages per second across millions of topics for Yahoo! Mail, Yahoo! Finance, Yahoo! Sports, Flickr, the Gemini Ads Platform and the Sherpa distributed key value store.
Designation as a top-level project will give Pulsar additional momentum in recruiting developers, said Matteo Merli, co-founder of startup Streamlio Inc., which sells a real-time analytics suite that incorporates Pulsar. Merli was the original lead developer of Pulsar while at Yahoo! and was recently named vice president of the Pulsar project.
“Kafka has a much larger community; there’s no way to deny that,” he said, but he added that he expects Pulsar to quickly reach parity. “Being an incubator was a big question for most of the potential users of the project,” he said. “Now this clarifies that the project is ready for prime time.”
The Pulsar architecture separates serving and storage layers using Apache BookKeeper as the storage component, resulting in what developers call “a vastly simplified approach to cluster operations” that enables cluster sizes to be adjusted easily and failed nodes replaced without disrupting streams. Pulsar can run on everything from bare-metal machines to Kubernetes clusters both on-premises and in the cloud.
Version 2.0, which was released in May, added a schema registry for better database integration and a lightweight computing framework that enables developers to connect directly with topics, which are named channels for transmitting messages from producers to consumers. Pulsar also has a compatibility wrapper that enables it to work seamlessly with Kafka applications.
In reality, open-source projects don’t compete with each other outside of some good-natured kibitzing by developers. And though Pulsar and Kafka provide similar functionality, “the origins are very different,” Merli said. “We come from the world of database replication and highly scalable systems with strong guarantees, whereas Kafka comes from log collection.”
One area in which Pulsar had an early lead on Kafka was in “exactly once” message delivery, which ensures that every message produced at one end of a messaging pipeline is committed at the other end once and only once. Kafka didn’t get that capability until about a year ago. A similar concept called “effectively once” that uses deduplication has been baked into Pulsar from the beginning. “It’s very lightweight with little overhead,” Merli said.
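A rough way to picture deduplication-based delivery: the receiving side remembers the highest sequence number it has stored for each producer and silently drops any retry at or below that number. The Python sketch below assumes that scheme; the DedupLog class, the producer name "p1" and the sequence-number convention are hypothetical and are not taken from the article or from Pulsar's code.

```python
class DedupLog:
    """Toy append-only log that drops duplicate sends (illustrative only)."""

    def __init__(self):
        self.entries = []    # payloads that were actually stored
        self.last_seq = {}   # producer name -> highest sequence number stored

    def append(self, producer, seq, payload):
        """Store the payload unless this producer already sent seq or later."""
        if seq <= self.last_seq.get(producer, -1):
            return False     # duplicate or stale retry: acknowledge but do not store
        self.entries.append(payload)
        self.last_seq[producer] = seq
        return True


log = DedupLog()
results = [
    log.append("p1", 0, "a"),
    log.append("p1", 1, "b"),
    log.append("p1", 1, "b"),  # producer retries after a lost acknowledgment
    log.append("p1", 2, "c"),
]

print(results)      # [True, True, False, True]
print(log.entries)  # ['a', 'b', 'c']
```

The retry is sent twice but stored once, which is the "effectively once" outcome: the bookkeeping is a single number per producer, consistent with the low overhead Merli describes.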