Apache Kafka gets ‘exactly-once’ message delivery – and that’s a big deal
Apache Kafka has a reputation as a lightweight, fast and highly scalable message broker that passes data between applications. But one feature has frustrated users over the five years since the platform graduated from its incubation period at LinkedIn Corp.: There was no way to ensure that each and every message produced at one end of a Kafka pipeline was committed exactly once at the other.
That’s called exactly-once delivery, and it’s devilishly hard to make work. But now a team at Confluent Inc. says it has licked the problem without hurting performance or imposing a lot of administrative overhead on developers. The solution is incorporated into the recently released 0.11 version of Kafka. At the Kafka Summit that Confluent is running on Monday in San Francisco, the fix is bound to be a major topic of discussion.
Kafka is used in distributed, asynchronous applications to transmit data from a producer, such as a sensor, to a destination or consumer, where it’s committed to a database or transaction log. The problem with distributed, asynchronous environments is that things can break. In a perfect world, the consumer acknowledges the transaction back to the producer after it has been committed. But if something goes wrong and that acknowledgment is never received, the producer has no way to know if the data was transmitted successfully.
The safe route
Kafka’s original developers opted to go the safe route and support “at-least-once delivery.” That means that if the producer fails to get an acknowledgment, it resends the message until it does. The benefit of that approach is that it ensures that the data is never lost. The downside is that it can create duplicate messages, which impacts performance and can even cause applications to crash if they haven’t been written to handle duplication.
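The duplication risk comes directly from that retry loop. Here is a minimal sketch of the idea (not Kafka's actual protocol; the broker and producer names are illustrative): the write succeeds, but when the acknowledgment is lost, the producer has no choice but to send again.

```python
import random

random.seed(1)  # make the lost-ack behavior reproducible for this sketch

class FlakyBroker:
    """Commits every message it receives, but sometimes the ack gets lost."""
    def __init__(self, ack_loss_rate=0.5):
        self.log = []                # every message the broker committed
        self.ack_loss_rate = ack_loss_rate

    def send(self, message):
        self.log.append(message)     # the write itself always succeeds here...
        return random.random() > self.ack_loss_rate  # ...but the ack may vanish

def produce_at_least_once(broker, message, max_retries=10):
    """Resend until an ack arrives: delivery is guaranteed, duplicates are not prevented."""
    for _ in range(max_retries):
        if broker.send(message):
            return
    raise RuntimeError("no ack received after retries")

broker = FlakyBroker()
produce_at_least_once(broker, "charge card #42")
print(broker.log)  # the committed log may now hold the same charge more than once
```

If the downstream application is, say, a payment processor, each duplicate in that log is a double charge unless the consumer has been written to deduplicate on its own.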
“It’s a hard problem to solve in distributed systems,” said Neha Narkhede (pictured), co-founder and chief technology officer at Confluent. “Until now, you’ve had to make a choice between two options – building an application that can handle processing a message twice or risking that the message may not process at all.” The problem is so difficult to solve that some people have suggested that exactly-once delivery isn’t even possible.
But Confluent now thinks it has cracked the code. Its solution, which Narkhede documented in a detailed blog post, consists of two parts.
Borrowing a technique TCP uses to ensure packet delivery, developers came up with a way to assign a unique sequence number to each message so that duplicates can be detected and discarded on the other end. Unlike TCP’s, the sequence number is committed to a Kafka log that any broker can read. Activating this feature is as simple as flicking a configuration switch, although some modifications are necessary on the consumer application side.
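Conceptually, the broker tracks the next sequence number it expects from each producer and silently drops anything it has already committed. This is a toy model of that idea, not Kafka's broker code (in Kafka, the producer id and sequence live in the replicated log itself, so any broker can rebuild this table):

```python
class DedupingBroker:
    """Accepts (producer_id, sequence, message); drops messages already committed."""
    def __init__(self):
        self.log = []
        self.next_seq = {}   # producer_id -> next expected sequence number

    def send(self, producer_id, seq, message):
        expected = self.next_seq.get(producer_id, 0)
        if seq < expected:
            return True      # duplicate of a committed message: ack it, don't re-log it
        self.next_seq[producer_id] = seq + 1
        self.log.append(message)
        return True

broker = DedupingBroker()
broker.send("p1", 0, "a")
broker.send("p1", 0, "a")   # a retry of the same message -- deduplicated
broker.send("p1", 1, "b")
print(broker.log)           # ['a', 'b']
```

The key point is that the retry from the previous sketch becomes harmless: the producer can resend as aggressively as it likes, and the log still records each message once.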
The second part of the solution was to create an application programming interface that supports atomic writes across multiple partitions. “Atomicity guarantees that all messages are visible to the consumer or none of them are visible,” Narkhede said. If all messages are visible, then the data was received and committed successfully.
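The all-or-nothing behavior can be sketched like this (again a simplified model with illustrative names, not the Kafka transactions implementation): writes to several partitions are staged together, and only a commit makes the whole batch visible to consumers.

```python
class Partition:
    """A partition's log; only `committed` is visible to read-committed consumers."""
    def __init__(self):
        self.committed = []

class TransactionalWriter:
    """Stage writes across several partitions, then expose them all at once."""
    def __init__(self, partitions):
        self.partitions = partitions
        self.staged = []

    def send(self, partition, message):
        self.staged.append((partition, message))   # buffered, not yet visible

    def commit(self):
        for partition, message in self.staged:     # atomically publish the batch
            self.partitions[partition].committed.append(message)
        self.staged.clear()

    def abort(self):
        self.staged.clear()                        # nothing ever becomes visible

parts = {"orders": Partition(), "payments": Partition()}
tx = TransactionalWriter(parts)
tx.send("orders", "order #7")
tx.send("payments", "charge #7")
# Before commit, a consumer sees neither message in either partition.
assert parts["orders"].committed == [] and parts["payments"].committed == []
tx.commit()
print(parts["orders"].committed, parts["payments"].committed)
```

Either both the order and its matching charge show up, or neither does, which is exactly the invariant a payments application needs.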
Thoroughly tested
The solution was in development for about three years and went through nine months of public scrutiny and comment. Developers produced a detailed design document and validated their solution with some 15,000 lines of test code simulating nearly every possible failure scenario. So far, the exactly-once solution has held up and proved to have virtually no impact on performance, Narkhede said.
In a closed-loop system where all brokers see the same view, “this is like magic pixie dust,” she said. “You flick a switch and start getting exactly-once guarantees.” Open-ended systems using APIs require a little more tuning but can take advantage of the same delivery guarantees.
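In Kafka 0.11’s client, that switch corresponds to a handful of settings, shown here as plain configuration properties (the `transactional.id` value is an illustrative placeholder):

```properties
# Producer: turn on idempotent, deduplicated delivery
enable.idempotence=true
# Producer: set a transactional.id to enable atomic multi-partition writes
transactional.id=my-app-tx-1
# Consumer: read only messages from committed transactions
isolation.level=read_committed
```

With idempotence alone, applications get deduplicated delivery with no other changes; the transactional settings are the extra tuning needed for the multi-partition, all-or-nothing guarantees.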
“Solving this problem opens up a lot of new applications,” said Narkhede, who will kick off Kafka Summit Monday in an opening keynote. For example, credit card processors need exactly-once semantics to ensure that charges are processed only once. Ad-tech companies need the capability to ensure that click-through rates are accurate. With exactly-once semantics in Kafka, they now have an open-source option for doing so.