UPDATED 21:01 EDT / JULY 01 2020

BIG DATA

Confluent brings infinite data retention to Apache Kafka

Apache Kafka startup Confluent Inc. today debuted a new infinite data retention feature within its Confluent Cloud platform that enables customers to store as much event data as they want.

Apache Kafka is a popular open-source technology that’s widely used by companies to store event data, which is essentially information that’s generated by applications as they’re being used. Kafka provides a way to store these streams of data and direct them to downstream repositories such as data lakes and data warehouses, where they can be analyzed.

Confluent, which was founded by the creators of Kafka, said the infinite data retention feature is part of its ongoing “Project Metamorphosis,” which is aimed at transforming Kafka’s event streaming capabilities into a service that can support business operations at scale across on-premises data centers and the cloud. The feature will be rolled out to Confluent Cloud users on Amazon Web Services next month and on other public cloud platforms later this year.

Most Kafka users have created enormous clusters that store petabytes of event data that can be used for either real-time or historical analysis. But Confluent says the tight integration of storage and compute within Kafka makes it extremely expensive for companies to store all of this data.

It’s for that reason that Confluent is separating compute and storage with its infinite data retention feature, enabling users to scale up storage independently.

“We’ve removed the limitations of storing and retaining events in Apache Kafka with infinite retention in Confluent Cloud,” said Jay Kreps, co-founder and chief executive officer at Confluent. “With event streaming as a business’s central nervous system, applications can pull from an unlimited source of past and present data to quickly become smarter, faster, and more precise.”

Confluent has effectively removed the cap on how much data can be stored, and for how long, within Confluent Cloud, which is a hosted service available on public cloud platforms such as AWS.

The new capability helps to eliminate the technical and economic strain faced by companies that need to deal with rapidly growing volumes of data from their real-time event streams. Customers can now establish a cost-effective “central nervous system” for data events that will unlock more use cases while mitigating the growing costs of Kafka storage, the company said. Those use cases include creating a log of events for compliance audits, training machine learning models based on event data, and improving the accuracy of recommendation engines.

In a blog post, Confluent software engineer Jun Rao provided a few more examples, saying that a retail bank could use infinite data retention to let customers search the full history of their transactions online, instead of just the last six months as is usually the case.

“This can be done by first integrating all transactional events in Kafka and then feeding the events from Kafka into a search engine like Elasticsearch,” Rao said. “As new transactional events are added in Kafka, they will be incrementally reflected in the Elasticsearch index in real time.”
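The pattern Rao describes can be illustrated with a minimal in-memory sketch. It does not use a real Kafka or Elasticsearch client: the topic is modeled as an append-only Python list, the search index as a dictionary, and the names `produce`, `sync_index` and `search` are hypothetical helpers invented for this illustration.

```python
from collections import defaultdict

# Stand-in for a Kafka topic with unlimited retention: an append-only list
# in which an event's position is its offset. Nothing is ever deleted.
topic = []

# Stand-in for a search index: customer id -> that customer's events.
index = defaultdict(list)


def produce(event):
    """Append an event to the log and return its offset."""
    topic.append(event)
    return len(topic) - 1


def sync_index(next_offset):
    """Index every event at or after next_offset; return the new next offset."""
    for event in topic[next_offset:]:
        index[event["customer"]].append(event)
    return len(topic)


def search(customer):
    """Return the full indexed transaction history for one customer."""
    return list(index.get(customer, []))


# Two events arrive and are indexed in a first pass.
produce({"customer": "alice", "amount": -40})
produce({"customer": "bob", "amount": 120})
offset = sync_index(0)

# A later event is picked up incrementally, starting from the saved offset.
produce({"customer": "alice", "amount": 15})
offset = sync_index(offset)

alice_history = search("alice")
```

Because the log keeps every event, the index could also be rebuilt from scratch at any time by replaying from offset 0, which is the property that makes a full transaction history searchable in this example.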

Constellation Research Inc. analyst Holger Mueller, author of a report called “Infinite Platforms Power Enterprise Acceleration,” told SiliconANGLE that we are now living in an era of infinite computing, in which computing resources have essentially become infinite.

He said that providing infinite storage for something like Kafka is at first glance an odd combination because a streaming service, by its very nature, shouldn’t really need it.

“But the truth is that many of the other relevant infinite computing layers, like infinite insights and infinite artificial intelligence, require complete storage of both streaming and more residential data to power next-generation applications,” Mueller said. “It’s good to see Confluent making it easier for enterprises to persist their streaming data as long as they need.”

Mueller said the next step for Confluent is to extend the feature beyond AWS to other major cloud providers, something the company has promised will happen later this year. “When that happens, Confluent Cloud by itself will become an infinite compute platform for enterprises, as it will bridge compute capabilities across clouds,” he said.

Image: geralt/Pixabay
