UPDATED 21:01 EDT / JULY 01 2020

BIG DATA

Confluent brings infinite data retention to Apache Kafka

Apache Kafka startup Confluent Inc. today debuted a new infinite data retention feature within its Confluent Cloud platform that enables customers to store as much event data as they want.

Apache Kafka is a popular open-source technology that’s widely used by companies to store event data, which is essentially information that’s generated by applications as they’re being used. Kafka provides a way to store these streams of data and direct them to downstream repositories such as data lakes and data warehouses, where they can be analyzed.

Confluent, which was founded by the creators of Kafka, said the infinite data retention feature is part of its ongoing “Project Metamorphosis,” which aims to transform Kafka’s event streaming capabilities into a service that can support business operations at scale across on-premises data centers and the cloud. The feature will be rolled out to Confluent Cloud users on Amazon Web Services next month and on other public cloud platforms later this year.

Most Kafka users have created enormous clusters that store petabytes of event data that can be used for either real-time or historical analysis. But Confluent says the tight integration of storage and compute within Kafka makes it extremely expensive for companies to store all of this data.

For that reason, Confluent is separating compute and storage with its infinite data retention feature, enabling users to scale up storage independently of compute.

“We’ve removed the limitations of storing and retaining events in Apache Kafka with infinite retention in Confluent Cloud,” said Jay Kreps, co-founder and chief executive officer at Confluent. “With event streaming as a business’s central nervous system, applications can pull from an unlimited source of past and present data to quickly become smarter, faster, and more precise.”

Confluent has effectively removed the cap on how much data can be stored, and for how long, within Confluent Cloud, a hosted service available on public cloud platforms such as AWS.
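To make the idea of a retention cap concrete, here is a minimal Python sketch of a time-based retention policy applied to an in-memory event log. It is an illustration only, not Confluent's implementation: the `Event` class and `apply_retention` function are hypothetical names, and the only detail borrowed from Apache Kafka is the convention that a `retention.ms` value of -1 means no time limit. Real Kafka brokers delete whole log segments rather than individual events, so the per-event filtering here is a simplification.

```python
from dataclasses import dataclass
from typing import List

# Apache Kafka's topic setting retention.ms uses -1 to mean "no time limit".
NO_LIMIT = -1


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a timestamp plus an opaque payload."""
    timestamp_ms: int
    payload: str


def apply_retention(log: List[Event], now_ms: int, retention_ms: int) -> List[Event]:
    """Return the events that a time-based retention policy would keep."""
    if retention_ms == NO_LIMIT:
        # Infinite retention: nothing ever ages out.
        return list(log)
    cutoff = now_ms - retention_ms
    # Finite retention: events older than the cutoff are dropped.
    return [event for event in log if event.timestamp_ms >= cutoff]


log = [Event(1_000, "signup"), Event(5_000, "purchase"), Event(9_000, "refund")]

kept_capped = apply_retention(log, now_ms=10_000, retention_ms=6_000)
kept_forever = apply_retention(log, now_ms=10_000, retention_ms=NO_LIMIT)

print([event.payload for event in kept_capped])
print([event.payload for event in kept_forever])
```

With a six-second cap, the oldest event falls outside the window and is lost, while the unlimited setting keeps the full history available for later analysis.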

The new capability helps to eliminate the technical and economic strain faced by companies that need to deal with rapidly growing volumes of data from their real-time event streams. Customers can now establish a cost-effective “central nervous system” for data events that will unlock more use cases while mitigating the growing costs of Kafka storage, the company said. Those use cases include creating a log of events for compliance audits, training machine learning models based on event data, and improving the accuracy of recommendation engines.

In a blog post, Confluent software engineer Jun Rao provided a few more examples, saying that a retail bank could use infinite data retention to let customers search the full history of their transactions online, instead of just the last six months as is usually the case.

“This can be done by first integrating all transactional events in Kafka and then feeding the events from Kafka into a search engine like Elasticsearch,” Rao said. “As new transactional events are added in Kafka, they will be incrementally reflected in the Elasticsearch index in real time.”
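The pattern Rao describes, an append-only event log feeding a search index incrementally, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `EventLog`, `SearchIndex` and `Indexer` are hypothetical in-memory stand-ins, not the Kafka or Elasticsearch client APIs, and the sample transactions are made up. The point it shows is that the indexer remembers the offset it has reached, so each poll picks up only events appended since the last one.

```python
from collections import defaultdict
from typing import Dict, List, Set


class EventLog:
    """Append-only list standing in for a single Kafka topic partition."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> List[dict]:
        """Return every event at or after the given offset."""
        return self._events[offset:]


class SearchIndex:
    """Tiny inverted index standing in for a search engine."""

    def __init__(self) -> None:
        self._docs: Dict[int, dict] = {}
        self._terms: Dict[str, Set[int]] = defaultdict(set)

    def index(self, doc_id: int, doc: dict) -> None:
        self._docs[doc_id] = doc
        for word in doc["description"].lower().split():
            self._terms[word].add(doc_id)

    def search(self, word: str) -> List[dict]:
        doc_ids = self._terms.get(word.lower(), set())
        return [self._docs[doc_id] for doc_id in sorted(doc_ids)]


class Indexer:
    """Consumer that copies new log events into the index, tracking its offset."""

    def __init__(self, log: EventLog, index: SearchIndex) -> None:
        self.log = log
        self.index = index
        self.next_offset = 0

    def poll(self) -> int:
        """Index events appended since the last poll; return how many were indexed."""
        batch = self.log.read_from(self.next_offset)
        for event in batch:
            self.index.index(self.next_offset, event)
            self.next_offset += 1
        return len(batch)


log = EventLog()
index = SearchIndex()
indexer = Indexer(log, index)

# Made-up transactions for illustration only.
log.append({"account": "A1", "description": "Coffee shop", "amount": -4.50})
log.append({"account": "A1", "description": "Salary deposit", "amount": 2500.00})
first_batch = indexer.poll()

# A later transaction is picked up by the next poll without re-reading old events.
log.append({"account": "A1", "description": "Coffee beans", "amount": -12.00})
second_batch = indexer.poll()

matches = index.search("coffee")
print(first_batch, second_batch, [m["description"] for m in matches])
```

Because the log keeps every event, a new index could also be built from offset zero at any time, which is what makes a full transaction history searchable rather than only the most recent months.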

Constellation Research Inc. analyst Holger Mueller, author of a report called “Infinite Platforms Power Enterprise Acceleration,” told SiliconANGLE that we are now living in an era of infinite computing, in which computing resources are effectively unlimited.

He said that providing infinite storage for something like Kafka is at first glance an odd combination because a streaming service, by its very nature, shouldn’t really need it.

“But the truth is that many of the other relevant infinite computing layers, like infinite insights and infinite artificial intelligence, require complete storage of both streaming and more residential data to power next-generation applications,” Mueller said. “It’s good to see Confluent making it easier for enterprises to persist their streaming data as long as they need.”

Mueller said the next step for Confluent is to extend the feature beyond AWS to other major cloud providers, something the company has promised will happen later this year. “When that happens, Confluent Cloud by itself will become an infinite compute platform for enterprises, as it will bridge compute capabilities across clouds,” he said.

Image: geralt/Pixabay

