UPDATED 21:01 EDT / JULY 01 2020


Confluent brings infinite data retention to Apache Kafka

Apache Kafka startup Confluent Inc. today debuted an infinite data retention feature in its Confluent Cloud platform that lets customers retain as much event data as they want, for as long as they want.

Apache Kafka is a popular open-source technology that’s widely used by companies to store event data, which is essentially information that’s generated by applications as they’re being used. Kafka provides a way to store these streams of data and direct them to downstream repositories such as data lakes and data warehouses, where they can be analyzed.

Confluent, which was founded by the creators of Kafka, said infinite data retention is part of its ongoing "Project Metamorphosis," which aims to turn Kafka's event streaming capabilities into a service that can support business operations at scale across on-premises data centers and the cloud. The feature will roll out to Confluent Cloud users on Amazon Web Services next month and on other public cloud platforms later this year.

Many Kafka users run enormous clusters that store petabytes of event data for real-time or historical analysis. But Confluent says the tight coupling of storage and compute in Kafka makes it extremely expensive to keep all of that data: because a standard Kafka broker stores its data on local disks, adding storage generally means adding brokers, and with them more compute.

For that reason, Confluent's infinite data retention feature separates compute from storage, allowing users to scale storage independently.

“We’ve removed the limitations of storing and retaining events in Apache Kafka with infinite retention in Confluent Cloud,” said Jay Kreps, co-founder and chief executive officer at Confluent. “With event streaming as a business’s central nervous system, applications can pull from an unlimited source of past and present data to quickly become smarter, faster, and more precise.”

In effect, Confluent has removed any cap on how much data can be stored, and for how long, in Confluent Cloud, its hosted service available on public cloud platforms such as AWS.

The new capability helps eliminate the technical and economic strain on companies dealing with rapidly growing volumes of data from their real-time event streams. Customers can now establish a cost-effective "central nervous system" for data events that unlocks more use cases while containing the growing cost of Kafka storage, the company said. Those use cases include keeping a log of events for compliance audits, training machine learning models on event data and improving the accuracy of recommendation engines.

In a blog post, Confluent software engineer Jun Rao offered further examples. A retail bank, for instance, could use infinite data retention to let customers search the full history of their transactions online, rather than just the last six months, as is usually the case.

“This can be done by first integrating all transactional events in Kafka and then feeding the events from Kafka into a search engine like Elasticsearch,” Rao said. “As new transactional events are added in Kafka, they will be incrementally reflected in the Elasticsearch index in real time.”
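The pipeline Rao describes can be illustrated with a toy, in-memory sketch. `EventLog` and `TransactionIndex` are hypothetical names standing in for a Kafka topic and an Elasticsearch index; the code does not use either product's API. The point it shows is that the indexer remembers an offset into a log that never deletes anything, so the first sync backfills the full history and later syncs pick up only new events.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventLog:
    """Hypothetical stand-in for a Kafka topic: an append-only list of events
    addressed by offset. Nothing is ever deleted, mimicking infinite retention."""
    events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the newly appended event

    def read_from(self, offset: int) -> List[dict]:
        return self.events[offset:]  # every event at or after `offset`


@dataclass
class TransactionIndex:
    """Hypothetical stand-in for a search index: maps each customer to their
    full transaction history."""
    by_customer: Dict[str, List[dict]] = field(default_factory=dict)
    next_offset: int = 0  # position of the first event not yet indexed

    def sync(self, log: EventLog) -> int:
        """Index only the events added since the last sync; return how many."""
        new_events = log.read_from(self.next_offset)
        for event in new_events:
            self.by_customer.setdefault(event["customer"], []).append(event)
        self.next_offset += len(new_events)
        return len(new_events)

    def search(self, customer: str, min_amount: float = 0.0) -> List[dict]:
        """Return a customer's transactions at or above `min_amount`."""
        return [e for e in self.by_customer.get(customer, [])
                if e["amount"] >= min_amount]


log = EventLog()
index = TransactionIndex()
log.append({"customer": "alice", "amount": 120.0, "date": "2013-04-02"})
log.append({"customer": "bob", "amount": 15.5, "date": "2019-11-20"})
print(index.sync(log))  # 2: initial backfill from offset 0
log.append({"customer": "alice", "amount": 42.0, "date": "2020-07-01"})
print(index.sync(log))  # 1: only the event added since the last sync
print([e["date"] for e in index.search("alice")])  # ['2013-04-02', '2020-07-01']
```

Because the log keeps every event, the same backfill-then-follow loop works no matter how old the oldest transaction is; with a limited retention window, the backfill would silently start at whatever history had not yet expired.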

Constellation Research Inc. analyst Holger Mueller, author of a report called "Infinite Platforms Power Enterprise Acceleration," told SiliconANGLE that we are now living in an era of "infinite computing," in which computing resources are effectively unlimited.

He said that providing infinite storage for something like Kafka is at first glance an odd combination because a streaming service, by its very nature, shouldn’t really need it.

“But the truth is that many of the other relevant infinite computing layers, like infinite insights and infinite artificial intelligence, require complete storage of both streaming and more residential data to power next-generation applications,” Mueller said. “It’s good to see Confluent making it easier for enterprises to persist their streaming data as long as they need.”

Mueller said the next step for Confluent is to extend the feature beyond AWS to other major cloud providers, something the company has promised will happen later this year. "When that happens, Confluent Cloud by itself will become an infinite compute platform for enterprises, as it will bridge compute capabilities across clouds," he said.

Image: geralt/Pixabay
