Streaming data infrastructure: Scaling AI with cloud-native innovation
In today’s cloud-native era, efficient streaming data infrastructure is pivotal for scaling artificial intelligence training and delivering operational insights.
As organizations increasingly rely on streaming data for artificial intelligence training, analytics and operational insights, the challenges of scaling technologies such as Apache Kafka are coming into sharp focus. These include escalating costs, operational complexities and the inefficiencies of legacy architectures in cloud-native environments, according to Akshay Shah (pictured), chief technology officer of Buf Technologies Inc.
“Here in the cloud-native community, we expect Kubernetes to be our abstraction for compute and then, for the most part, we expect object storage to be our abstraction for disks,” Shah said. “If there’s going to be cross-region replication, or if I want really fast transfer and I want RDMA instead of kernel networking, I want that handled in MinIO or Google Cloud Storage or S3. I don’t want to be in the poor Apache Kafka code base in Java trying to eke out small performance wins in the replication layer.”
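Shah’s division of labor is easy to see in code. The following Go sketch is a minimal illustration, not Bufstream’s actual implementation: it writes a batch of stream records to an S3-compatible store through the MinIO client, and the endpoint, credentials, bucket name and key layout are all placeholder assumptions. Once the segment lands in the object store, durability and cross-region replication become the store’s problem rather than the broker’s.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Connect to any S3-compatible store (MinIO, Google Cloud Storage
	// interop, AWS S3). Endpoint and credentials are placeholders.
	client, err := minio.New("objectstore.example.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// A batch of records serialized into a single immutable segment.
	// No replica fetchers, no in-sync-replica bookkeeping: the object
	// store handles durability and any cross-region copies.
	segment := bytes.NewBufferString("record-1\nrecord-2\nrecord-3\n")
	key := fmt.Sprintf("topics/orders/partition-0/%020d.log", 42)

	info, err := client.PutObject(ctx, "stream-segments", key,
		segment, int64(segment.Len()),
		minio.PutObjectOptions{ContentType: "application/octet-stream"})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote segment %s (%d bytes)", info.Key, info.Size)
}
```

The design choice mirrors Shah’s argument: the broker shrinks to coordination and batching, while the hard distributed-systems work lives in a storage layer that cloud providers already run at scale.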
Shah spoke with theCUBE Research’s Rob Strechay and guest analyst Sanjeev Mohan at KubeCon + CloudNativeCon NA, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed modernizing streaming data infrastructure by addressing inefficiencies in traditional systems such as Apache Kafka, emphasizing cost efficiency, scalability and schema-driven development. (* Disclosure below.)
Modernizing streaming data infrastructure for scalability and efficiency
One of the unique aspects of Buf’s approach is its emphasis on schema-driven development. Schema formats such as Protocol Buffers, or Protobuf, are essential for ensuring data integrity and compatibility in streaming systems, according to Shah.
“When Kafka first came out, Kafka was a pure message bus. But with Protobuf, the schema becomes a first-class citizen,” Shah said. “To me, schema is really important because I come from the data side. So, once you have the schema, then you start doing amazing stuff on top of it, like data protection, role-based access control and data quality.”
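To make the schema-first idea concrete, here is a minimal Go sketch. It assumes a hypothetical orders.proto message, OrderPlaced, compiled into a generated package with a tool such as protoc or buf generate; the import path, message name and field names are illustrative, not part of Buf’s products.

```go
package main

import (
	"log"

	"google.golang.org/protobuf/proto"

	// Hypothetical package generated from an orders.proto that declares:
	//   message OrderPlaced { string order_id = 1; int64 cents = 2; }
	orderspb "example.com/gen/orders/v1"
)

func main() {
	// Producers can only publish what the schema allows; renaming or
	// retyping a field breaks the build instead of a downstream consumer.
	event := &orderspb.OrderPlaced{
		OrderId: "ord-123",
		Cents:   4999,
	}

	payload, err := proto.Marshal(event)
	if err != nil {
		log.Fatal(err)
	}

	// The encoded payload is what lands on the topic. Any consumer with
	// the same schema decodes it unambiguously, and tooling can layer
	// access control and data-quality checks on the schema itself.
	var decoded orderspb.OrderPlaced
	if err := proto.Unmarshal(payload, &decoded); err != nil {
		log.Fatal(err)
	}
	log.Printf("order %s for %d cents", decoded.GetOrderId(), decoded.GetCents())
}
```

This is the contrast Shah draws with Kafka’s early days: instead of opaque bytes on a topic, every message carries a contract that tools can reason about.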
As organizations continue to scale their data needs, innovations such as Bufstream signal a turning point for streaming architectures. By offloading replication and durability to cloud object storage, Bufstream aims to help businesses scale streaming workloads without the cost and operational burden of legacy designs.
“I hope I’ll be able to look you right in the eye and say that Bufstream is the best way to get your terabyte-per-second workload wired up, from Databricks to Snowflake, to BigLake, to the data warehouse in your cloud provider of choice,” Shah said. “We’re getting close, but we’re not quite there yet.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of KubeCon + CloudNativeCon NA:
(* Disclosure: TheCUBE is a paid media partner for KubeCon + CloudNativeCon NA. Neither Red Hat Inc., the headline sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE