

Less than three weeks after open-sourcing its Quark cost-based SQL optimizer, big data-as-a-service provider Qubole Inc. is at it again.
Coinciding with the Kafka Summit taking place in San Francisco this week, Qubole said it’s releasing its StreamX ingestion service under an Apache open source license. StreamX is used to efficiently and reliably capture large-scale, real-time data using Apache Kafka, the message broker that is surging in popularity thanks to growing interest in real-time and streaming analytics.
StreamX ingests data logs from Kafka and persists them to cloud object stores such as Amazon Web Services LLC’s S3. It guarantees that data is delivered without duplicates, addressing a characteristic of Kafka that can cause problems for users in some situations.
Kafka uses “at least once” delivery semantics to protect against data loss, meaning that it can sometimes deliver the same message several times to ensure that it gets through. That creates problems for applications that cannot tolerate duplicate records. StreamX is built on the Kafka Connect framework and is designed for reliable, “exactly once” delivery.
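To make the difference concrete, here is a minimal Python sketch of one common way to turn at-least-once delivery into effectively-once storage: the sink remembers the highest offset it has persisted for each topic partition and skips anything it has already seen. This is an illustration only and is not taken from StreamX; the `Record` and `IdempotentSink` names are hypothetical, and an in-memory list stands in for an object store.

```python
from typing import Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    """A delivered message, identified by topic, partition and offset."""
    topic: str
    partition: int
    offset: int
    value: str


class IdempotentSink:
    """Toy stand-in for an object store writer (hypothetical, not StreamX code)."""

    def __init__(self) -> None:
        # Highest offset already persisted for each (topic, partition).
        self.committed: Dict[Tuple[str, int], int] = {}
        # In-memory list standing in for the object store.
        self.stored: List[str] = []

    def write(self, record: Record) -> bool:
        """Persist the record once; return False if it was a redelivery."""
        key = (record.topic, record.partition)
        if record.offset <= self.committed.get(key, -1):
            return False  # already persisted, so skip the duplicate
        self.stored.append(record.value)
        self.committed[key] = record.offset
        return True


# At-least-once delivery: offset 1 arrives twice.
deliveries = [
    Record("logs", 0, 0, "a"),
    Record("logs", 0, 1, "b"),
    Record("logs", 0, 1, "b"),  # redelivered after a simulated retry
    Record("logs", 0, 2, "c"),
]

sink = IdempotentSink()
results = [sink.write(r) for r in deliveries]
print(sink.stored)
```

In a real pipeline the data write and the offset bookkeeping would also need to survive crashes together, which this sketch does not attempt.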
The service also performs on-the-fly format conversions, a feature that is useful in high-speed analytics with tools such as Apache Spark. “A lot of what Kafka captures are JSON logs,” said Ashish Thusoo, co-founder and CEO of Qubole. “It’s often better to convert to the Parquet or ORC format for use with Hadoop” because those columnar formats are better suited to data analytics applications, which typically drill down through columns in very large tables.
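The idea behind a columnar layout can be sketched in a few lines of plain Python: instead of keeping each JSON log as its own row, the values of each field are stored together, so a query that needs one field reads only that column. This is a simplified illustration, not Parquet or ORC, which add typed encodings, compression and metadata on top; the `rows_to_columns` helper and the sample field names are made up for the example.

```python
import json
from typing import Any, Dict, List


def rows_to_columns(json_lines: List[str]) -> Dict[str, List[Any]]:
    """Pivot row-oriented JSON logs into one list of values per field."""
    rows = [json.loads(line) for line in json_lines]

    # Collect field names in first-seen order.
    fields: List[str] = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)

    # Missing fields become None so every column has one entry per row.
    return {name: [row.get(name) for row in rows] for name in fields}


# Hypothetical log lines of the kind a web service might emit.
logs = [
    '{"user": "ann", "status": 200, "ms": 12}',
    '{"user": "bob", "status": 500, "ms": 87}',
    '{"user": "cat", "status": 200}',
]

columns = rows_to_columns(logs)

# An analytics-style query touches a single column, not every row object.
error_count = sum(1 for status in columns["status"] if status >= 500)
print(columns["ms"], error_count)
```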
Qubole is also adding support for StreamX as a managed service on the Qubole Data Service (QDS), a self-service platform for big data analytics that runs on the Amazon Web Services, Google Compute Engine and Microsoft Azure clouds.