UPDATED 08:00 EDT / MARCH 15 2023

BIG DATA

CelerData expands lakehouse support in StarRocks-based analytics platform

CelerData Inc., maker of a real-time analytics platform based on the StarRocks open-source massively parallel database, today announced version 3 of its enterprise product with enhanced support for the hybrid data warehouse/data lake repositories known as data lakehouses.

CelerData, which renamed itself from StarRocks Inc. last year, is the principal developer of StarRocks, a fork of Apache Doris. The StarRocks project was recently donated to the Linux Foundation.

The company said most query engines are not well tuned for real-time analytics. They struggle with ad hoc queries and bog down under a large number of concurrent users. “They may accept streaming data sources but they don’t support real-time,” said Li Kang, CelerData’s vice president of strategy. As a result, he said, “enterprises will often build two pipelines: one for batch processing in the data lake or data warehouse and a separate real-time pipeline.”

The new release is built on a cloud-native architecture to enable better workload and resource isolation so that different warehouses can be created for different use cases. It gives lakehouse users the option to run high-performance analytics without ingesting data into a central data warehouse. CelerData claims its query engine can support thousands of concurrent users at 10,000 queries per second and is three times faster than competing query engines.

Batch and streaming

Users can query both streaming and historical data in real time without having to wait for streaming data to be batched for analysis. The company’s approach differs from the quasi-real-time processing technique called micro-batching in that it splits data into separate partitions called tablets. “Each time we get a new record we read it from our reader,” Kang said. “It’s not micro-batching but you can think of it that way and combine that data with other tables.”

This release also adds integration with common data lake table formats such as Apache Iceberg and Apache Hudi. Previously the software was limited to local storage on a virtual machine or server and only supported one direct-attached storage type. “Data can now be stored in S3 or our local storage,” Kang said, referring to Amazon Web Services Inc.’s object storage service.

Performance can be further improved using a local caching layer for remote input/output operations and multi-table materialized views, which are built by joining multiple base tables.

CelerData Version 3 will be generally available in early April 2023. The company also operates a fully managed cloud service.

Image: Tung Nguyen/Pixabay
