UPDATED 08:00 EST / JULY 23 2025


StarTree to support Apache Iceberg in a bid to expand lakehouse use cases

StarTree Inc., which sells a real-time analytics platform and cloud service based on the Apache Pinot open-source online analytical processing database, today became the latest data analytics provider to announce full support for Apache Iceberg.

Effective today, the StarTree Cloud managed service can act as the analytic and serving layer on top of Iceberg-based data lakehouses. The company said the move creates new use cases for Iceberg in real-time applications that require high concurrency across thousands of simultaneous users. In particular, it makes Iceberg easier to apply to customer-facing scenarios in which organizations want to expose data externally without relying on complex, multistep pipelines.

Iceberg is an open table format, a management layer that sits atop data files in cloud storage to improve consistency, manageability and query performance. It has rapidly gained acceptance as a de facto table standard, replacing an assortment of proprietary alternatives.
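Part of that query-performance benefit comes from metadata: Iceberg's manifest files record per-file column statistics, including lower and upper bounds, so an engine can skip data files that cannot match a filter before reading them. The minimal Python sketch below illustrates that pruning idea under simplified assumptions. `DataFileEntry` and `prune_files` are hypothetical names, not part of any Iceberg library, and real manifests track bounds for many columns alongside partition data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileEntry:
    """Hypothetical, simplified stand-in for one manifest entry."""
    path: str
    lower: int  # lower bound of the filtered column in this file
    upper: int  # upper bound of the filtered column in this file


def prune_files(entries: list[DataFileEntry], lo: int, hi: int) -> list[str]:
    """Return paths whose [lower, upper] range can overlap the query range [lo, hi]."""
    return [e.path for e in entries if e.upper >= lo and e.lower <= hi]


# Hypothetical monthly files, with dates encoded as YYYYMMDD integers.
manifest = [
    DataFileEntry("jan.parquet", 20250101, 20250131),
    DataFileEntry("feb.parquet", 20250201, 20250228),
    DataFileEntry("mar.parquet", 20250301, 20250331),
]

# Equivalent to: WHERE order_date BETWEEN 20250210 AND 20250305
print(prune_files(manifest, 20250210, 20250305))  # ['feb.parquet', 'mar.parquet']
```

Only two of the three files would need to be opened; the January file is ruled out from metadata alone.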

Iceberg provides transactional access to structured files in formats such as Parquet, a columnar storage file format optimized for efficient read/write access to large analytical datasets. However, Iceberg is a table format rather than a query engine, and it lacks native capabilities to serve low-latency, high-concurrency queries.

For this reason, organizations have typically extracted Iceberg data into separate systems, such as key-value stores or proprietary formats, to achieve subsecond response times. Those approaches require engineering-intensive pipelines and duplicated data, and they limit flexibility.

Query complexity

“Not only are you duplicating data, you’re amplifying the data itself because you have to materialize all combinations of your dimensions and metrics to make it easy to query in a key-value store-like fashion,” said Chinmay Soman, StarTree’s head of product.
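Soman's point about amplification can be made concrete with some arithmetic. If every combination of dimensions is pre-aggregated (every subset of the dimensions, including the grand total), the number of stored keys is bounded by the product of each dimension's cardinality plus one. The Python sketch below computes that bound by summing over subsets. The dimension names and cardinalities are hypothetical illustrations, not StarTree or customer figures.

```python
from itertools import combinations
from math import prod


def materialized_key_bound(cardinalities: dict[str, int]) -> int:
    """Upper bound on pre-aggregated keys if every subset of dimensions is materialized."""
    dims = list(cardinalities)
    total = 0
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            # A group-by over this subset yields at most the product of its
            # cardinalities; the empty subset (grand total) contributes 1.
            total += prod(cardinalities[d] for d in subset)
    return total


# Hypothetical dimensions for a merchant-facing dashboard.
dims = {"country": 50, "device": 3, "merchant": 1000}
print(materialized_key_bound(dims))  # 204204
```

With these hypothetical figures the bound is 51 × 4 × 1,001 = 204,204 keys, and each added dimension multiplies it again by its cardinality plus one, before any metrics are stored per key.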

StarTree said it enables direct querying of Iceberg tables without the need to move or transform the underlying data. The integration supports open formats and leverages performance-enhancing features, including Pinot indexing and materialization, local caching and intelligent prefetching.

“Data products today increasingly rely on historical data from lakehouses, but the serving layer has been missing,” said Chief Marketing Officer Chad Meley. “By querying Iceberg directly with subsecond latency, we’re eliminating the need for intermediate pipelines, duplicate storage and external databases.”

Executives said Iceberg support expands StarTree’s addressable market beyond its original focus on streaming and low-latency analytics. “This is certainly a new use case for us,” Meley said. “The primary challenge we’re solving is no longer just about data freshness. It’s about helping customers build scalable data products without all the bloat and complexity.”

StarTree enables various indexes and pre-aggregated materializations to be defined directly on Iceberg tables. Indexes for numerical data, text, JavaScript Object Notation, geospatial data and other types can be distributed locally on compute nodes or stored in object storage.
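Apache Pinot's documentation describes inverted indexes, which map each column value to the rows that contain it so a filter can be answered without scanning the whole column. The Python sketch below shows that general technique on toy in-memory data. It is an illustration under simplifying assumptions, not StarTree's implementation; production engines commonly use compressed bitmaps rather than Python lists, and the column names and values here are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(column: list[str]) -> dict[str, list[int]]:
    """Map each distinct value to the ascending list of row ids holding it."""
    index: dict[str, list[int]] = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)  # enumerate yields ascending row ids
    return dict(index)


def intersect_sorted(a: list[int], b: list[int]) -> list[int]:
    """Merge-intersect two ascending row-id lists (an AND of two filters)."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical delivery records, one list per column.
city = ["NYC", "SF", "NYC", "LA", "SF", "NYC"]
status = ["late", "ok", "ok", "late", "late", "late"]

city_idx = build_inverted_index(city)
status_idx = build_inverted_index(status)

# Equivalent to: WHERE city = 'NYC' AND status = 'late'
matches = intersect_sorted(city_idx.get("NYC", []), status_idx.get("late", []))
print(matches)  # [0, 5]
```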

Soman said the integration is based on work StarTree had already done to query Parquet files and S3-based object storage. “Parquet is not designed for random read access, but we’ve adapted Pinot to use it as a forward index,” he said. “Combining that with our understanding of Iceberg manifests and metadata gave us the building blocks we needed.”
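A forward index answers "what value is in row N?", which is awkward for Parquet because data is laid out in row groups of compressed pages rather than fixed-width slots. One common building block is to map a row id to its row group using the per-row-group row counts stored in the file footer (pyarrow, for example, exposes them as `ParquetFile.metadata.row_group(i).num_rows`), so a lookup needs to touch only one row group. The Python sketch below shows that mapping with plain lists. `RowGroupLocator` is a hypothetical name, and this is not a description of how StarTree adapted Pinot.

```python
from bisect import bisect_right
from itertools import accumulate


class RowGroupLocator:
    """Maps a file-global row id to (row_group_index, row_offset_within_group)."""

    def __init__(self, row_group_sizes: list[int]) -> None:
        self.total_rows = sum(row_group_sizes)
        # Starting row id of each row group: [0, n0, n0 + n1, ...]
        self.starts = [0, *accumulate(row_group_sizes)][:-1]

    def locate(self, row_id: int) -> tuple[int, int]:
        if not 0 <= row_id < self.total_rows:
            raise IndexError(f"row {row_id} out of range")
        # Rightmost group whose start <= row_id; this also skips empty groups.
        group = bisect_right(self.starts, row_id) - 1
        return group, row_id - self.starts[group]


# Hypothetical file with three row groups of 100, 250 and 50 rows.
locator = RowGroupLocator([100, 250, 50])
print(locator.locate(0))    # (0, 0)
print(locator.locate(100))  # (1, 0)
print(locator.locate(399))  # (2, 49)
```

Reading the value still means decoding pages inside that row group, which is one reason caching and prefetching matter for random-access workloads on Parquet.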

Data stays in place

The company emphasized that its query engine still uses proprietary indexing strategies to achieve performance, but that the data itself remains in open formats. “We’re not moving data from Iceberg into StarTree’s proprietary format,” Meley said. “The only thing proprietary in this case would be the index.”

Support for Iceberg enables customers such as financial technology firms to use StarTree to power merchant-facing dashboards that report historical cash flow or cohort revenue metrics. Transportation and logistics organizations are building interactive dashboards to review delivery performance, error rates and route efficiency over time. In both cases, the data doesn’t need to be real-time but must still be served to large user bases under strict service level agreements.

Paul Nashawaty, principal analyst at theCUBE Research, SiliconANGLE’s sister market research firm, said the approach addresses a growing gap in modern data architecture. “Iceberg adoption is accelerating, but most query engines can’t meet the performance SLAs of customer-facing applications,” he said. “StarTree’s ability to serve Iceberg data at high concurrency without duplication is a timely advancement.”

Soman said there are minor performance tradeoffs in using Iceberg instead of Pinot’s proprietary native format, but that Pinot is still capable of handling hundreds of queries per second with subsecond latencies.

Meley said that the decision to support Iceberg reflects both market momentum and practical customer needs. “All of our customers are asking about Iceberg,” he said. “It’s becoming the standard for lakehouse storage, and this allows us to support that natively while simplifying the architecture for serving data products.”

Photo: Pixabay
