UPDATED 08:00 EDT / NOVEMBER 20 2024


BIG DATA

StarTree brings batch-like flexibility and performance to streaming data

StarTree Inc., the developer of a managed service based on the Apache Pinot real-time data analytics platform, today rolled out a set of enhancements aimed at helping organizations more efficiently accommodate evolving data structures, enhance query performance and streamline user access management.

The company said rapid growth in the size and number of tables, along with soaring ingestion and query rates, is making dynamic data structures harder to manage. Unlike batch systems, which benefit from predictable periodic data loads and tolerance for brief downtime, real-time analytics requires that performance, security and reliability be maintained amid constantly changing conditions that include schema shifts or data gaps.

Pinot users are coping with dramatic increases in scale, said Chinmay Soman, StarTree’s head of product. “Real-time tables in Pinot used to be hundreds of thousands of messages per second but now we’re seeing tens of millions of messages per second,” he said. “The amount of data being backfilled has increased to tens of terabytes per day and the number of users that are onboarding to the platform has also increased. The gap in skill sets is way more apparent now than before.”

Backfilling refers to processing and populating historical data into a system or data pipeline that typically operates on real-time data to ensure that datasets are complete.

Real-time processing complicates tasks such as data loading, transformation, backfilling and schema changes. “All the data management problems we have already faced in batch, we are now solving for real-time systems,” Soman said. A pause of a few minutes in batch ingestion is usually tolerable but not in scenarios such as financial services or advertising auctions that need up-to-the-second currency.

No-pause ingestion

StarTree Cloud now features “pauseless” ingestion. It maintains a continuous data flow during segment building and upload phases. Pauses often happen because the system must wait to ensure data is committed reliably. Pauseless ingestion relies on segments, which are dynamic groupings of data that are updated continuously based on incoming information.

“We made it asynchronous, so as soon as you decide a segment is done, you immediately begin on the next segment,” Soman said. The feature ensures that data is correct, although recovering from a crash is somewhat more involved than in a batch processing scenario.
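The asynchronous handoff Soman describes can be illustrated with a minimal Python sketch. It is not StarTree's or Pinot's implementation: the function names, the `SEGMENT_SIZE` threshold and the in-memory "upload" are all assumptions made for illustration, and crash recovery is left out entirely.

```python
import queue
import threading
import time

SEGMENT_SIZE = 3  # hypothetical threshold: rows per segment


def build_and_upload(sealed: queue.Queue, committed: list) -> None:
    """Background worker: builds and 'uploads' sealed segments one by one."""
    while True:
        segment = sealed.get()
        if segment is None:  # sentinel: no more segments will arrive
            break
        time.sleep(0.01)  # stand-in for the slow build and upload phases
        committed.append(list(segment))


def ingest(messages, segment_size: int = SEGMENT_SIZE) -> list:
    """Consume messages without pausing while earlier segments are committed."""
    sealed: queue.Queue = queue.Queue()
    committed: list = []
    worker = threading.Thread(target=build_and_upload, args=(sealed, committed))
    worker.start()

    current = []
    for msg in messages:
        current.append(msg)
        if len(current) >= segment_size:
            sealed.put(current)  # hand off the finished segment; do not wait
            current = []         # immediately begin the next segment
    if current:
        sealed.put(current)      # flush the final partial segment
    sealed.put(None)
    worker.join()
    return committed


if __name__ == "__main__":
    print(ingest(range(7)))
```

The consuming loop never blocks on the upload step. A synchronous design would instead wait at the handoff line until the segment was committed, which is where the pause comes from.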

Performance management improvements powered by machine learning simplify query optimization by helping users navigate the myriad indexing options available in Pinot. Performance Manager analyzes query structures and metrics to recommend enhancements, such as indexes, bloom filters, derived columns and star-tree indexes. Users can apply optimizations with one click to improve performance while also maximizing cluster throughput and reducing manual effort.

Optimization isn’t new in Pinot but StarTree is making the capability available to everyone in the new release. “Not everybody is a SQL guru,” said Peter Corless, head of product marketing. “This uses a machine learning algorithm that watches for what makes for a good query so you don’t have to ask that guy on the third floor for the ins and outs of constructing it.”

Indexes are persistent, which takes a toll on storage. StarTree Cloud will now inform users of the costs of indexing and allow them to choose whether or not to use one.

Schema evolution

StarTree Cloud now allows the system to accommodate new fields, indexes, altered data types and other structural modifications without disrupting operations, ensuring that applications that rely on the database continue to function smoothly despite changes in input data.

“This is geared toward making developers’ lives easier,” Soman said. “You can evolve the schema in the background, essentially fixing the existing table without downtime and with minimum impact on live performance queries.” Schema evolution is done on a separate set of autoscaling nodes with updated schemas uploaded to the live server to minimize disruptions.

A new data backfill feature addresses incorrect or missing data by enabling users to reload data from past events. Teams can correct bad records or fill gaps without disrupting operations. StarTree said the feature is particularly valuable in maintaining data integrity for real-time analytics.
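In rough terms, a backfill swaps the rows in a given time window for a corrected copy reloaded from the source. The Python sketch below shows that idea only. The `backfill` function, the `ts` field and the list-of-dicts table are hypothetical and do not reflect StarTree's actual interface.

```python
def backfill(table, corrected, start, end):
    """Return a new table in which rows with start <= ts < end are
    replaced by the corrected rows that fall in the same window."""
    kept = [row for row in table if not (start <= row["ts"] < end)]
    replacement = [row for row in corrected if start <= row["ts"] < end]
    return sorted(kept + replacement, key=lambda row: row["ts"])


if __name__ == "__main__":
    live = [
        {"ts": 1, "clicks": 10},
        {"ts": 2, "clicks": -1},  # bad record
        {"ts": 4, "clicks": 7},   # ts 3 is missing
    ]
    reloaded = [
        {"ts": 2, "clicks": 12},
        {"ts": 3, "clicks": 9},
    ]
    for row in backfill(live, reloaded, start=2, end=4):
        print(row)
```

The sketch builds a new list rather than modifying the live one, loosely mirroring the goal of repairing data without interrupting queries against the existing table.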

Role-based access control allows administrators to assign and control user views and actions based on roles, even within a sub-second window. RBAC is a more efficient approach to managing security than granting permissions individually.
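The efficiency argument for RBAC is easier to see in a small example: permissions attach to roles, and users simply hold roles. The Python sketch below is generic. The role names, the actions and the `is_allowed` helper are invented for illustration and are not StarTree Cloud's permission model.

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"query"},
    "analyst": {"query", "create_index"},
    "admin": {"query", "create_index", "alter_schema", "manage_users"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


def is_allowed(user: User, action: str) -> bool:
    """A user may perform an action if any of their roles grants it."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


if __name__ == "__main__":
    dana = User("dana", {"analyst"})
    print(is_allowed(dana, "create_index"))  # True
    print(is_allowed(dana, "alter_schema"))  # False
```

Changing what analysts may do means editing one entry in `ROLE_PERMISSIONS` rather than updating every analyst's account individually.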

StarTree is addressing a hot market. International Data Corp. has forecast that the stream processing market will grow at a compound annual growth rate of 21.5% through 2028, driven by increased data velocity, real-time analytics and the internet of things.

All capabilities are in private preview during the fourth quarter of 2024, with general availability planned for the first quarter of 2025.

Image: SiliconANGLE/Bing Image Creator

