UPDATED 08:00 EST / JULY 14 2022

BIG DATA

StarRocks’ real-time analytics engine moves to the cloud

The red-hot market for real-time analytics in the cloud just got another entrant with StarRocks Inc.’s announcement today of a cloud-native version of its SQL online analytics processing database engine.

StarRocks Cloud is a fully managed software-as-a-service version of the platform the company developed two years ago and released under an open-source license. It’s based on the Apache Doris massively parallel processing-based interactive SQL data warehouse.

The architecture is purpose-built for real-time data analysis by a large number of concurrent users with support for fast multitable joins. The engine works with a variety of schema models, including flat tables, star and snowflake schemas. It provides a basis for combining real-time transactional data with historical records.

The company has mostly flown under the radar since its founding in early 2020 but has raised more than $60 million in venture capital and signed on 110 paying customers, including large accounts such as Airbnb Inc. and Lenovo Group Ltd.

Growth market

The global streaming analytics market is expected to grow nearly 29% annually through 2025, driven by the rapid deployment of internet of things devices and the growing appetite among business leaders for up-to-the-minute data, according to Grand View Research Inc.

StarRocks supports high concurrency and availability with an engine that can handle more than 10,000 queries per second and ingest data at speeds of up to 100 megabytes per second per node, the company said.
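To put the ingestion figure in perspective, here is a rough back-of-the-envelope sketch that converts the per-node rate into daily volume. Only the 100-megabytes-per-second peak comes from the company. The cluster sizes, the function name and the assumption that throughput scales linearly across nodes are illustrative and are not claims made by StarRocks.

```python
# Back-of-the-envelope ingestion estimate.
# Only the 100 MB/s-per-node peak figure comes from the article; the node
# counts and the linear-scaling assumption are hypothetical.

PEAK_MB_PER_SEC_PER_NODE = 100
SECONDS_PER_DAY = 24 * 60 * 60


def daily_ingest_tb(nodes: int, mb_per_sec_per_node: float = PEAK_MB_PER_SEC_PER_NODE) -> float:
    """Return terabytes ingested per day, in decimal units (1 TB = 1,000,000 MB)."""
    total_mb = nodes * mb_per_sec_per_node * SECONDS_PER_DAY
    return total_mb / 1_000_000


if __name__ == "__main__":
    for nodes in (1, 3, 10):  # hypothetical cluster sizes
        print(f"{nodes} node(s): up to {daily_ingest_tb(nodes):.2f} TB/day")
```

Sustained real-world rates would likely be lower than this peak-based ceiling.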

Real-time processing has caught on quickly, but real-time analytics has been slower to gain traction, said Li Kang, the company’s vice president of strategy. One of the problems is that analytical queries often depend on denormalized tables, which are redundant tables created to reduce the need for complex and time-consuming joins.

That approach is “OK for reports, but if users want to leverage it for real-time decisions it’s too slow,” Kang said. Denormalization yields good query performance but increases complexity, he said. For example, denormalizing a table that has multiple foreign keys pointing to it creates multiple copies of the data. That breaks an essential tenet of normalization, which is that each data element should be stored only once.
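The tradeoff Kang describes can be seen in a small sketch using Python’s built-in sqlite3 module. The table names, columns and sample rows are hypothetical, and SQLite stands in for an analytics engine purely for illustration. The normalized layout stores each customer once and needs a join at query time, while the denormalized table answers the same question without a join but copies the customer onto every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized layout: each customer is stored exactly once.
cur.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme', 'Boston'), (2, 'Globex', 'Austin');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 75.0), (12, 2, 120.0);
""")

# Denormalized layout: the join is precomputed, so customer attributes are
# copied onto every order row.
cur.execute("""
    CREATE TABLE orders_flat AS
    SELECT o.order_id, o.amount, c.customer_id, c.name, c.city
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id
""")

# The same question answered both ways: total order amount per city.
joined = cur.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()
flat = cur.execute(
    "SELECT city, SUM(amount) FROM orders_flat GROUP BY city ORDER BY city"
).fetchall()

# Count how many times the customer 'Acme' is stored in each layout.
copies_normalized = cur.execute(
    "SELECT COUNT(*) FROM customers WHERE name = 'Acme'"
).fetchone()[0]
copies_flat = cur.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE name = 'Acme'"
).fetchone()[0]

print("join result:", joined)
print("flat result:", flat)
print("copies of 'Acme':", copies_normalized, "normalized vs", copies_flat, "flat")
```

Both queries return the same totals. The flat table must be rebuilt or patched whenever a customer record changes, which is one source of the added ingestion delay and complexity.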

Denormalization penalties

“You pay the price of delay in ingestion, extra hardware and development costs,” Kang said. “You also have limited concurrency. There are lots of issues from both performance and business requirement standpoints.”

StarRocks uses vectorized execution, which keeps data in column rather than row orientation across CPU, memory and storage so that modern multicore CPUs can work on batches of values at a time. Columnar storage is more efficient for analytics queries, while row storage is better for transaction processing.
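The difference between the two orientations can be sketched in a few lines of plain Python. This is a simplified illustration of data layout only, using hypothetical sample data and function names. It does not represent StarRocks’ implementation, and pure Python gains none of the hardware-level speedups a vectorized engine would.

```python
# Row-oriented layout: one record per entry, with all of its fields together.
rows = [
    {"order_id": 10, "city": "Boston", "amount": 250.0},
    {"order_id": 11, "city": "Boston", "amount": 75.0},
    {"order_id": 12, "city": "Austin", "amount": 120.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    return {field: [row[field] for row in rows] for field in rows[0]}


def total_amount_rows(rows):
    """Aggregate over rows: every record is visited to read one field."""
    return sum(row["amount"] for row in rows)


def total_amount_columns(columns):
    """Aggregate over columns: only the 'amount' values are read."""
    return sum(columns["amount"])


# Column-oriented layout: one contiguous list of values per field.
columns = to_columns(rows)

print(columns["amount"])
print(total_amount_rows(rows), total_amount_columns(columns))
```

An analytics query that sums one field reads a single list in the columnar form, while a transaction that inserts or updates one order touches a single record in the row form.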

Kang said StarRocks’ principal competitors are products built on real-time data stores such as Apache Druid, Apache Pinot and ClickHouse. All require data to be in denormalized form, he said. “This is why it’s been notoriously difficult to build a real-time infrastructure with those technologies,” he said.

The company also competes with distributed query engines based on the open-source Presto and Trino projects. StarRocks said it can process queries three to five times faster than products from its competitors.

“We take the concept into the query engine so we can work on the columnar data without converting it for each CPU, memory and storage layer,” Kang said. “The result is we get much better query performance for a single-table query or a multitable query in star schema format, and we use better parallel processing to support thousands of users at one time.”

StarRocks can ingest data from cloud data lakes such as Amazon Web Services Inc.’s S3 and Azure Blob storage. It also supports streaming data managed by Apache Kafka and change-data capture streams from relational databases, which identify and track changes to data in a database.

StarRocks Cloud will be available initially on the AWS and Azure clouds with support for Google LLC’s cloud planned in the near future. It supports standard SQL and MySQL protocols and any business intelligence tools that use SQL.

Image: StarRocks
