UPDATED 09:00 EST / JUNE 20 2023

Starburst broadens support for other data sources in update of managed Trino service

Starburst Data Inc., which sells a distributed query engine based on the open-source Trino project, today debuted a set of new features in Starburst Galaxy, its managed cloud query service, that are intended to make it easier for organizations to manage data from multiple sources.

The new release includes a public preview of Gravity, a centralized access and governance layer for all connected data sources, such as Amazon Web Services Inc.'s S3 and Microsoft Corp.'s Azure Data Lake Storage. Galaxy includes a metastore, automated data cataloging, search features, attribute-based access control and the ability to create and share data products, which are curated packages of data that others can access. A metastore is a service that stores metadata about tables, such as their schemas and storage locations, in a relational database.

Cross-cloud querying is a new function that lets users explore data across clouds, regions and data sources before moving it, which reduces unnecessary data transfer fees and improves data quality. "It sits essentially on top of all clouds," said Matt Fuller, co-founder and vice president of product at Starburst. "From there you can connect to BigQuery data, Google Cloud Storage, Azure Data Lake Storage, [Microsoft] Synapse or [AWS] Redshift."

Direct storage connections

Users can also now connect pre-existing storage to Galaxy and treat it as a first-class entity. That allows data scientists and engineers to choose the formats and tools that are right for their workloads instead of standardizing on a single central format.

“Let’s say you’re using Amazon’s Athena to query S3 and you hit Athena’s limitations,” Fuller said. “You can point us directly at that S3 data and start querying. You can get value immediately and if you keep data in the data lake in an open format, that’s the easiest way to avoid being locked in.”

The platform enhancements closely follow Starburst’s recent announcements of connectors to the dbt Cloud data transformation platform and the Tabular data catalog that enable users of those tools to build open data lake architectures and data pipelines spanning multiple data sources. “We want to continue to integrate into the ecosystem of tools people already use and love,” Fuller said. “We don’t want to reinvent the wheel.”

Warp Speed, an acceleration technology that Starburst picked up with its acquisition of Varada Ltd. last year, uses patented indexing algorithms that the company said increase query speeds by an average of 40%, even on very large clusters.

The data product concept is a component of data mesh, an architecture that vests ownership of data in the people who create it and encourages them to curate and share it as they would any product. "Data product is something we've had in our enterprise installable version for some time but we didn't yet have in Galaxy," Fuller said. "They allow you to create highly curated assets that are searchable and discoverable by users."

The announcements come during Launch Week, a four-day Starburst conference taking place this week.
