UPDATED 08:00 EDT / SEPTEMBER 17 2019

BIG DATA

Dremio unlocks cloud object stores for use in high-performance analytics

Dremio Corp., a developer of self-service analytics technology, says it has figured out how to unlock data in popular cloud stores for use in performance-sensitive data warehouse applications.

The company’s new Data Lake Engines for Amazon Web Services, Azure and hybrid cloud systems enable users to work directly on data in cloud object stores such as AWS’ S3 and Microsoft Corp.’s Azure Data Lake Storage. These services have been enormously popular for their low cost and scalability, but the inherent latency of the cloud and the performance limitations of object storage have limited their use in performance-sensitive scenarios.

“In the past, your only option was to extract the data and put it in a warehouse like [AWS] Redshift or build cubes and business intelligence extractions,” said Dremio Chief Executive Tomer Shiran. That process can sometimes take days or weeks.

The company thinks it has licked that problem with a technology it calls Columnar Cloud Cache, a read-ahead cache that automatically loads data into nonvolatile memory express (NVMe) or other solid-state storage located physically close to the processor. The approach improves performance by up to 70 times while reducing network traffic and requiring no administration or setup, the company said. “You get the best of both worlds: the scalability of S3 with performance of NVMe,” Shiran said.
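Dremio has not published how Columnar Cloud Cache is built, so the following is only a generic sketch of the read-through caching idea described above: a small least-recently-used cache that serves repeated reads of a column chunk from a fast local tier and goes to the slow object store only on a miss. The class `ColumnChunkCache`, the `(path, column, row_group)` key and the fake fetch function are hypothetical stand-ins, not Dremio APIs.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical cache key: (object path, column name, row-group index).
ChunkKey = Tuple[str, str, int]


class ColumnChunkCache:
    """Read-through LRU cache for column chunks (illustrative sketch only).

    The in-memory dict stands in for a fast local tier such as NVMe, and
    `fetch` stands in for a slow object-store read. Neither reflects
    Dremio's actual implementation.
    """

    def __init__(self, fetch: Callable[[ChunkKey], bytes], capacity: int) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._entries: "OrderedDict[ChunkKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: ChunkKey) -> bytes:
        if key in self._entries:
            # Hit: mark the chunk as most recently used and serve it locally.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Miss: read from the remote store, then keep a local copy.
        self.misses += 1
        data = self._fetch(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used chunk to stay within capacity.
            self._entries.popitem(last=False)
        return data


if __name__ == "__main__":
    remote_reads = []

    def fake_object_store_read(key: ChunkKey) -> bytes:
        remote_reads.append(key)  # count trips to the slow tier
        return repr(key).encode()

    cache = ColumnChunkCache(fake_object_store_read, capacity=2)
    key = ("s3://example-bucket/trips.parquet", "fare", 0)  # hypothetical path
    cache.get(key)
    cache.get(key)  # second read is served from the local tier
    print(cache.hits, cache.misses, len(remote_reads))  # prints: 1 1 1
```

The point of the design is that only misses pay object-store latency; a real system would also bound the cache by bytes rather than entry count.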

Another performance-enhancing feature is “predictive pipelining,” which improves the read-ahead hit rate on columnar data while pushing read throughput to the maximum the network allows. The software also makes use of the Gandiva Initiative for Apache Arrow, an execution kernel optimized for high-performance columnar processing of Apache Arrow data.
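Read-ahead pipelining in general works by requesting upcoming chunks before they are needed, so network latency overlaps with processing. The sketch below assumes a hypothetical `fetch` function that retrieves one chunk, and keeps up to `depth` reads in flight with a thread pool while still yielding results in their original order. It illustrates the general technique only; it is not Dremio's predictive pipelining algorithm, which the company has not described in detail.

```python
import time
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterator, List


def pipelined_read(
    keys: List[str],
    fetch: Callable[[str], bytes],
    depth: int = 4,
) -> Iterator[bytes]:
    """Yield chunks in order while keeping up to `depth` reads in flight."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        in_flight: Deque[Future] = deque()
        next_index = 0
        # Prime the pipeline with the first `depth` requests.
        while next_index < len(keys) and len(in_flight) < depth:
            in_flight.append(pool.submit(fetch, keys[next_index]))
            next_index += 1
        while in_flight:
            # Block only on the oldest request; later ones keep downloading.
            data = in_flight.popleft().result()
            # Refill the pipeline so `depth` reads stay outstanding.
            if next_index < len(keys):
                in_flight.append(pool.submit(fetch, keys[next_index]))
                next_index += 1
            yield data


if __name__ == "__main__":
    def slow_fetch(key: str) -> bytes:
        time.sleep(0.05)  # simulate object-store latency
        return key.encode()

    chunks = [f"row-group-{i}" for i in range(8)]  # hypothetical chunk names
    start = time.perf_counter()
    data = list(pipelined_read(chunks, slow_fetch, depth=4))
    print(len(data), round(time.perf_counter() - start, 2))
```

With eight 50-millisecond reads and four in flight, the loop should finish in roughly a quarter of the sequential time, which is the throughput effect read-ahead aims for.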

It’s a columnar thing

Dremio knows a thing or two about columnar processing. Its software is based on Apache Arrow, a columnar in-memory data format and processing framework that arranges data in columns rather than rows. Conventional relational engines process data row by row by design, but columnar processing can be at least 10 times faster for analytic queries and is therefore better suited to analytics.
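The difference between row and columnar layouts can be shown with plain Python rather than Apache Arrow itself. In this sketch, a query that totals one field must visit every record in the row layout but reads a single contiguous array in the columnar one. The table and field names are made up, and the example illustrates the access pattern, not a benchmark of the speedup cited above.

```python
from array import array
from typing import Dict, List, Tuple

# Row-oriented: each record's fields are stored together (id, name, fare).
rows: List[Tuple[int, str, float]] = [
    (1, "Ana", 120.0),
    (2, "Ben", 95.5),
    (3, "Caro", 210.25),
]

# Column-oriented: each field is stored contiguously, as in Arrow-style layouts.
columns: Dict[str, object] = {
    "id": array("q", [r[0] for r in rows]),
    "name": [r[1] for r in rows],
    "fare": array("d", [r[2] for r in rows]),
}


def total_fare_row_store(records: List[Tuple[int, str, float]]) -> float:
    # Walks every record and picks the fare field out of each one.
    return sum(record[2] for record in records)


def total_fare_column_store(cols: Dict[str, object]) -> float:
    # Reads one contiguous array and never touches "id" or "name".
    return sum(cols["fare"])


print(total_fare_row_store(rows), total_fare_column_store(columns))  # 425.75 425.75
```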

The Dremio technology, which Shiran said has been in development for more than a year, anticipates the data customers will need and loads it into memory or flash storage, thus eliminating the performance hit of retrieving it after the query has been submitted.

“We have a pretty good idea of what columns are necessary,” Shiran said. “We can forecast one second in advance so you’re never waiting on the data.”

Forecasting data needs isn’t as complex as it may sound. In most cases, users query a limited set of recent data repeatedly for a defined period of time. “They’re typically asking many questions over the same limited set of customer records,” Shiran said, citing the example of a cruise ship operator that continually queries a passenger list while at sea.
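Shiran's description suggests forecasting from recent query history. One simple way to model that, assumed here purely for illustration, is to count how often each column appeared in a sliding window of recent queries and prefetch the columns used most often. `ColumnPredictor`, its `window` and `threshold` parameters, and the passenger-list columns are hypothetical; Dremio's actual forecasting method is not described in the article.

```python
from collections import Counter, deque
from typing import Deque, FrozenSet, Iterable, Set


class ColumnPredictor:
    """Predict which columns to prefetch from a window of recent queries.

    Hypothetical heuristic: a column is predicted if it appeared in at
    least `threshold` (a fraction) of the last `window` queries.
    """

    def __init__(self, window: int = 20, threshold: float = 0.5) -> None:
        # deque(maxlen=...) silently drops the oldest query once full.
        self._history: Deque[FrozenSet[str]] = deque(maxlen=window)
        self._threshold = threshold

    def record(self, columns: Iterable[str]) -> None:
        self._history.append(frozenset(columns))

    def predict(self) -> Set[str]:
        if not self._history:
            return set()
        counts = Counter(col for cols in self._history for col in cols)
        needed = self._threshold * len(self._history)
        return {col for col, n in counts.items() if n >= needed}


predictor = ColumnPredictor(window=20, threshold=0.5)
predictor.record({"passenger_id", "cabin", "name"})
predictor.record({"passenger_id", "cabin"})
predictor.record({"passenger_id", "dining_time"})
print(sorted(predictor.predict()))  # ['cabin', 'passenger_id']
```

A sliding window keeps the forecast focused on the current working set, matching the article's point that users tend to query the same limited data for a defined period.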

Dremio says its technology can connect with or replace most cloud data warehouses and is compatible with most business intelligence front ends, including those from Tableau Software Inc., Microsoft Corp. and Looker Data Sciences Inc. It also supports the Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), representational state transfer (REST) and Apache Arrow Flight interfaces. The software also supports cross-platform joins, such as a query spanning S3 and an Oracle Corp. relational database.
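A cross-platform join ultimately means combining rows fetched from two different systems. The sketch below shows the generic hash-join technique on two stand-ins: an in-memory CSV string playing the role of a file in S3, and an in-memory SQLite database playing the role of an Oracle relational engine. All table names and data are hypothetical, and this is not a description of how Dremio's query engine executes such joins.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Stand-in for a CSV object read from S3 (hypothetical data).
S3_BOOKINGS_CSV = """booking_id,customer_id,fare
b1,c1,120.0
b2,c2,95.5
b3,c1,210.25
"""


def load_object_store_rows(text: str) -> List[Dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def load_relational_rows(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    return conn.execute("SELECT customer_id, name FROM customers").fetchall()


def hash_join(
    bookings: List[Dict[str, str]], customers: List[Tuple[str, str]]
) -> List[Tuple[str, str, float]]:
    # Build phase: index the (typically smaller) customer table by join key.
    names_by_id = {customer_id: name for customer_id, name in customers}
    # Probe phase: look up each booking's customer; unmatched rows are dropped.
    return [
        (b["booking_id"], names_by_id[b["customer_id"]], float(b["fare"]))
        for b in bookings
        if b["customer_id"] in names_by_id
    ]


conn = sqlite3.connect(":memory:")  # stand-in for an Oracle database
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("c1", "Ana"), ("c2", "Ben")])

joined = hash_join(load_object_store_rows(S3_BOOKINGS_CSV), load_relational_rows(conn))
print(joined)  # [('b1', 'Ana', 120.0), ('b2', 'Ben', 95.5), ('b3', 'Ana', 210.25)]
```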

The company is also launching Dremio Hub, which is essentially an app marketplace for connectors. There are currently only five connectors listed, but “we expect to see dozens a year from now,” Shiran said. Dremio won’t test or certify software in the marketplace but will leave that role to the community.

Pricing is based upon the size of the Dremio cluster, which runs entirely in the cloud. Annual pricing can run between “five and seven figures,” the CEO said.

Image: Freedom of conscience/photopin
