UPDATED 18:30 EDT / MARCH 24 2021

Dremio offers alternative to data copies and data warehouses through robust lake architecture

Copies of large datasets tend to proliferate in many organizations because the original data resides in a data lake that lacks the ability to perform the work necessary to turn that information into business insight. Dremio Corp. has built a unicorn-level business by breaking through this barrier, allowing enterprises to run live, interactive queries against petabyte-scale data in lake storage.

Dremio’s value proposition is that eliminating data copies is a good thing. Bringing compute to the data eliminates unnecessary work by operationalizing data lake storage and accelerating analytics processing.

“What’s wrong with copies?” asked Robert Maybin (pictured), principal architect at Dremio. “Maybe they land in cloud storage, but before they can be queried, somebody has to go in and reformat those datasets, transform them in ways that make them more useful and more performant. Copies are a natural thing to do, but they come at a real cost.”

Maybin spoke with Lisa Martin, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during the AWS Startup Showcase Event: Innovators in Cloud Data. They discussed Dremio’s vision for next-generation data lake architecture, recent key technology changes that significantly boosted the software’s capabilities, and why hybrid-focused enterprises are embracing the firm’s data lake solution. (* Disclosure below.)

Shift in thinking

Dremio’s growing acceptance presents an interesting scenario for the data warehouse industry. With its release last fall, the company began allowing users to query a cloud object store through a business intelligence tool such as Tableau or Looker, with the same performance characteristics as if the information resided in a data warehouse.

“The real approach, and this is available today with the rise of cloud technologies, is we can shift our thinking,” Maybin said. “How can we take some of these features and capabilities that one would expect in a data warehouse environment and bring that directly to the data? It requires new technology to do this. That’s what we call the next generation data lake architecture.”

That next generation is based on the ability to separate compute from storage and scale each independently. Running production business intelligence directly on cloud data lake storage diminishes the need to move the data into a data warehouse.

“We didn’t have the flexibility to scale compute and storage independently or the kind of networking we have today,” Maybin explained. “What we’ve got with some of the new cloud technology is to basically do away with that requirement. Now we can have very large, provisioned pools of data that can grow and grow without the limitations of nodes of hardware.”

One of the key architectural changes Dremio announced last fall was caching data in the Apache Arrow format, a language-agnostic software framework for developing data analytics applications.

Dremio had been using its Reflections tool, an internally managed persistence of data, to accelerate queries. But Reflections had to be created as Apache Parquet files first, which slowed down the process. Now Dremio is using the Arrow format directly, accelerating query response times by as much as 10x, according to company officials.

“We can accelerate certain query patterns by creating Reflections,” Maybin said. “That’s the edge piece that gives us BI acceleration without having to use additional tools. The ability to create Reflections is certainly a differentiator.”

Querying multiple data sources

Another key adjustment made by Dremio was to offer scale-out query planning. This gave the platform greater concurrency through the ability to match the number of query coordinators with executors, allowing the software to support thousands of users.

“We’re in the business of building technology that allows users to query large data sets in a scale-out performant way directly on the data where it lives,” Maybin said. “We can also query not just one source of data, but multiple sources of data and join those together in the context of the same query.”

Dremio also offers the ability to perform runtime filtering in a data lake. This reduces the need for large table scans that might include a lot of unnecessary information for the user.
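The general idea behind runtime filtering can be illustrated with a small Python sketch. This is not Dremio's implementation; the `Partition` class, the `runtime_filtered_join` function and the sample data are all hypothetical. It assumes the large table is stored in partitions that carry min/max statistics for the join key, and it uses the keys found in a small table at query time to skip partitions that cannot match.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical chunk of a large table with min/max stats for its join key."""
    min_key: int
    max_key: int
    rows: list  # list of (key, amount) tuples


def runtime_filtered_join(dim_rows, partitions):
    """Join a small table to a large partitioned table, skipping partitions
    whose key range cannot contain any key from the small table."""
    # Build side: collect the join keys from the small table at query time.
    dim = {key: name for key, name in dim_rows}
    if not dim:
        return [], 0
    lo, hi = min(dim), max(dim)

    joined, scanned = [], 0
    for part in partitions:
        # The runtime filter: prune partitions outside the build-side key range.
        if part.max_key < lo or part.min_key > hi:
            continue
        scanned += 1
        for key, amount in part.rows:
            if key in dim:
                joined.append((dim[key], amount))
    return joined, scanned


# Illustrative data only.
regions = [(101, "EMEA"), (102, "APAC")]
sales = [
    Partition(1, 50, [(10, 5.0), (40, 7.5)]),
    Partition(90, 120, [(101, 20.0), (110, 3.0), (102, 8.0)]),
    Partition(200, 300, [(250, 1.0)]),
]
results, partitions_scanned = runtime_filtered_join(regions, sales)
```

In this toy example only one of the three partitions is read, which is the kind of saving the technique aims for when most of a large table is irrelevant to the query.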

“You can create schemas, you can create layers of views and accelerations and effectively allow users to build out virtually in the form of views what they would have done before with all of their various ETL pipelines,” Maybin said. “We’re not just the backend high-performance query engine. We aren’t just the acceleration layer. We have a very rich, fully featured UI environment that allows users to actually log in, find data, curate data, reflect data and build their own views.”
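The layered-views idea Maybin describes can be sketched with ordinary SQL views. The example below uses Python's built-in SQLite module purely as a stand-in query engine; it does not use Dremio's SQL dialect or API, and the table and view names are hypothetical. It shows a cleanup view and an aggregate view stacked on a raw table, with no intermediate copies written out.

```python
import sqlite3

# In-memory SQLite database standing in for the query engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw data as it landed, including rows a pipeline would normally drop.
    CREATE TABLE raw_trades (trade_id INTEGER, desk TEXT, notional REAL, status TEXT);
    INSERT INTO raw_trades VALUES
        (1, 'rates', 100.0, 'settled'),
        (2, 'rates',  50.0, 'cancelled'),
        (3, 'fx',     75.0, 'settled');

    -- Layer 1: a cleanup step expressed as a view instead of a copied table.
    CREATE VIEW clean_trades AS
        SELECT trade_id, desk, notional FROM raw_trades WHERE status = 'settled';

    -- Layer 2: a reporting view built on top of the cleanup view.
    CREATE VIEW desk_totals AS
        SELECT desk, SUM(notional) AS total FROM clean_trades GROUP BY desk;
    """
)

totals = conn.execute("SELECT desk, total FROM desk_totals ORDER BY desk").fetchall()

# New raw data shows up in every layer immediately because nothing was copied.
conn.execute("INSERT INTO raw_trades VALUES (4, 'fx', 25.0, 'settled')")
totals_after = conn.execute("SELECT desk, total FROM desk_totals ORDER BY desk").fetchall()
```

Because each layer is only a definition over the layer beneath it, a change to the raw table is visible in the reporting view without rerunning a pipeline.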

Enterprise customers have responded to Dremio’s premise. The firm lists a number of large companies as clients on its website, notably major financial institutions such as UBS Inc., Zions Bank and TransUnion LLC.

The firm’s acceptance is yet another sign that enterprises continue to focus on hybrid solutions while keeping a close eye on the cloud.

“All of these organizations either have a toe in the water or they’re halfway down the path of exploring how to take all of this on-premises data and processing and get into AWS,” Maybin said. “We provide a really good path to solve some of their on-prem problems today and then give them a clear path as they migrate to the cloud. We’re ideally positioned for that story.”

(* Disclosure: Dremio Corp. sponsored this segment of theCUBE. Neither Dremio nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
