Dremio expands scope and boosts speed of its Apache Arrow-based analytics engine
Dremio Corp. is adding a data catalog to its self-service data analytics platform in a major release announced today.
The company is also incorporating new controls for multitenant deployments, boosting end-to-end data encryption, offering options to run inside software containers and adopting Gandiva, an open-source performance-enhancing library for Apache Arrow, the in-memory columnar data framework upon which the company’s namesake product is based.
Apache Arrow uses columnar in-memory analytics to boost query speeds by up to 100-fold over conventional analytics engines. The technology is similar to that which Google LLC uses to deliver subsecond response times to search queries, but Dremio is optimized for analytical operations.
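To make the columnar idea concrete, here is a minimal Python sketch contrasting a row-oriented layout with a column-oriented one. It uses only the standard library and does not call Arrow itself; the table, field names and helper functions are hypothetical, chosen purely for illustration.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.5},
    {"id": 3, "region": "east", "amount": 4.5},
]

def to_columns(records):
    """Pivot row records into a columnar layout: one sequence per field."""
    return {
        "id": array("q", (r["id"] for r in records)),         # packed 64-bit ints
        "region": [r["region"] for r in records],             # strings stay in a list
        "amount": array("d", (r["amount"] for r in records)), # packed doubles
    }

def total_rowwise(records):
    """Aggregate by visiting every record, even though only one field is needed."""
    return sum(r["amount"] for r in records)

def total_columnar(columns):
    """Aggregate by scanning a single packed column and ignoring the others."""
    return sum(columns["amount"])

columns = to_columns(rows)
print(total_rowwise(rows), total_columnar(columns))
```

Both functions return the same total. The difference is that the columnar version reads only the values it needs, from one packed array, which is the property analytical engines rely on when scanning a few fields across many records.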
The data catalog in Dremio 3.0 isn’t a bid by the company to compete with the many existing enterprise data catalogs but rather is focused on capturing and organizing data for use in Dremio. Data catalogs are used to create an inventory and descriptions of data assets within an organization. Dremio has added a crowdsourcing element in the form of a shared wiki page accompanying each data set that can be used for metatagging and description.
Security gets a boost in this version with the addition of end-to-end Transport Layer Security, a successor protocol to Secure Sockets Layer. While Dremio had encryption features in earlier releases, they did not span the full data access spectrum. The platform now also supports Amazon Web Services Inc. EC2 instance profiles for secure access to AWS S3 storage. Native integration with Apache Ranger is also new in this release.
The new multitenant features enable data engineering teams to manage and optimize cluster resources across a variety of workloads and users, the company said. Workload management policies written in SQL can be applied for such tasks as resource allocation, query admission and timeouts.
“Most data analytics platforms treat all users the same, which means you have to provision different clusters for different users,” said Chief Marketing Officer Kelly Stirman. Dremio has added features that deliver “fine-grained controls over which users or resources get priority,” he said. For example, administrators can specify that an intern should never have priority access to the cluster outside of work hours.
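The kind of rule Stirman describes can be sketched as an ordered list of conditions that route each query to a queue with its own priority, timeout and admission limit. The Python below is only an illustration of that idea: Dremio's policies are written in SQL, and the class names, queue settings, roles and work hours here are all assumptions, not the product's syntax or defaults.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List

@dataclass(frozen=True)
class Query:
    user_role: str      # hypothetical roles, e.g. "intern" or "analyst"
    submitted_at: time  # local time at which the query was submitted

@dataclass(frozen=True)
class Queue:
    name: str
    priority: int         # higher values are scheduled first
    timeout_seconds: int  # queries running longer than this would be cancelled
    max_concurrent: int   # admission limit for the queue

@dataclass(frozen=True)
class Rule:
    matches: Callable[[Query], bool]
    queue: Queue

# Assumed working day; a real deployment would configure this.
WORK_START, WORK_END = time(9, 0), time(17, 0)

def in_work_hours(t: time) -> bool:
    return WORK_START <= t < WORK_END

LOW = Queue("low", priority=1, timeout_seconds=60, max_concurrent=2)
HIGH = Queue("high", priority=10, timeout_seconds=600, max_concurrent=20)

# Rules are checked in order; the first match wins.
RULES: List[Rule] = [
    Rule(lambda q: q.user_role == "intern" and not in_work_hours(q.submitted_at), LOW),
    Rule(lambda q: True, HIGH),  # catch-all for everyone else
]

def route(query: Query, rules: List[Rule] = RULES) -> Queue:
    """Return the queue assigned by the first matching rule."""
    for rule in rules:
        if rule.matches(query):
            return rule.queue
    raise LookupError("no rule matched")

print(route(Query("intern", time(22, 0))).name)   # after hours
print(route(Query("analyst", time(22, 0))).name)  # same time, different role
```

Keeping the rules in a single ordered list means one cluster can serve different classes of users, rather than requiring a separate cluster for each.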
Also new in this release is compatibility with the Kubernetes orchestration framework via Docker images and templates. Kubernetes can be used to deploy and manage large collections of software containers, which are lightweight, isolated environments that package an application together with the services it needs to run. Dremio has added charts that are compatible with the open-source Helm Kubernetes package manager for provisioning and scaling. “Helm is what the cool kids are doing these days,” Stirman said.
Gandiva, which was built by Dremio developers, combines the LLVM runtime compiler with an execution kernel for efficient evaluation of arbitrary SQL expressions on Arrow. It’s claimed to provide up to 100-fold improvements in speed on certain types of queries. “In general, the more complex the query, the better a candidate it is for Gandiva,” Stirman said, “but every query will be improved.”
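The general idea behind runtime compilation of expressions can be sketched in a few lines of Python. This is not Gandiva's API and does not use LLVM: it swaps in Python's built-in `compile` and `eval` as a stand-in, and the tuple-based expression format and function names are hypothetical. The point is the contrast between walking an expression tree for every row and generating a function once, then applying it across columns.

```python
import keyword
import operator

# Expression tree for (a + b) * 2 as nested tuples:
# (op, left, right), ("col", name) or ("lit", value).
EXPR = ("*", ("+", ("col", "a"), ("col", "b")), ("lit", 2))

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def interpret(expr, row):
    """Baseline: re-walk the whole tree for each row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def column_names(expr):
    """Collect the set of column names referenced by the expression."""
    kind = expr[0]
    if kind == "col":
        return {expr[1]}
    if kind == "lit":
        return set()
    return column_names(expr[1]) | column_names(expr[2])

def to_source(expr):
    """Translate the tree into Python source text, validating each node."""
    kind = expr[0]
    if kind == "col":
        name = expr[1]
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"unsupported column name: {name!r}")
        return name
    if kind == "lit":
        if not isinstance(expr[1], (int, float)):
            raise ValueError("only numeric literals are supported")
        return repr(expr[1])
    if kind not in OPS:
        raise ValueError(f"unsupported operator: {kind!r}")
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"

def compile_expr(expr):
    """Generate a function for the expression once, at run time."""
    cols = sorted(column_names(expr))
    source = f"lambda {', '.join(cols)}: {to_source(expr)}"
    return cols, eval(compile(source, "<expr>", "eval"))

def evaluate_columns(expr, columns):
    """Apply the generated function across the input columns."""
    cols, fn = compile_expr(expr)
    return [fn(*values) for values in zip(*(columns[c] for c in cols))]

columns = {"a": [1, 2, 3], "b": [10, 20, 30]}
print(evaluate_columns(EXPR, columns))
```

The more operators an expression contains, the more tree-walking the interpreted path repeats per row, which is consistent with Stirman's remark that complex queries are the better candidates.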
Dremio 3.0 is available immediately in both free community and paid enterprise editions.
Photo: Unsplash