Dremio adds machine learning to self-service data analytics platform
Dremio Corp. is folding machine learning into its self-service data analytics platform, boosting performance and integrating with Looker Data Sciences Inc.’s namesake business intelligence query engine.
Dremio claims to eliminate the time-consuming data transformation tasks typically required to load information into data warehouses, multidimensional cubes and aggregation tables for business intelligence uses. Its Data Reflections technology, which is based upon the open-source Apache Arrow columnar in-memory data format, physically optimizes representations of source data for rapid query processing. Dremio separates compute and storage capabilities and uses in-memory processing to optimize performance and cost.
Enhancements in this release can accelerate processing by a factor of up to 1,000, the company said. The query planner now automatically selects the best reflections to accelerate queries for ad hoc requests, business intelligence and data science workloads.
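To illustrate what "selecting the best reflection" could mean in practice, here is a minimal Python sketch. It is not Dremio's planner: the `pick_reflection` function, the reflection names and the row counts are all hypothetical, and the rule used (choose the smallest precomputed representation that still contains every column the query needs) is only one common heuristic for this kind of selection.

```python
def pick_reflection(query_columns, reflections):
    """Return the name of the smallest reflection covering the query, or None.

    Hypothetical heuristic for illustration only; not Dremio's actual planner.
    """
    needed = set(query_columns)
    # Keep only reflections that contain every column the query touches.
    candidates = [r for r in reflections if needed <= set(r["columns"])]
    if not candidates:
        return None  # no reflection applies; the query would go to the source
    # Fewer rows to scan is used here as a stand-in for lower cost.
    return min(candidates, key=lambda r: r["rows"])["name"]


# Made-up reflections over a made-up orders data set.
reflections = [
    {"name": "raw_orders", "columns": ["order_id", "region", "day", "amount"], "rows": 10_000_000},
    {"name": "by_region_day", "columns": ["region", "day", "amount"], "rows": 50_000},
    {"name": "by_region", "columns": ["region", "amount"], "rows": 20},
]

choice = pick_reflection(["region", "amount"], reflections)
fallback = pick_reflection(["order_id", "amount"], reflections)
missing = pick_reflection(["customer"], reflections)
```

In this toy setup a query on region and amount would be served from the 20-row aggregate, while a query that needs `order_id` can only use the raw copy.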
Dremio can also now automatically detect the star and snowflake schemas, which are logical arrangements of data in multidimensional tables commonly used in data warehousing scenarios. “If there is a notional sense of a star or snowflake schema, we can detect it and optimize queries so they run at interactive speed without requiring you to load data into a cube or warehouse,” said Chief Executive Tomer Shiran.
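For readers unfamiliar with the term, a star schema has one central "fact" table whose rows point at several surrounding "dimension" tables. The Python sketch below shows one simple way such a shape could be spotted from column names alone. It is an assumption-laden illustration, not Dremio's detection method: the `find_fact_tables` function, the table layout and the rule that a shared key column implies a relationship are all hypothetical.

```python
def find_fact_tables(tables, min_dimensions=2):
    """Map each likely fact table to the dimension tables it references.

    Hypothetical heuristic: a table is treated as a fact table when its
    columns include the primary keys of at least `min_dimensions` other tables.
    """
    # Which table owns each primary-key column name.
    key_owner = {spec["key"]: name for name, spec in tables.items()}
    facts = {}
    for name, spec in tables.items():
        dims = sorted(
            key_owner[col]
            for col in spec["columns"]
            if col in key_owner and key_owner[col] != name
        )
        if len(dims) >= min_dimensions:
            facts[name] = dims
    return facts


# A made-up retail data set: one fact table and three dimensions.
tables = {
    "sales": {"key": "sale_id", "columns": ["sale_id", "product_id", "store_id", "date_id", "amount"]},
    "product": {"key": "product_id", "columns": ["product_id", "name"]},
    "store": {"key": "store_id", "columns": ["store_id", "city"]},
    "date": {"key": "date_id", "columns": ["date_id", "year"]},
}

star = find_fact_tables(tables)
```

Here `sales` is flagged as the center of the star because it carries the keys of `product`, `store` and `date`. A real system would likely also look at data values and query history rather than names alone.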
The new release includes a management engine that automatically optimizes the priority, ordering and queuing of reflection refreshes along with error recovery. With this release, users can also now access cloud object stores such as Amazon Web Services Inc.’s S3 and Microsoft Corp.’s Azure Data Lake Store. Enhancements to Apache Arrow in this release provide up to a 60 percent reduction in query latency.
Learning by doing
The new Dremio Learning Engine makes recommendations based upon patterns it observes in user queries over time. “For example, if I’m working with a particular data set, Dremio can automatically recommend to me another data set I didn’t know about that is suitable to be combined with the data I’m working with,” Shiran said.
Machine learning is also used to observe data during query execution to detect schema changes in source systems and adapt the data catalog automatically. This is particularly useful when querying data from sources like unstructured text and NoSQL databases, where schemas can vary from record to record. Dremio can also intelligently cache and index metadata into its catalog, taking into account user access patterns.
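The idea of adapting a catalog as records stream past can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Dremio's implementation: the `observe` function and the dictionary-based catalog are hypothetical, and it uses plain type tracking rather than machine learning.

```python
def observe(catalog, record):
    """Update the catalog from one record and return the changes it caused.

    `catalog` maps field name -> set of type names seen so far.
    Hypothetical sketch; a real catalog would track far more metadata.
    """
    changes = []
    for field, value in record.items():
        type_name = type(value).__name__
        known = catalog.setdefault(field, set())
        if type_name not in known:
            # Either a brand-new field or a new type for an existing field.
            changes.append((field, type_name))
            known.add(type_name)
    return changes


# Made-up NoSQL-style records whose shape drifts from one to the next.
catalog = {}
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "tags": ["x"]},  # new field appears
    {"id": "3", "name": "c"},               # id switches from int to str
]
log = [observe(catalog, r) for r in records]
```

Each entry in `log` lists what that record added to the catalog, so the second record registers the new `tags` field and the third records that `id` can also be a string.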
With this release, Dremio is courting users of the rapidly growing Looker platform, which simplifies data extraction processes by pulling in data directly from multiple sources for visualizations. “Looker is designed primarily to work with one relational database at a time,” Shiran said. “Now Dremio can make things like MongoDB and ElasticSearch appear like a relational database. You get all the benefits of Looker without the need for a relational source, and you can perform joins across multiple data sources.”
Dremio recently raised $25 million in venture capital, bringing its total funding to $40 million. It provides both open-source and enterprise versions of its analytics engine. Licensing is per-node on an annual subscription basis, but the company declined to provide specifics.