UPDATED 13:19 EDT / OCTOBER 24 2012

Impala Expands Real-Time Query in Hadoop, Empowers SQL BI, Visualization Tools

Impala changes the equation for Hadoop, giving users answers to their queries in seconds rather than hours, Cloudera CEO Mike Olson said on theCUBE at the Strata/Hadoop World 2012 conference.

Cloudera’s chief scientist, Jeff Hammerbacher, agrees, saying that Impala is the first tool to change how he uses Hadoop. “I use it every day.”

Impala, which Cloudera announced at the start of the conference, is a distributed real-time query engine that works with HBase and HDFS. It accepts SQL queries, allowing traditional SQL query and visualization tools to run directly against data in Hadoop rather than working through connections from traditional data warehouses.

“We’ve known for a long time that batch data processing solves only part of the Big Data problem,” Olson said. “Not every user and workload can tolerate the latency.”

Hadoop can handle any kind of data, structured as well as unstructured. Impala does not replace Hadoop’s batch processing tools for complex analysis. Rather, it allows business users as well as data scientists to run simpler, interactive queries and get answers “at the speed of thought.” It also lets business executives use the SQL query tools they already know rather than having to learn MapReduce. It thus augments, rather than replaces, the Hadoop batch tools, whose strength is in handling more complex queries.

Nor does it necessarily mean that Hadoop should or will replace the data warehouse in every organization. Olson, who self-identifies as an “old guard relational developer from the RDBMS industry” going back to its beginnings in the 1980s, argues that RDBMS products are excellent for what they do. “If you are doing banking transactions or OLAP, you will continue to run on your RDBMS data warehouse.”

Nor does it invalidate Oracle’s strategy, recently laid out by Oracle CEO Larry Ellison at Oracle OpenWorld 2012, of scrubbing the data in Hadoop and then “blasting it into the Big Iron” of the Oracle data warehouse, Olson argues. Oracle is a Cloudera partner, and in Olson’s view the right solution depends on the needs of the particular user.

But Impala allows users to do things with Hadoop, using the expanded data types it supports, that they could not do before. While it is not as fast as a high-end RDBMS, it is much less expensive, which makes applications that do not need the very high performance of a high-end RDBMS more practical.

Nor does it replace HBase. Rather, Olson says, it provides a real-time solution for a specific set of users whose needs differ from those of users who rely on HBase or who program very complex queries with MapReduce.

Nor will it be the last such query engine. “In the next two years Hadoop will get more real-time workloads that will attack different programming paradigms,” he predicted. He sees several interesting development projects under way in academia, and he promised to add those that catch on in the user community to Cloudera’s platform.

Specifically, says Hammerbacher, “HBase is good if you can specify a row or column. Solr goes past that to allow analysis of free text across many columns or within a field. Impala is solving the problem of aggregating data across multiple tables.” A new generation of open source offerings is also appearing, aimed at processing data streams before they even hit the storage system, which is “another interesting class of real-time analysis.”
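To make “aggregating data across multiple tables” concrete, here is a minimal sketch of that class of query: a join followed by a grouped sum. It is an illustration only. Python’s built-in sqlite3 module stands in for a SQL engine so the example runs anywhere, it is not Impala, and the `customers` and `orders` tables, their columns and the `revenue_by_region` helper are all hypothetical names invented for this example.

```python
import sqlite3

# Stand-in engine: an in-memory SQLite database, NOT Impala.
# Table names, columns, and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")


def revenue_by_region(connection):
    """Join the two tables and sum order amounts per customer region."""
    query = """
        SELECT c.region, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.id
        GROUP BY c.region
        ORDER BY c.region
    """
    return connection.execute(query).fetchall()


for region, total in revenue_by_region(conn):
    print(region, total)
```

A key lookup of a single customer row is the kind of access Hammerbacher attributes to HBase. The query above touches every row of both tables, which is the interactive workload he describes Impala as addressing.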

The next step for improving Impala’s performance, Hammerbacher said, is developing sophisticated join algorithms. Flash storage does not provide much of an advantage, only about a 2x to 3x improvement, which, given the difference in cost between flash and disk, makes it an impractical solution.

However, the problems that most interest him are developing better tools for cleaning non-relational data in Hadoop and then developing technologies to support analysis models such as regression and decision trees. “That comes down to the optimization algorithms. I want to parallelize that across the cluster so you don’t have to leave the BI tool you already know to work with Hadoop.”

