“As a tech guy, I love the Cloudera [Impala] announcement,” said Hadapt Co-founder and Chief Scientist Daniel Abadi in the Cube from the Strata + Hadoop World 2012 conference floor. “It’s great to hear that people are finally embracing this idea of bringing all the data into Hadoop” rather than maintaining two parallel clusters, one running a traditional RDBMS data warehouse for structured data and the other Hadoop for everything else.
This is Hadapt’s founding vision, Abadi told SiliconAngle Founder John Furrier and Wikibon Chief Analyst David Vellante, and before that it was the vision behind the research group he founded at Yale University four years ago. A single-cluster architecture offers several advantages: simplicity, lower cost, easier data management, and the ability to combine structured and unstructured data analysis. The problem was that Hadoop and MapReduce as originally created could not deliver the very fast, interactive query results against small amounts of transactional data that are the bread and butter of the RDBMS.
Structured data developers have had 30 years to optimize that technology and develop tools and methodologies to deliver very high performance analysis on the limited set of structured data. But even when Abadi was working on the columnar-based restructuring and analysis system for data warehousing that became Vertica for his Ph.D., in the early 2000s, “it was clear that text had value. So it was a known problem in the database community that we had to integrate text.” So when HP bought Vertica, Abadi moved to Yale to research this problem and the new technologies coming out of Yahoo and Google.
Basically, what Hadapt has focused on is bringing “the ideas from the database community into Hadoop.” These ideas include SQL and other technologies and approaches that provide real-time analysis and response on what was originally a pure batch-processing technology. Today, with version 2.0 of its product, which Hadapt announced at the start of the conference, it has achieved sub-second response times for simple queries. It also supports SQL queries against Hadoop databases.
“This is something people need,” Abadi says. One proof of that: conference attendees voted Hadapt the hottest startup at the start of the conference.
Until now, he said, Cloudera has championed the two-cluster approach with a connector linking them, and has formed several partnerships with RDBMS providers – ironically starting with Vertica – to create those linkages.
One of the big themes of this conference, Furrier said, has been simplicity. And, says Abadi, simplicity starts with the one-cluster architecture. Hadapt has a lead in that, he says, but now “we have to continue working on performance improvements and keep going from there.”