UPDATED 07:30 EDT / JUNE 09 2015

Teradata adopts Presto for Hadoop SQL queries

Big Data analytics firm Teradata Corp. is throwing its considerable weight behind the open-source Presto project, which provides an SQL query engine for interactive queries.

This could be a very big deal. Presto was, like so many open-source projects, born as an internal project inside Facebook in 2012. The engine is still heavily used by the social media giant, running tens of thousands of queries a day on data stores that scale up to 300 petabytes. As well as providing interactive access to that data, Presto can query different types of data no matter where they reside, whether in a relational, NoSQL or proprietary format, stored on Cassandra, Hive or elsewhere.

Teradata wants to take Presto to the next level. The company, which offers a wide range of Big Data products and services, has announced a “multi-year” commitment to become one of the main contributors to the project. To date, most of the project’s main contributions have come from Facebook, which open-sourced the software back in November 2013.

Teradata’s first move is to improve the features that simplify Presto’s adoption, and it has already made significant advances there. Installation, support documentation and basic monitoring tools can already be downloaded directly from Teradata or from the Presto GitHub page.

But the bigger deal comes later this year, as Teradata plans to integrate Presto with other key parts of the Big Data ecosystem. That work includes support for standard Hadoop distribution management tools, interoperability with YARN, and connectors that extend Presto’s capabilities beyond the Hadoop Distributed File System (HDFS).

“A lot of the weaknesses of Presto are strengths for Teradata,” said Justin Borgman, VP/GM of Teradata’s Development Center for Hadoop and Co-Founder of Hadapt, which Teradata acquired last July. “We feel we can expand our brand into the Hadoop space as good members of the community.”

More than just Hadoop

In an interview, Borgman was keen to emphasize Presto’s ability to query disparate sources of data stored in “data lakes,” which are becoming increasingly popular among enterprises. Data lakes are centralized data repositories stored on Hadoop to facilitate Big Data analytics, and can store data in HDFS to feed YARN-based analytical tools like HBase, Hive, Spark and Storm. But Presto can do a lot more than query Hadoop data.

“It’s not just for querying Hadoop,” Borgman said. “It’s also other data sources, such as querying Cassandra, Kafka, MySQL. With Presto, it’s easy to build connectors for other databases. This aligns with our view that the enterprise is going to have many platforms and you need to get to all of that data.”
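
To illustrate what querying across sources can look like, here is a minimal sketch in Python that only builds a SQL string. Presto addresses tables as catalog.schema.table, where each catalog maps to a configured connector, so a single statement can join data held in different systems. The catalog names (hive, mysql), the schemas, tables and columns, and the helper functions are all hypothetical and chosen for illustration; nothing here comes from Teradata or the Presto project, and the query is not sent to any server.

```python
# Sketch only: every catalog, schema, table and column name is hypothetical.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins a Hive table with a MySQL table."""
    events = qualified("hive", "web", "page_views")   # assumed to live on Hadoop
    users = qualified("mysql", "crm", "customers")    # assumed to live in MySQL
    return (
        "SELECT u.country, count(*) AS views\n"
        f"FROM {events} AS e\n"
        f"JOIN {users} AS u ON e.user_id = u.id\n"
        "GROUP BY u.country"
    )


if __name__ == "__main__":
    # Print the statement; running it would require a Presto cluster
    # with matching catalogs configured.
    print(build_federated_query())
```

The point of the sketch is that the application sees one SQL dialect and one naming scheme, while the connector behind each catalog decides how the data is actually fetched.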

Even more advantageous is that Presto is distribution-agnostic, which means it’s not tied to any single Hadoop distribution. “It’s vendor neutral. We support it but we didn’t create it,” Borgman said. “It’s used by many big companies such as Airbnb, Dropbox, Groupon. If you build an application on Presto, then you can move to other distributions.”

Much of the expertise that Teradata is contributing towards Presto comes in the form of ex-Hadapt engineers. Borgman co-founded and led Hadapt until it was acquired by Teradata last July, and said Teradata now has 16 engineers dedicated to working on Presto’s development, all former Hadapt employees.

“This is where Hadapt is being applied,” Borgman said. “It was acquired last summer. It’s a SQL and Hadoop company and we’re now fully behind Presto.”

Those developers are now aiming at the lofty goals of enabling Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC), both of which are deemed essential to improve integration with business intelligence tools and spur greater enterprise adoption of the technology. Security will also be improved by limiting access based on job roles. All of these capabilities should be introduced sometime next year, Borgman said.

As of today, developers can download Presto 101t direct from Teradata. It’s a pre-tested, stable release of Presto bundled into a pre-built RPM, a release package, or else it’s available in a self-contained sandbox virtual machine (VM). The sandboxes are available with both Cloudera Inc.’s and Hortonworks Inc.’s Hadoop distributions.
