Pentaho Moves Big Data Integration Project to Open Source Apache


Pentaho is open sourcing its Big Data integration engine, known as Pentaho Kettle, moving the entire project to the Apache License Version 2.0. Kettle was previously available under the GNU Lesser General Public License.

Kettle includes Pentaho’s extract, transform and load (ETL) engine, which moves structured and unstructured data between Big Data sources such as Hadoop and HBase without requiring any coding. Kettle can perform ETL jobs both inside and outside Hadoop clusters, and includes Spoon, a graphical user interface that developers can use to set up Hadoop MapReduce jobs, run Pig scripts, and perform Hive queries.

Apache is, of course, home to a slew of Big Data projects – including the aforementioned Hadoop and HBase, as well as a number of NoSQL-related projects – and, by adding Kettle to the mix, Pentaho is hoping to spur wider adoption by Big Data developers.

The move “will foster success and productivity for developers, analysts and data scientists giving them one tool for data integration and access to discovery and visualization,” said Matt Casters, Founder and Chief Architect of Pentaho’s Kettle Project.

Pentaho already has partnerships with virtually all of the vendors attempting to commercialize Hadoop, including Cloudera, Hortonworks, MapR and EMC. Pentaho has also set up a Kettle resource page.