UPDATED 09:00 EDT / MAY 22 2017

BIG DATA

Pentaho bids to bring Apache Spark to the masses

Data integration and analytics supplier Pentaho, a subsidiary of Hitachi Group Co., is throwing its arms around Apache Spark in the new release of its Pentaho Business Analytics product.

Pentaho said that with the 7.1 version, it’s the first data integration provider to offer “adaptive execution” on any engine for big data processing, with Spark the first platform supported. Apache Flink support is coming soon and other platforms are on the way.

This release also adds integration with Microsoft’s Azure HDInsight cloud-based Hadoop offering, enterprise-level security for Hortonworks Inc. environments and improved in-line visualizations.

Pentaho executives positioned the announcement as a salve for the shortage of big data developers that they said is limiting the adoption of Spark. “We see Spark where Hadoop was three to five years ago. In order to work with it you need to be a developer,” said Arik Pelkey, senior director of product marketing at Pentaho.

With the latest revisions, “we’re running our full suite of visual transformation against Spark,” Pelkey said. The company is doing this using what it calls an adaptive execution layer, which automatically maps data integration logic to the execution environment.

In contrast, the company said, other data integrators require users to create Spark-specific data integration logic, which often requires Java programming skills. Pentaho executives said their approach will enable many other execution frameworks to be accommodated in the future. It will also reduce debugging and rework time by guaranteeing compatibility.

“We’re making big-data developers more productive because they now don’t have to regression-test their code to make it work,” Pelkey said. “We’re expanding the range of people who can work with Spark.”

Support for HDInsight basically mirrors the functionality that Pentaho already provides for Amazon Web Services Inc.’s cloud platform. “We’re supporting virtually all the same capabilities that we do with AWS, not just to connect to data but to run big data processing jobs in the cloud and work with a variety of the ecosystem components,” said Ben Hopkins, a senior product manager. The company’s engine enables integration projects to be split between cloud and on-premises data for efficiency and minimal latency. “You can process Salesforce data in the cloud and process SAP data on prem in the same job,” Pelkey said. “You can process that data where it lives.”

The new security features for Hortonworks environments also duplicate existing functionality the company offers for Cloudera Inc. environments. That includes Kerberos impersonation, which protects against cluster intrusions by creating a one-to-one relationship between a user working in the cluster and one working with Pentaho. The company is also adding support for the Apache Ranger Hadoop security framework in Hortonworks environments.

Image: Flickr CC
