UPDATED 09:01 EST / SEPTEMBER 26 2016

NEWS

Pentaho doubles down on Hadoop and Spark

Large-scale analytics projects tend to incorporate numerous components that must work seamlessly with one another for the overall deployment to operate smoothly. As a result, data management vendors place a big emphasis on making their software interoperable with third-party tools.

Pentaho is doubling down today by adding enhanced support for several key technologies from the Hadoop ecosystem to its popular data preparation system.

The first on the list is Spark, which Pentaho Data Integration (PDI) users can now employ to execute SQL requests. The addition lets business analysts harness their existing structured query skills to interact with the data crunching engine instead of having to learn its complicated native execution model. And in the same spirit, today’s update also aims to ease management operations by extending the orchestration capabilities of PDI to more of the analytics framework’s components.
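To make the idea concrete, the snippet below is a minimal sketch of the kind of SQL request that can now be executed on Spark. It uses the open-source PySpark API rather than Pentaho's own tooling, and the application name, file path and column names are purely illustrative.

```python
# Minimal PySpark sketch of the kind of SQL query that can be pushed down to Spark.
# The session name, file path and schema are hypothetical, for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-on-spark-example").getOrCreate()

# Register a dataset that already lives on the cluster as a temporary view...
transactions = spark.read.parquet("hdfs:///data/transactions")
transactions.createOrReplaceTempView("transactions")

# ...so an analyst can query it with plain SQL instead of Spark's native API.
daily_totals = spark.sql("""
    SELECT account_id, SUM(amount) AS total
    FROM transactions
    GROUP BY account_id
""")
daily_totals.show()
```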

Organizations can now use the platform to control Spark’s SQL, machine learning and stream processing modules in addition to their own custom applications. Pentaho senior product marketing manager Ben Hopkins told SiliconANGLE that the functionality lends itself to a wide range of use cases.

A bank, for instance, could use PDI to run a fraud detection algorithm on Spark, feed customer data from its Hadoop cluster to the model for processing and then push the results to a third system where they can be examined by analysts. The goal is to lower the learning curve for users, which is also the main motivation behind the other new integrations rolling out today.
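A rough sketch of that scenario, written against the standard Spark APIs rather than PDI itself, might look like the following; the storage paths, model location and downstream database are hypothetical stand-ins.

```python
# Hedged sketch of the bank scenario described above: pull customer records from
# Hadoop, score them with a pre-trained Spark ML model, and hand the results to a
# downstream system. Paths, model location and the JDBC target are all hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("fraud-scoring").getOrCreate()

# 1. Feed customer data from the Hadoop cluster into Spark.
customers = spark.read.parquet("hdfs:///warehouse/customer_activity")

# 2. Run the fraud detection model (a previously trained Spark ML pipeline).
model = PipelineModel.load("hdfs:///models/fraud_detector")
scored = model.transform(customers)

# 3. Push the flagged records to a third system where analysts can review them.
scored.filter(scored.prediction == 1.0) \
      .write.format("jdbc") \
      .option("url", "jdbc:postgresql://analytics-db/cases") \
      .option("dbtable", "suspected_fraud") \
      .save()
```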

PDI now supports Apache Kafka, which promises to ease the task of shuffling data among the different components of an analytics environment, and can integrate with the Apache Sentry access control management tool. Hopkins says that the latter integration enables administrators to apply the existing security rules in their Hadoop environments to PDI and thus avoid the chore of implementing the policies separately from scratch. Plus, today’s update expands the platform’s integration with the Kerberos authentication system to provide better protection for large clusters.
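The Kafka piece of the puzzle works along these lines: one stage of a pipeline publishes records to a topic and another consumes them, so the two components never have to talk to each other directly. The sketch below uses the open-source kafka-python client, with a made-up broker address and topic name, purely for illustration.

```python
# Illustrative sketch (not Pentaho's API) of how Kafka shuttles records between
# pipeline stages; the broker address and topic name are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "broker:9092"
TOPIC = "staging-records"

# One stage publishes prepared records to a topic...
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"account_id": 42, "amount": 199.99})
producer.flush()

# ...and the next stage consumes them on its own schedule, decoupling the two.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
    break  # stop after one record for the sake of the example
```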

Lastly, Pentaho has made some improvements to PDI’s core data processing capabilities. The platform now enables analysts to execute transformations against their information while an analytics workflow is running instead of having to hard-code the operations beforehand. According to the company, the functionality can make the process up to ten times more efficient than before.
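Conceptually, the change amounts to treating transformation logic as data that arrives at run time rather than code that is fixed in advance. The toy sketch below, which is generic Python rather than anything Pentaho-specific, illustrates the pattern with a hypothetical field mapping.

```python
# Generic sketch of driving a transformation from runtime metadata rather than
# hard-coding it; the field mapping stands in for configuration that would
# normally arrive while the workflow is running. Not Pentaho code.
def apply_mapping(records, mapping):
    """Rename and select fields according to a mapping supplied at run time."""
    return [{new: row[old] for old, new in mapping.items()} for row in records]

# The mapping is data, so a new source layout needs new metadata, not new code.
runtime_mapping = {"cust_id": "customer_id", "amt": "amount"}

incoming = [{"cust_id": 7, "amt": 12.5}, {"cust_id": 8, "amt": 3.0}]
print(apply_mapping(incoming, runtime_mapping))
```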

Image via Geralt
