UPDATED 09:00 EDT / MARCH 14 2017

BIG DATA

Pentaho pitches its integration platform as a machine learning aid

Pentaho Corp. is broadening the scope of its orchestration capabilities to include machine learning, saying its toolset can help teams of data scientists, engineers and analysts train, tune, test and deploy predictive models in a fraction of the time typically required.

Pentaho said its combined data integration and analytics platform enables predictive models to be deployed more quickly, regardless of use case or industry, and regardless of whether models are built in R, Python, Scala or Weka. The announcement amounts to a repositioning of the existing Pentaho 7.0 platform for a new audience. “We haven’t really been targeting that community in the past, but it makes sense for us to speak to data scientists,” said Arik Pelkey, senior director of product marketing.

Building predictive machine learning models is a chore because workflows must be defined for every data source and because most models don’t transition smoothly into production, said Wael Elrifai, director of enterprise solutions for Pentaho’s Europe/Middle East/Africa region. “If a train operator wants to predict where failures will occur and has 3,000 sensors generating 4 million data points per second, the data scientists need to write 3,000 workflows,” he said. “We can do all of these at a high level” using drag-and-drop metaphors.

Pentaho says it can bridge the gap between predictive models, which are typically captured in notebooks, and operational data flows. When building in Pentaho, “90 percent of your feature engineering ends up being part of production workflow,” Elrifai said. “Your feature problems are part of your operational model as well.”
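The general idea of carrying feature engineering into production can be illustrated with a minimal sketch in Python. This is not Pentaho’s interface; it uses scikit-learn (an assumption for illustration) to show how bundling transformations with a model lets the same feature logic run at both training and serving time:

```python
# Illustrative sketch, NOT Pentaho's API: bundle feature engineering
# with the model so the same transforms run in production.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy sensor readings: two raw features per observation.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0], [4.0, 210.0]])
y_train = np.array([0, 0, 1, 1])

# The scaler (feature engineering) and the classifier travel together,
# so scoring reuses the exact training-time transformations.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# At serving time, raw readings go straight in; nothing is re-implemented.
prediction = pipeline.predict(np.array([[3.5, 215.0]]))
```

Packaging the transforms and the model as one artifact is what keeps the notebook-era feature work from being rewritten, by hand, for the operational data flow.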

The task of building predictive models is frustrated by silos, which inhibit cross-functional workflow, the company said. Ventana Research Inc. has said that 92 percent of organizations plan to increase their use of predictive analytics, but half have difficulty integrating predictive models into existing architectures.

Pentaho is attacking this problem by making it easier to preserve the work that goes into building models as they transition into operation. Data scientists and engineers can use the platform to blend traditional sources such as enterprise resource planning, enterprise asset management and unstructured data sources in an automated process that combines data on-boarding, data transformation and data validation.

With integrations for languages such as R and Python, and for machine learning packages including Spark MLlib and Weka, Pentaho said it enables data scientists to train, tune, build and test models faster. Models developed by data scientists can then be embedded directly in a data workflow, thereby leveraging existing data and feature engineering efforts.
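The train-tune-test loop the company describes looks, in generic Python terms, something like the following. This is a hedged sketch using scikit-learn and synthetic data (both assumptions, chosen for brevity), not the platform’s own tooling:

```python
# Generic train/tune/test sketch (illustrative only; Pentaho orchestrates
# these stages visually rather than through code like this).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for blended operational data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune: cross-validated hyperparameter search on the training split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    cv=3,
)
search.fit(X_train, y_train)

# Test: score the best candidate on held-out data before deployment.
test_accuracy = search.score(X_test, y_test)
```

Only after the held-out evaluation would a model graduate into the operational workflow.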

Data engineers and scientists can also re-train existing models with new data sets or make feature updates using custom execution steps. Prebuilt workflows can automatically update models and archive existing ones. Enhancements in version 7.0 enable visual debugging of data transformation processes, which can also be applied to machine learning models.
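A retrain-and-archive step of the kind described above can be sketched generically. The file paths, model choice and incremental `partial_fit` update here are all assumptions for illustration, not Pentaho’s prebuilt workflow:

```python
# Illustrative retraining sketch (not Pentaho-specific): refresh a model
# with new data, archiving the previous version before replacing it.
import pickle
import shutil
from datetime import datetime, timezone
from pathlib import Path

import numpy as np
from sklearn.linear_model import SGDClassifier

MODEL_PATH = Path("model.pkl")        # hypothetical locations for the sketch
ARCHIVE_DIR = Path("model_archive")

def retrain_and_archive(new_X, new_y):
    ARCHIVE_DIR.mkdir(exist_ok=True)
    if MODEL_PATH.exists():
        # Archive the current model, timestamped, before overwriting it.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        shutil.copy(MODEL_PATH, ARCHIVE_DIR / f"model-{stamp}.pkl")
        with MODEL_PATH.open("rb") as f:
            model = pickle.load(f)
        model.partial_fit(new_X, new_y)  # incremental update with new data
    else:
        model = SGDClassifier()
        model.partial_fit(new_X, new_y, classes=np.array([0, 1]))
    with MODEL_PATH.open("wb") as f:
        pickle.dump(model, f)
    return model
```

The point of the archive step is auditability: when a refreshed model misbehaves, the prior version can be restored or compared.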

Photo: Clever Cogs! via photopin (license)
