Pentaho pitches its integration platform as a machine learning aid


Pentaho Corp. is broadening the scope of its orchestration capabilities to include machine learning, saying its toolset can help teams of data scientists, engineers and analysts to train, tune, test and deploy predictive models in a fraction of the time typically required.

Pentaho said its combined data integration and analytics platform enables predictive models to be deployed more quickly, regardless of use, industry or whether models are built in R, Python, Scala or Weka. The announcement amounts to a repositioning of the existing Pentaho 7.0 platform for a new audience. “We haven’t really been targeting that community in the past, but it makes sense for us to speak to data scientists,” said Arik Pelkey, senior director of product marketing.

Building predictive machine learning models is a chore because workflows must be defined for every data source and because most models don’t transition smoothly into production, said Wael Elrifai, director of enterprise solutions for Pentaho’s Europe/Middle East/Africa region. “If a train operator wants to predict where failures will occur and has 3,000 sensors generating 4 million data points per second, the data scientists need to write 3,000 workflows,” he said. “We can do all of these at a high level” using drag-and-drop metaphors.

Pentaho says it can bridge the gap between predictive models, which are typically captured in notebooks, and operational data flows. When building in Pentaho, “90 percent of your feature engineering ends up being part of production workflow,” Elrifai said. “Your feature problems are part of your operational model as well.”
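Pentaho's tooling is visual rather than code-based, but the underlying idea — defining feature engineering once so the same transformations run during training and in the production workflow — can be illustrated with a generic sketch in plain Python. The field names and functions below are hypothetical illustrations, not Pentaho APIs:

```python
# Sketch of a shared feature pipeline: the same steps run at
# training time and in the live workflow, so offline and online
# features cannot drift apart.

def normalize(record):
    """Scale the raw sensor value into [0, 1] using its known range."""
    rec = dict(record)
    rec["norm"] = (rec["value"] - rec["min"]) / (rec["max"] - rec["min"])
    return rec

def flag_high(record):
    """Derive a boolean feature marking readings near the top of range."""
    rec = dict(record)
    rec["high"] = rec["norm"] > 0.8
    return rec

# One pipeline definition, referenced by both training and serving code.
FEATURE_PIPELINE = [normalize, flag_high]

def featurize(record):
    """Apply every step of the shared pipeline to one record."""
    for step in FEATURE_PIPELINE:
        record = step(record)
    return record

# Training and production both call featurize(), so the features
# computed on historical data match the ones computed on live data.
reading = {"value": 95.0, "min": 50.0, "max": 100.0}
print(featurize(reading))
```

The point of the design is that the pipeline is a single artifact: adding a new feature step changes both the training data and the operational workflow at once, which is the "90 percent of your feature engineering ends up in production" property Elrifai describes.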

Building predictive models is also hampered by organizational silos, which inhibit cross-functional workflows, the company said. Ventana Research Inc. has said that 92 percent of organizations plan to increase their use of predictive analytics, but half have difficulty integrating predictive models into existing architectures.

Pentaho is attacking this problem by making it easier to preserve the work that goes into building models as they transition into operation. Data scientists and engineers can use the platform to blend data from traditional sources such as enterprise resource planning and enterprise asset management systems with unstructured data sources in an automated process that combines data on-boarding, transformation and validation.

With integrations for languages such as R and Python, and for machine learning packages including Spark MLlib and Weka, Pentaho said it enables data scientists to train, tune, build and test models faster. The models they develop can then be embedded directly in a data workflow, thereby leveraging existing data and feature engineering efforts.

Data engineers and scientists can also re-train existing models with new data sets or make feature updates using custom execution steps. Prebuilt workflows can automatically update models and archive existing ones. Enhancements in version 7.0 enable visual debugging of data transformation processes, which can also be applied to machine learning models.
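The retrain-and-archive workflow described above can be sketched generically. The code below is a hypothetical illustration of the pattern, not Pentaho's implementation: before a model refit on fresh data is deployed, the currently deployed version is archived.

```python
# Sketch of an automated model-update step: archive the current
# model, then replace it with one trained on a new data set.

archive = []          # previously deployed model versions
current_model = None  # the model currently in production

def train(data):
    """Toy 'model': predict the mean of its training data."""
    m = sum(data) / len(data)
    return lambda _x: m

def retrain(new_data):
    """Archive the deployed model and promote a freshly trained one."""
    global current_model
    if current_model is not None:
        archive.append(current_model)  # keep the old version on file
    current_model = train(new_data)    # deploy the new version

retrain([10, 20, 30])   # initial deployment; nothing to archive yet
retrain([40, 50, 60])   # update: first model moves to the archive
print(current_model(None), len(archive))
```

A real data set and model would replace the toy mean predictor, but the control flow — new data in, old model archived, new model promoted — is the same shape as the prebuilt workflows Pentaho describes.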

Photo: Clever Cogs! via photopin (license)