UPDATED 15:33 EDT / OCTOBER 30 2017

BIG DATA

Application container-friendly Pentaho 8 gets native Kafka support

The Hitachi Vantara subsidiary of Hitachi Ltd. has added support for Apache Kafka streaming data in version 8 of its Pentaho data integration and analytics software.

The move, announced on Oct. 26, extends the company’s embrace of the open-source ecosystem building around Apache Spark and its Spark Streaming extension, which are commonly used with Kafka.

Pentaho 8.0 fully enables streaming data ingestion and processing using either its native streaming engine or Kafka. The stream processing capability builds on Pentaho's existing Spark integration with SQL, MLlib and the "adaptive execution layer" the vendor introduced in the spring. Apache Kafka is a lightweight, fast and highly scalable message broker that passes data between applications and is commonly used in Hadoop big data environments. A recent Kafka enhancement added "exactly once" delivery capabilities.
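For illustration, sending a record with those exactly-once guarantees from Kafka's own Java client looks roughly like this. The broker address, transactional ID and topic name are placeholders, and this is generic Kafka client code rather than Pentaho's integration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; point this at your own cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Setting a transactional ID enables the idempotent, exactly-once
        // producer introduced in Kafka 0.11.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "sensor-ingest-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // "iot-events" is a hypothetical topic used for illustration.
            producer.send(new ProducerRecord<>("iot-events", "device-42", "{\"temp\": 21.5}"));
            producer.commitTransaction();
        }
    }
}
```

Because the transactional ID implicitly turns on idempotent writes, a retried send cannot produce a duplicate message on the topic.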

International Data Corp. estimates that the volume of data organizations produce will increase tenfold by 2025, that one-quarter of that data will be real-time and that the "internet of things" will account for 95 percent of that streaming volume, said Arik Pelkey, senior director of Pentaho product marketing at Hitachi Vantara. The company has revamped the architecture of its platform to accommodate other streaming engines and plans to include Apache Flink in the near future, he said.

The adaptive execution layer automatically maps data integration logic to the execution environment, reducing or eliminating the need for Spark programming. Users can match workloads to the most appropriate processing engine without the need to rewrite data integration logic. Adaptive execution has been made easier to set up, use and secure in the new release. As a result, Pelkey said, “you don’t have to be a developer anymore to work with Spark Streaming data.”
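To give a sense of what that abstraction saves, here is a minimal sketch of the hand-coded Spark Structured Streaming job a developer would otherwise write to consume a Kafka topic. The application name, broker address and topic are illustrative, and the snippet assumes the spark-sql-kafka connector is on the classpath:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ManualSparkStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("manual-kafka-stream") // illustrative app name
                .getOrCreate();

        // Subscribe to a Kafka topic; broker and topic are placeholders.
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "iot-events")
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

        // Write the stream to the console for demonstration purposes.
        StreamingQuery query = events.writeStream()
                .outputMode("append")
                .format("console")
                .start();
        query.awaitTermination();
    }
}
```

The adaptive execution layer's pitch is that users design this logic once, visually, and the platform generates the engine-specific execution.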

To better address the growing popularity of containers, Hitachi Vantara is also adding support for "worker nodes," which are slimmed-down versions of its software optimized for speed and portability. "You can use worker nodes in the cloud or on-premises within a container to, for example, process multiple small jobs such as data transformation or reporting," said Anand Rao, a senior product marketing manager. "These virtual nodes form part of a cluster so you don't need the metadata repository to be replicated multiple times."

Worker nodes support the in-line visualization feature that the company also introduced this spring in an effort to make data integration simpler. The feature enables users to visualize data during the integration process to spot outliers more easily.

The new release also adds support for the Apache Knox Gateway to existing support for security protocols from Cloudera Inc. and Hortonworks Inc. The Knox gateway is used to authenticate users to Hadoop services. Also new is native support for the Apache Avro data serialization system and the Apache Parquet columnar storage format. Native support makes it easier for users to read and write those big data file formats and process them with Spark using Pentaho's visual editing tools. Availability is planned for next month.
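As a rough illustration of what the Avro format involves under the hood, the snippet below serializes a record with Avro's generic Java API. The schema and field names are hypothetical, and this is the stock Avro library rather than Pentaho's visual tooling:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroExample {
    public static void main(String[] args) throws IOException {
        // A hypothetical schema for the kind of record a pipeline might carry.
        String schemaJson = "{\"type\":\"record\",\"name\":\"Event\","
                + "\"fields\":[{\"name\":\"device\",\"type\":\"string\"},"
                + "{\"name\":\"temp\",\"type\":\"double\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        GenericRecord record = new GenericData.Record(schema);
        record.put("device", "device-42");
        record.put("temp", 21.5);

        // Serialize the record to Avro's compact binary encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        System.out.println("Serialized " + out.size() + " bytes");
    }
}
```

Pentaho's pitch is that its visual tools hide this boilerplate while keeping the schema-aware binary format, which Spark can then read directly.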

