Syncsort eases chore of integrating data from many sources


Syncsort Inc. is planting its feet deeper into the red-hot data integration market.

The Pearl River, New York-based company on Thursday announced enhancements to its DMX-h extract/transform/load, or ETL, software that include integrated workflow and support for the new 2.0 release of the Apache Spark analytics framework.

Syncsort said the improvements build upon what it calls its “design once, deploy anywhere” architecture and ease the process of integrating data from multiple sources. In particular, the company touted its ability to streamline the process of building “data pipelines,” which weave together workloads on multiple compute frameworks.

For example, a pipeline might combine data from a mainframe-based warehouse with batch results from a Hadoop MapReduce job and with streaming analytics and machine learning workflows from Spark. The integrated workflow feature is said to let organizations manage varied workloads, such as batch ETL on very large repositories of historical data, while referencing business rules to govern data ingestion within the workflow.

Developers can write jobs in one environment, such as a laptop, and run them in another, such as MapReduce, Spark or the cloud, Syncsort said. They can also specify where workloads run in order to match each one to the most appropriate execution environment. Workflows can be created and combined within a single interface, even if they run on different compute frameworks.

DMX-h developers can also take advantage of the enhancements in Spark 2.0, which include better SQL support, a unified application programming interface and significant performance improvements. Developers can visually design data transformations once and run the jobs on MapReduce, Spark 1.x or Spark 2.0 simply by specifying a different compute framework, the company said. No reconfiguration or recompilation is required.

Pricing was not disclosed.

Image via Pixabay CC