

The latest data integration provider to jump on the Apache Spark bandwagon is Talend Inc., which is rolling out a new version of its namesake platform this morning that leverages the speedy in-memory execution framework to accelerate data ingestion. It’s claiming that customers can achieve an immediate fivefold performance improvement by switching over.
The company promises to make the migration as simple as the push of a button with a new refactoring option that can automatically convert data pipelines written for MapReduce, the previous gold standard of open-source analytics, to work with Spark. In theory, that requires no changes to the high-level workflows a user has defined for their cluster.
New projects also benefit from the upgrade, which brings some 100 pre-implemented data ingestion and integration functions that make it possible to pull data into Spark without having to do any programming. According to Talend, the result is up to a tenfold improvement in developer productivity.
That’s an attractive proposition for organizations that have traditionally ingested their data using hard-coded pipelines that required manually implementing every change, an expensive and often time-consuming process. But of course, Talend is not the only vendor offering an alternative to the old way of aggregating information for Spark users, which is why the new release also packs a number of value-added features to set it apart.
One of the biggest additions is masking, which allows an organization to replace a sensitive record with a structurally similar placeholder that doesn’t reveal any specific details. That’s useful in scenarios where, say, a hospital analyst who doesn’t have permission to view patient treatment histories wants to check how many medical records a given dataset coming into Spark contains.