

A 2015 survey from Databricks Inc. found that Spark is quickly replacing MapReduce in Hadoop deployments, and that adoption might be growing even faster were it not for the difficulty of upgrading. Syncsort Inc. is moving to ease that task for users of its DMX-h tool today by adding native integration with the engine, which removes the need to manually rewrite analytics workflows as part of the transition.
Instead, a customer can now simply select Spark as their data processing engine of choice through the software’s graphical interface and have their implementation automatically adapted for the framework’s programming model. The conversion is then applied to every workflow created in DMX-h, including the connectors an organization uses to import records from external sources and the transformations that are executed on the information once it’s inside Hadoop. Syncsort credits the simplicity of the process to the fact that its tool doesn’t require users to do any coding in the first place.
DMX-h makes it possible to define how data should flow into an analytics cluster using high-level policies that are decoupled from the underlying engine. As a result, Syncsort says those policies can be easily applied to different data-crunching frameworks by the “intelligent execution engine” tasked with handling the implementation. The firm claims the technology can not only port a Hadoop workflow from MapReduce to Spark but also run it on standalone implementations of the latter, even if they’re in the cloud. Last year’s Databricks survey revealed that such deployments are becoming increasingly common as organizations look to reduce operating costs.
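Syncsort hasn’t published the internals of its intelligent execution engine, but the general pattern it describes, declaring a dataflow once in engine-neutral terms and delegating execution to a pluggable backend, can be sketched roughly as follows. Every name in this sketch (Pipeline, ExecutionEngine and so on) is hypothetical and serves only to illustrate the decoupling the company is talking about, not its actual API.

```scala
// Illustrative sketch only: the types and engines below are hypothetical,
// not DMX-h's real interfaces.

// A pipeline is declared as abstract steps, with no reference to MapReduce or Spark.
sealed trait Step
case class Source(connector: String) extends Step      // e.g. a JDBC or mainframe connector
case class Transform(expression: String) extends Step  // e.g. "filter(amount > 0)"
case class Sink(target: String) extends Step            // e.g. "hdfs:///warehouse/sales"

case class Pipeline(steps: List[Step])

// Each supported framework gets its own backend that interprets the same steps.
trait ExecutionEngine {
  def run(pipeline: Pipeline): Unit
}

object MapReduceEngine extends ExecutionEngine {
  def run(pipeline: Pipeline): Unit =
    pipeline.steps.foreach(step => println(s"[MapReduce] executing $step"))
}

object SparkEngine extends ExecutionEngine {
  def run(pipeline: Pipeline): Unit =
    pipeline.steps.foreach(step => println(s"[Spark] executing $step"))
}

object Demo extends App {
  val pipeline = Pipeline(List(
    Source("jdbc:sales_db"),
    Transform("filter(amount > 0)"),
    Sink("hdfs:///warehouse/sales")
  ))

  // Switching engines is a configuration choice, not a rewrite of the pipeline.
  val engine: ExecutionEngine = sys.env.getOrElse("ENGINE", "spark") match {
    case "mapreduce" => MapReduceEngine
    case _           => SparkEngine
  }
  engine.run(pipeline)
}
```

Because the pipeline itself never mentions a framework, the same definition can in principle be handed to a MapReduce backend, a Spark-on-YARN backend or a standalone Spark cluster in the cloud.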
At the same time, Spark adopters are increasingly making use of complementary open-source tools like Kafka, which is now natively supported in DMX-h as well. That means customers can take advantage of the message broker’s performance to speed the flow of data between different parts of their analytics environments. According to Syncsort, the integration should prove particularly useful for organizations dealing with sensor transmissions and other real-time information that needs to be processed while it’s still fresh.
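Syncsort hasn’t detailed how DMX-h wires Kafka into the workflows it generates, but the general shape of consuming a Kafka topic from Spark can be sketched with Spark’s Structured Streaming API. The broker address and the “sensor-readings” topic below are placeholders, and a real workflow would land the results in Hadoop or another store rather than the console.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object SensorStreamSketch {
  def main(args: Array[String]): Unit = {
    // Requires the spark-sql-kafka connector package on the classpath.
    val spark = SparkSession.builder()
      .appName("sensor-stream-sketch")
      .getOrCreate()

    // Subscribe to a Kafka topic; broker address and topic name are placeholders.
    val readings = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sensor-readings")
      .load()

    // Kafka delivers keys and values as bytes; cast them to strings for processing.
    val decoded = readings.select(
      col("key").cast("string"),
      col("value").cast("string")
    )

    // Write the decoded stream out continuously while records are still fresh;
    // the console sink here stands in for a downstream transformation and store.
    val query = decoded.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/sensor-stream-checkpoint")
      .start()

    query.awaitTermination()
  }
}
```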