LinkedIn launches new data features for Samza stream processing engine
LinkedIn today announced a major new release of the Samza stream processing engine, one of its marquee open-source projects, that will give enterprises more options in how they analyze their real-time data.
Samza is built to process high-volume data streams quickly with a high degree of reliability. This feature combination makes it handy for a number of important enterprise use cases. Among them are infrastructure monitoring, fraud detection and analyzing data from connected devices such as sensors embedded in factory equipment.
Building an analytics application that can harness Samza to ingest such information is a complex undertaking. That’s why a significant part of the software’s installed base is made up of tech firms such as Netflix Inc., Uber Technologies Inc. and VMware Inc. To ease developers’ work, Samza 1.0 introduces two new ways of plugging workloads into the engine besides the native application programming interface.
The first is a tool called Samza SQL. As the name implies, it enables applications to interact with data processed by Samza using the industry-standard Structured Query Language. LinkedIn said that the tool is more accessible than the native API and removes the need for developers to sort out low-level details such as provisioning hardware resources manually.
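To give a rough sense of why a declarative query can be easier to work with than hand-written processing logic, here is a minimal sketch. It does not use Samza SQL: Python's built-in sqlite3 module stands in for the engine, and the page_views table, its columns and the sample rows are all hypothetical. A real streaming query would run continuously over incoming events rather than over a fixed table.

```python
import sqlite3

# Stand-in only: an in-memory SQLite table plays the role of a data stream.
# Table name, column names and rows are hypothetical, not from Samza.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (member TEXT, page TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("alice", "/jobs", 100),
        ("bob", "/feed", 200),
        ("alice", "/jobs", 900),
        ("bob", "/jobs", 1200),
        ("alice", "/jobs", 1500),
    ],
)

# One declarative statement expresses the filter and the per-member count.
rows = conn.execute(
    "SELECT member, COUNT(*) AS views "
    "FROM page_views "
    "WHERE page = '/jobs' "
    "GROUP BY member "
    "ORDER BY member"
).fetchall()
conn.close()

print(rows)
```

The query states what result is wanted and leaves the question of how to compute it to the engine, which is the accessibility argument LinkedIn makes for the SQL interface.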
The other new interface alternative takes the form of an integration with the open-source Beam project. Beam provides a unified API for popular analytics engines such as Spark and Flink that spares software teams the trouble of familiarizing themselves with each individual platform. According to LinkedIn, the integration will make Samza-powered applications more portable while enabling developers to use a wider selection of programming languages.
The Microsoft Corp.-owned company also took the opportunity to revamp the native API itself. Samza 1.0 adds built-in commands for tasks such as filtering data, which previously required developers to build custom workflows from scratch.
“Developers had to implement complex operations such as windows and joins by themselves on top of this API,” LinkedIn engineer Jagadish Venkatraman wrote in a blog post. “This made building applications time consuming and error-prone. To address this in Samza 1.0, we built a high-level API with built-in operators like map, filter, join, window, etc. This allows you to express complex data pipelines easily by combining multiple operators.”
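The idea of combining operators into a pipeline can be sketched in a few lines. The example below is a conceptual illustration only and is not Samza's API: the Stream class, its tumbling_window method and the sample events are hypothetical, and it processes a finite list rather than a live stream.

```python
from collections import defaultdict


class Stream:
    """Hypothetical stand-in for a stream of events; not a Samza class."""

    def __init__(self, events):
        self._events = events

    def map(self, fn):
        # Transform every event with fn.
        return Stream(fn(e) for e in self._events)

    def filter(self, pred):
        # Keep only the events for which pred returns True.
        return Stream(e for e in self._events if pred(e))

    def tumbling_window(self, size_ms, key_fn, ts_fn):
        # Count events per key in fixed, non-overlapping time windows.
        def windowed():
            counts = defaultdict(int)
            for e in self._events:
                window_start = (ts_fn(e) // size_ms) * size_ms
                counts[(window_start, key_fn(e))] += 1
            yield from sorted(counts.items())

        return Stream(windowed())

    def collect(self):
        return list(self._events)


events = [
    {"member": "alice", "page": "/jobs", "ts": 100},
    {"member": "bob", "page": "/feed", "ts": 200},
    {"member": "alice", "page": "/jobs", "ts": 900},
    {"member": "bob", "page": "/jobs", "ts": 1200},
    {"member": "alice", "page": "/jobs", "ts": 1500},
]

# Chain operators: keep job-page views, project to (member, ts),
# then count views per member in one-second windows.
result = (
    Stream(events)
    .filter(lambda e: e["page"] == "/jobs")
    .map(lambda e: (e["member"], e["ts"]))
    .tumbling_window(1000, key_fn=lambda p: p[0], ts_fn=lambda p: p[1])
    .collect()
)

print(result)
```

Each entry in the result pairs a (window start, member) key with a view count. Without built-in operators, the bookkeeping inside tumbling_window is the kind of logic every application would have had to write for itself.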
By simplifying development, these improvements could make the engine accessible to a broader range of enterprises. The same is true of the new “standalone mode” rolling out alongside them.
Until now, Samza had to be deployed with YARN, an open-source system for managing hardware resources and application workflows. The software is fairly popular, but it’s just one of several tools that enterprises use for the task. The standalone mode gives companies the flexibility to build Samza directly into an analytics service and then use their YARN alternative of choice to manage that service.
“As Samza gained momentum, our users desired the flexibility to run stream processing in any environment —Kubernetes, Mesos, or on the cloud,” Venkatraman wrote. “This mode allows Samza to be embedded as a lightweight library within an application and run on any resource manager of your choice. You can increase parallelism by simply spinning up more instances of your application.”
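Why more instances translate into more parallelism can be shown with a small sketch. It assumes the input stream is split into numbered partitions that are handed out round-robin to whichever instances are running. The assign_partitions function and the instance names are hypothetical, and Samza's actual coordination logic may work differently.

```python
def assign_partitions(num_partitions, instances):
    """Distribute partition IDs round-robin across instances.

    Hypothetical illustration; not Samza's real assignment algorithm.
    """
    assignment = {name: [] for name in instances}
    for partition in range(num_partitions):
        owner = instances[partition % len(instances)]
        assignment[owner].append(partition)
    return assignment


# Two instances share eight partitions: four each.
two = assign_partitions(8, ["app-1", "app-2"])

# Spinning up two more instances halves the load on each one.
four = assign_partitions(8, ["app-1", "app-2", "app-3", "app-4"])

print(two)
print(four)
```

Under this assumption, each instance processes fewer partitions as more instances join, so throughput can scale without any change to the application code.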