UPDATED 13:00 EDT / NOVEMBER 27 2018

BIG DATA

LinkedIn launches new data features for Samza stream processing engine

LinkedIn today announced a major new release of the Samza stream processing engine, one of its marquee open-source projects, that will give enterprises more options in how they analyze their real-time data.

Samza is built to process high-volume data streams quickly with a high degree of reliability. This feature combination makes it handy for a number of important enterprise use cases. Among them are infrastructure monitoring, fraud detection and analyzing data from connected devices such as sensors embedded in factory equipment.

Building an analytics application that can harness Samza to ingest such information is a complex undertaking. That’s why a significant part of the software’s installed base is made up of tech firms such as Netflix Inc., Uber Technologies Inc. and VMware Inc. To ease developers’ work, Samza 1.0 introduces two new ways of plugging workloads into the engine besides the native application programming interface.

The first is a tool called Samza SQL. As the name implies, it enables applications to interact with data processed by Samza using the industry-standard Structured Query Language. LinkedIn said that the tool is more accessible than the native API and removes the need for developers to sort out low-level details such as provisioning hardware resources manually.

The other new interface alternative takes the form of an integration with the open-source Beam project. Beam provides a unified API for popular analytics engines such as Spark and Flink that spares software teams the trouble of familiarizing themselves with each individual platform. According to LinkedIn, the integration will make Samza-powered applications more portable while enabling developers to use a wider selection of programming languages.

The Microsoft Corp.-owned company also took the opportunity to revamp the native API itself. Samza 1.0 adds built-in commands for performing tasks such as filtering data that previously required developers to build custom workflows from scratch.

“Developers had to implement complex operations such as windows and joins by themselves on top of this API,” LinkedIn engineer Jagadish Venkatraman wrote in a blog post. “This made building applications time consuming and error-prone. To address this in Samza 1.0, we built a high-level API with built-in operators like map, filter, join, window, etc. This allows you to express complex data pipelines easily by combining multiple operators.”
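
To make the idea of combining operators concrete, here is a minimal sketch in plain Python. It is not Samza code and does not use Samza's API. The event data, the function names and the 10-second window size are all hypothetical, chosen only to show a filter, a map and a tumbling window chained into one pipeline.

```python
from collections import defaultdict

# Hypothetical page-view events: (timestamp_seconds, member_id, page).
# Illustrative data only; this is plain Python, not the Samza API.
EVENTS = [
    (1, "alice", "/feed"),
    (4, "bob", "/jobs"),
    (7, "alice", "/jobs"),
    (12, "bob", "/feed"),
    (14, "alice", "/feed"),
    (15, "carol", "/health-check"),
]


def tumbling_window_counts(keyed_events, size_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for ts, key in keyed_events:
        window_start = (ts // size_seconds) * size_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


def page_views_per_window(events, size_seconds=10):
    # filter: drop monitoring traffic
    kept = filter(lambda e: e[2] != "/health-check", events)
    # map: keep only the timestamp and the page as the grouping key
    keyed = map(lambda e: (e[0], e[2]), kept)
    # window: aggregate the keyed stream into per-window counts
    return tumbling_window_counts(keyed, size_seconds)


RESULT = page_views_per_window(EVENTS)
```

Each step is a small, reusable operator, so the pipeline reads as a sequence of transformations. A hand-written loop would have to mix filtering, reshaping and window bookkeeping together.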

By simplifying development, these improvements could make the engine accessible to a broader range of enterprises. The same is true of the new “standalone mode” rolling out alongside them.

Until now, Samza had to be deployed with YARN, an open-source system for managing hardware resources and application workflows. The software is fairly popular, but it’s just one of several tools that enterprises use for the task. The standalone mode gives companies the flexibility to build Samza directly into an analytics service and then use their YARN alternative of choice to manage that service.

“As Samza gained momentum, our users desired the flexibility to run stream processing in any environment —Kubernetes, Mesos, or on the cloud,” Venkatraman wrote. “This mode allows Samza to be embedded as a lightweight library within an application and run on any resource manager of your choice. You can increase parallelism by simply spinning up more instances of your application.”
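
One way to picture why adding instances raises parallelism is the sketch below. It assumes that a stream's input partitions are divided evenly among the running instances. The article does not describe how Samza actually coordinates instances, so the round-robin scheme and the `assign_partitions` helper are hypothetical.

```python
def assign_partitions(num_partitions, num_instances):
    """Spread partition ids over instances round-robin (illustrative only)."""
    if num_instances < 1:
        raise ValueError("at least one instance is required")
    assignment = {instance: [] for instance in range(num_instances)}
    for partition in range(num_partitions):
        assignment[partition % num_instances].append(partition)
    return assignment


# Hypothetical stream with 8 partitions, processed by 1 and then 4 instances.
ONE_INSTANCE = assign_partitions(8, 1)
FOUR_INSTANCES = assign_partitions(8, 4)
```

Under this assumption, a single instance handles all eight partitions, while four instances handle two partitions each and can work through them at the same time.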

