Spark rival Apache Apex hits top-level status
Apache Apex was promoted yesterday to become the Apache Software Foundation (ASF)’s latest top-level project. Apache Apex is an open-source stream and batch processing platform that is compatible with HDFS and YARN, runs in-memory, and offers enhanced event processing and fault tolerance capabilities.
Readers may be familiar with the name “Apex” because the project started out as DataTorrent Inc’s proprietary real-time streaming core before being open-sourced. DataTorrent contributed Apex to the ASF last fall, and it has spent the past six months in incubation. Apex has since been expanded considerably, and now consists of the platform itself and Apex Malhar, a library of operators that implement common business logic to help with decision making.
The Malhar library supports several popular file transfer protocols, databases and messaging queues. These include FTP, NFS, JMS, Kafka, RabbitMQ and numerous NoSQL databases.
“Apache Apex meets the demands of today’s Big Data applications with real-time reporting, monitoring, and learning with millisecond data point precision,” the ASF said in a statement. “Its pipeline processing architecture can be used for real-time and batch processing in a unified architecture. Apex is highly performant, linearly scalable, fault tolerant, stateful, secure, distributed, easily operable with low latency, no data loss, and exactly-once semantics.”
Some might be forgiven for thinking that Apache Apex sounds a bit too similar to better-known stream processing engines such as Apache Spark and the more specialized Apache Storm and Apache Samza, and in many respects it is. But Apex aims to set itself apart from those rivals on usability.
Apex’s better-known rivals are notoriously difficult to implement, but the new kid on the block simplifies many of the operational tasks that make stream processing at massive scale so difficult. Apex can rapidly redistribute work away from nodes that malfunction, while automatically recognizing new ones. The big benefit is that enterprises can keep pace with their data as it grows, potentially handling up to billions of events per second without the latency issues that Spark Streaming suffers as a result of its batch-oriented nature.
Thomas Weise, Apache Apex PMC member, said, “It is very exciting to see Apex after nearly four years since inception becoming an ASF top-level project. It opens the strong capabilities and potential of the platform to a wider audience, and we’re looking forward to a growing community to continue driving innovation in the stream-processing space.”
The Apex Project is available on GitHub now, and currently has 29 contributors.