

Stream computing is a key platform for a growing range of data-rich, low-latency applications.
More online apps, in areas such as mobility, the “internet of things,” media, gaming and serverless, require a robust, low-latency data processing backbone. Core features of many streaming apps now include real-time event processing, continuous computation, stateful semantics, publish-and-subscribe messaging, change data capture and ACID transaction features.
Over the coming decade, data-at-rest architectures — such as data warehouses, data lakes and transactional data stores — will become less central to enterprise data strategies. In Wikibon’s big data analytics market update a year ago, we uncovered several trends that point toward a new era in which stream computing is the foundation of most data architectures:
Over the past several years, the stream computing market has seen a glut of open-source projects come into use. Many of these are now under the Apache Software Foundation. In addition to the many mature commercial stream computing and complex event processing solutions on the market, enterprises can choose from such alternatives as Apache Kafka, Flink, Spark Streaming, Apex, Heron, Samza, Storm, Pulsar and Beam.
Though the functional overlaps among these stream-computing projects are considerable, Wikibon has been seeing a growing number of enterprise implementations that use two or more of them, leveraging the advantages of each. Next to Kafka, Apache Flink is the most popular stream computing open-source project.
Already in the 10th year since its invention and fifth year since it became an Apache project, Flink’s strong suit is its architectural versatility. Apache Flink can ingest millions of data points per second and do so while keeping track of relevant contextual information. Its most prominent users include Netflix Inc., Uber Technologies Inc., Lyft Inc. and Alibaba Group Holding Ltd.
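To make "keeping track of relevant contextual information" concrete, here is a minimal sketch of keyed, stateful stream processing in plain Python. It does not use Flink's API; the function name `running_average` and the `(key, value)` event shape are assumptions chosen for illustration only. The idea is that each incoming event updates a small piece of per-key state, and a result is emitted immediately rather than after a batch completes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (key, value), e.g. (rider_id, fare).
Event = Tuple[str, float]


def running_average(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Yield (key, average so far for that key) after every event."""
    # Per-key state: [count, total]. A production engine would hold this in
    # a fault-tolerant state store; here it is an in-memory dict.
    state: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for key, value in events:
        entry = state[key]
        entry[0] += 1        # events seen for this key
        entry[1] += value    # running total for this key
        yield key, entry[1] / entry[0]


if __name__ == "__main__":
    stream = [("a", 10.0), ("b", 4.0), ("a", 20.0), ("b", 6.0)]
    for key, avg in running_average(stream):
        print(key, avg)
```

Because the function consumes any iterable one event at a time, the same logic works on a finite list or on an unbounded generator. What a distributed engine adds on top of this pattern is partitioning of the keys across machines and recovery of the state after failures.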
Though it lacks the publish-and-subscribe features at the heart of Kafka, Flink provides a robust framework and scalable distributed engine for the vast majority of stream computing use cases. In fact, it’s not uncommon to see both Kafka and Flink deployed in complementary fashion in many enterprise stream-computing applications.
As it stands, the core features of the Apache Flink open-source codebase, available in the latest stable release, 1.7.2, are that it:
This week at the third annual Flink Forward developer conference in San Francisco, attendees learned how the Apache Flink project and the community that uses it are likely to fare now that its principal developer, data Artisans GmbH (recently renamed Ververica), has been acquired by China-based cloud powerhouse Alibaba.
In the conference keynote, executives from Ververica and Alibaba laid out the company’s priorities for the coming decade. What was most noteworthy was how accurate Wikibon’s forecasts for the streaming market — especially its convergence with batch processing and machine learning — truly are.
Apache Flink is on a roll and is becoming indispensable to a growing range of streaming use cases. Adoption, open-source code commitment, and other metrics presented at Flink Forward 2019 show that it’s becoming a key pillar in enterprise data strategies.
Robert Metzger, engineering lead at Ververica, showed stats pointing to Flink’s growing adoption on a global scale, especially in China. So it was no surprise, given Ververica’s new corporate parentage, when Metzger discussed how Ververica is launching a new Chinese-language user-support mailing list for the Apache Flink community. He also discussed the company’s efforts to integrate the substantial Flink user base in China into the open-source project’s Apache community.
To support these and other community members, Metzger discussed Ververica’s investments in improving the Flink website. Key enhancements underway include improving the ability to manage issue and bug tracking, publish community packages, and handle workflows for pull-request review and labeling.
Ververica plans to continue to evolve Apache Flink from a stream processor into a unified data processing system. To that end, it is focusing on developing Flink’s batch processing, machine learning and streaming analytics, and data warehouse/ETL integration features.
In batch processing, Xiaowei Jiang, senior staff platform engineer at Alibaba, discussed its work with the Ververica team to build out the “Blink” batch-processing capabilities in the open-source platform. To this end, planned additions to the Flink codebase will include a new Table API and an enhanced SQL query processor. According to Ververica CTO Stephan Ewen, Ververica is working with Alibaba on improving the performance and fault tolerance of batch jobs running across distributed nodes.
In machine learning, Ververica CEO Kostas Tzoumas discussed the company’s investments in deepening Apache Flink’s algorithm libraries, utilities, and user interface for serving teams of data scientists who are building artificial intelligence and stream analytics applications for real-time continuous computation. They are also adding support for development of Flink machine learning apps in Zeppelin notebooks.
In data warehouse and ETL integration, Flink, according to Tzoumas, is being integrated more tightly with Hive’s metastore and data catalog. It’s also seeing performance enhancements in its embedded SQL query processing engine.
In addition, various breakouts during the day focused on ongoing Apache Flink enhancements that will tighten its integration with TensorFlow, Apache Beam and Apache Pulsar.
Taken together, these architectural improvements will enable open-source Apache Flink to support more enterprise use cases that have historically gone to at-rest data platforms such as Apache Hadoop.
Last year, data Artisans launched a commercial version of Flink aimed at enterprises. The platform includes features for automating the setup and maintenance of large-scale deployments. It also provides support for ACID, an approach that makes it possible to guarantee the reliability of important information such as financial records.
To sustain the commercial momentum of the Flink ecosystem, Ververica has retained and rebranded all of data Artisans’ products. Formerly known as dA Platform, the newly renamed Ververica Platform, which is delivered as licensed software, includes three core components:
According to Tzoumas, Ververica is expanding its Flink training and consulting programs. They are also recruiting new platform and services partners to drive the company’s solutions into more opportunities around the world.
If it hopes to expand adoption for Flink in the enterprise, Ververica will need to take the following strategic steps:
For further news from Flink Forward 2019, check out the Ververica blog. And to see how far the company has come in the past year, check out what Tzoumas had to say on theCUBE at Flink Forward 2018.