UPDATED 11:47 EST / JULY 11 2016

Survey finds surge in real-time and streaming analytics adoption

If a new survey of 4,000 professionals who work with big data is any indication, real-time and streaming analytics are about to take off in a big way. OpsClarity Inc., a provider of monitoring software for fast data and streaming applications, found that 92 percent of companies surveyed plan to leverage stream-processing applications this year, while 79 percent will reduce or eliminate investments in batch-only processing.

The company’s 2016 State of Fast Data & Streaming Applications report found that while definitions of real-time data differ (35 percent defined it as information processed within 30 minutes), there is no question that the promise of technologies like Apache Kafka and Apache Spark to speed decision-making is capturing the imagination of big data practitioners.

Customer-facing applications are driving the trend, narrowly edging out applications that optimize internal business processes by 32 percent to 29 percent, while nearly 40 percent of respondents said both goals are important.

And users are well along the adoption curve, with 65 percent reporting that they are already in production with real-time data pipelines and another 24 percent planning to deploy before the end of the year. Nearly 70 percent plan to reduce batch processing while shifting investments to streaming analytics.

Not surprisingly, the technologies of choice are some of the most talked-about open source options: Apache Kafka is being adopted by 86 percent of respondents, with Apache Flume and RabbitMQ far behind at just over 20 percent each.

Apache Spark is the most popular processing framework, adopted by 70 percent of respondents, while Apache Storm is used by 27 percent. Half of respondents use MapReduce, which isn’t surprising given its tight integration with Hadoop, the big data on-ramp of choice.

And speaking of Hadoop, it’s still very much in user plans as a data store, with 54 percent planning to use its HDFS file system in the coming year. Cassandra makes a strong showing in the number two position, with 42 percent of respondents planning to use it, followed by Elasticsearch and relational databases at 38 percent each. Most respondents use two or three different data stores.

The survey also documents the growing shift to open source software; 91 percent of respondents said they leverage at least some open source code in their applications, and 47 percent use it exclusively. A scant 9 percent said they use only commercial software for rapid data processing.

Apache YARN was the most popular resource manager cited (68 percent), probably because of its association with Hadoop. At 36 percent adoption, Apache Mesos was a strong second and appears to be the resource manager of choice for streaming applications.

Complexity and lack of experience were cited as the two greatest barriers to building streaming data pipelines, continuing a trend that has frustrated Hadoop adoption in many companies. The most-cited culprit in the complexity problem was the frequency of code changes and pushes, named by 55 percent of respondents. Rapid updates to open source tools are one of the reasons vendors such as Red Hat Inc. and MapR Technologies Inc. have recently reined in some release schedules so that customers can better digest the tools they already have.
