UPDATED 10:39 EDT / MARCH 30 2016

Apache Spark will dominate the Big Data landscape by 2022, Wikibon says

The Apache Spark Big Data processing framework will account for more than a third of all Big Data spending by 2022, according to new research by Wikibon.

Wikibon Big Data analyst George Gilbert’s latest report, Forecasting Spark’s Adoption in the Context of Systems of Intelligence, is the first-ever forecast of the Spark industry and of how it’s reshaping the Big Data market.

The report outlines how Apache Spark is set to play a critical role in the adoption and evolution of Big Data technologies over the next decade. Gilbert says this is because Spark gives enterprises increasingly sophisticated ways to leverage Big Data compared with traditional Hadoop technology.

The role of Apache Spark is set to expand significantly over the next six years, to the point that by 2022 it will account for 37 percent of all Big Data spending, which Wikibon forecasts will reach close to $70 billion a year by then. According to Gilbert, this massive expansion will be driven by the evolution of Big Data applications toward “continuous, real-time processing of vast streams of data”, for which Spark will be a “crucial catalyst”.

The evolution of Apache Spark

In his report, Gilbert outlines three stages of evolution for Apache Spark, the first of which is taking place right now. At present, most Spark users have adopted the technology in order to address the limitations of Apache Hadoop, the current number one Big Data technology.

Spark’s growth rate is accelerating and will reach 72 percent by 2019, driven by the vital role it plays in data lakes. Gilbert explains that Spark’s popularity stems from its ability to overcome the performance and complexity challenges of Hadoop’s batch processing engines, because it can “ingest streaming data and chain together different types of analysis and iterate over the data in-memory”. As a result, users can perform much richer analysis on their data.

Spark does need to overcome some challenges, such as the threat posed by specialized engines like Impala, which may be more suitable in certain circumstances. Gilbert also says Spark isn’t ready for many Internet of Things (IoT) use cases because its footprint is “too big and unsuited to operate at the network edge”. Even so, Gilbert expects Spark’s large development community to address these challenges in the coming years.

The second stage of Spark’s evolution, from 2020 to 2022, will see it open up new application opportunities as it becomes the “design time foundation” for the machine learning pipelines that build predictive models, Gilbert writes. Later, Spark will enable these pipelines to run fast enough to connect customer interactions with transactional applications that make recommendations or decisions. In addition, Gilbert believes Spark’s simplified framework will make it a much more appealing analytical tool than certain specialized products, further increasing its adoption.

Finally, the third stage of Spark’s evolution, from 2022 onward, will see it serve as a catalyst for what Gilbert terms “Online Learning Applications”: essentially, applications that can make decisions on behalf of individuals. By 2026, Gilbert estimates that 59 percent of all Big Data spending will be tied in some way to Apache Spark or related streaming analytics technologies, with these technologies eventually automating many of the tasks currently performed by data engineers and data scientists.

To learn more about the evolution of Apache Spark, check out Gilbert’s full report over at Wikibon.

*Disclosure: Wikibon is owned by the same parent company as SiliconANGLE*

Image credit: PublicDomainPictures via pixabay
