How IBM handles complexities in Apache Spark for enterprise adoption | #SparkSummit


IBM Corp. opened its Spark Technology Center a little over a year ago in downtown San Francisco to collaborate with the open-source community while sharing the company’s real-world experiences using Spark. With a mission focused on the open-source world of Apache Spark, STC pushes the community forward and contributes to the core project and its ecosystem.

“The overarching goal is to help drive adoption, particularly enterprise customers — the type of customers IBM serves — to harden Spark and make it enterprise ready,” said Nick Pentreath (pictured), principal engineer at STC. 

IBM touts the project as “The Best Thinking on Apache Spark.”

During Spark Summit East 2017 in Boston, Pentreath spoke to Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio, to provide insight into the successful open-source project, as well as how machine learning is improving real-time decision making. (*Disclosure below.)

Sparking IBM’s mission

IBM has committed a great deal of resources to open-source projects such as Linux and Java. Now the company is investing in Spark. Pentreath pointed out that IBM invests in open-source technologies that it views as transformational and game changing. He feels there was a missed opportunity with Hadoop, so now “IBM views Spark as its successor and is backing the analytics platform and operating systems for analytics and big data for the enterprise.”

Pentreath noted that machine learning is a vital part of the mission to improve real-time decision making. He spoke about the evolution of big data storage and how the industry has reached the point of figuring out how to use the data it has stored.

“There is rich value in data, and to unlock it you really need intelligent systems, you need machine learning, you need AI, you need real-time decision making. And that starts transcending the boundaries of old rule-based systems and human-based systems. We see machine learning as one of the key tools, or unlockers of value in data stores,” Pentreath maintained.

What are the shortfalls of Spark? According to Pentreath, the gap lies in the complex end-to-end workflow. He walked through the process of applying machine learning, but noted that the work does not end there: to add value, you need to close the feedback loop. “I think this [closing the loop] is the piece of puzzle that is missing in the end-to-end story and delivering [analytics] at scale with security in an enterprise-grade format,” he said.

Pentreath feels it is imperative to close this gap and improve the customer experience by using more data, better data, and faster data. He said the Spark model is getting better, but it still has a long way to go before it has an impact on customers.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit East 2017 in Boston. (*Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo by SiliconANGLE