UPDATED 18:30 EDT / JUNE 19 2015

One scientist’s perspective on Spark | #SparkInsight

by Amber Johnson

Fernando Perez is a scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science. Perez is also a particle physicist who worked on the IPython project, which led to the Jupyter Project, now part of the Spark ecosystem.

“The Jupyter environment is precisely about building an environment where you can build code and narrative together,” Perez told Jeff Frick and George Gilbert of theCUBE at IBM Spark Summit 2015. The Spark system uses the Jupyter technology to run code, data and narrative live.

“Now in the last few years, the folks at the AMPLab have built PySpark, which is the Python layer on top, [which] allows you to call Spark with a Python API … and then once you have run all your large-scale analytics in Spark, then you can import all of these Python libraries that these physical scientists have been writing for the last 10, 15 years … and use those … with the interactive facilities we have been building,” Perez said.

Contributions led to the current Spark program innovations

Asked how he started working with Python, Perez replied, “I realized that I was probably spending more time switching between coding languages rather than doing any work.” Then, while in graduate school, he learned about Python.

“We were all able to interact very quickly” with data, Perez said, and in the early 2000s multiple laboratories and institutions began contributing in Python. This trend of contributions led to the current innovation of the Spark program.

Around 2002, Perez “realized there was a value to seeing these things as open source projects, and many of us realized that we should actually work and try to get these things funded.”

Today, many government organizations are funding such academic projects. Perez commented that DARPA partially funded Spark. “I think what Spark has brought to the game is an additional layer of enterprise-level analytics,” he said. “It’s not so much for everyday numerical computing workloads that many people in the physical sciences were using … Spark made a real killing in that space.”

Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of IBM Spark 2015.
