UPDATED 18:30 EDT / JUNE 19 2015

One scientist’s perspective on Spark | #SparkInsight

Fernando Perez is a scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science. Perez is also a particle physicist who worked on the IPython project that led to the Jupyter Project, which is part of the Spark ecosystem.

“The Jupyter environment is precisely about building an environment where you can build code and narrative together,” Perez told Jeff Frick and George Gilbert of theCUBE at IBM Spark Summit 2015. The Spark system uses the Jupyter technology to run code, data and narrative live.

“Now in the last few years, the folks at the AMPLab have built PySpark, which is the Python layer on top, [which] allows you to call Spark with a Python API … and then once you have run all your large-scale analytics in Spark, then you can import all of these Python libraries that these physical scientists have been writing for the last 10, 15 years … and use those … with the interactive facilities we have been building,” Perez said.

Contributions led to the current Spark program innovations

When asked how he started working with Python, Perez replied, “I realized that I was probably spending more time switching between coding languages rather than doing any work.” He learned about Python while he was in graduate school.

“We were all able to interact very quickly” with data, Perez said. According to him, multiple laboratories and institutions began contributing in Python in the early 2000s. This trend of contributions led to the current innovation of the Spark program.

Around 2002, Perez “realized there was a value to seeing these things as open source projects, and many of us realized that we should actually work and try to get these things funded.”

Today, many government organizations are funding such academic projects. Perez commented that DARPA partially funded Spark. “I think what Spark has brought to the game is an additional layer of enterprise-level analytics,” he said. “It’s not so much for everyday numerical computing workloads that many people in the physical sciences were using … Spark made a real killing in that space.”

Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of IBM Spark 2015.

 

