Spark experts weigh in on business impact, AI ecosystem | #SparkBizApps
At the recent Apache Spark Maker Community 2016 event held in San Francisco, a program addressing the IBM and Spark alliance was shared with an attentive audience, featuring a series of individual speakers leading up to a panel discussion with participants drawn from a variety of data science focuses.
Spark: Open-source and certification support
The first of the speakers to present was Ritika Gunnar, VP of Offering Management, Data and Analytics at IBM, who presented a segment titled “Shifting Data Science Into High Gear With the Analytics Operating System.” Looking at IBM’s investment in Spark, she highlighted how “[IBM has] committed to the community around building applications around Apache Spark,” with its event-exclusive announcement of a “more global Apache Spark Maker event,” where the winners of the Apache Spark Hackathon will get up to $100,000 as an investment in their applications.
“We believe this last year was really about Spark coming of age,” Gunnar said. She added that the community has seen immense progress over the past year, and with such phenomenal success in just one year, including the way it open-sourced key capabilities, “Apache Spark is one of the largest-growing open-source projects that there’s ever been.”
Other points in Gunnar’s segment covered plans to accelerate Spark’s momentum with focus on its use as an analytics operating system, the announcement of a Spark certification program, an analytics integrated development environment (IDE) platform for developing at higher speeds, and IBM’s “Learn, Create, Collaborate” strategy.
IBM’s Spark outreach
The second speaker, Armand Ruiz, lead product manager of the IBM Data Science Experience, covered some of IBM’s Spark tools and data-pooling, with a spotlight on the three main things listed on the landing page of its data center: “Collaborate with other data scientists, learn from the community and build your projects.”
Ruiz also listed the community cards split into four types: “Articles, Data Sets, Notebooks and Tutorials.” Between these nodes, he shared, IBM is making it possible to tap into tens of thousands of preexisting Jupyter Notebooks, allowing access to established information, operations, applications and more.
During Ruiz’s presentation setup, Tooraj Arvajeh, chief engineering officer for BlocPower, LLC, was introduced. BlocPower was described by Gunnar as “a startup based in New York City … that has the wonderful vision of being able to help inner-city buildings be a lot more energy-efficient,” with its use of Spark serving to improve optimization of visualization and implementation. While he did not give a presentation during this part of the event, he was present to answer questions in person and act as a signifier of the versatility of Spark’s applications.
Improving the truth of data
The next speaker was Joel Horwitz, director of Corporate & Business Development at IBM Analytics, who presented on the topic of “Open Analytics.” Key points made in his section explored the ideas of “expanding the ecosystem” to include key partnerships and making the most out of available data. “I think a lot of times we think about data science as a very static activity,” he said, “but where we’re headed with this is to make it a lot more dynamic and a lot more flowing.”
Horwitz handed things off to JJ Allaire, founder and CEO of RStudio, Inc., who discussed some of the history and philosophy of the R programming language, as well as R’s use in analyzing and managing statistical data and algorithms. Among his focuses were ways of improving the handling of data in the cluster before collecting it and visualizing it at a local level for analysis, as well as the future for merged usage of Spark and R.
Srisatish Ambati (@srisatish), cofounder and CEO of H2O.ai, picked up from there with a focus on open-source machine learning. “Data science is nothing but the search for truth,” he said. “You’re looking at a world that is going to have code as a commodity.” Ambati added that while “data as a whole is a true equalizer … looking for value across different ecosystems is what we’re really focused on.”
He continued: “The companies’ boundaries have broken. They’re not looking at product values; they’re looking at ecosystem values.” Between smart applications and a bulk and streaming focus, Ambati felt: “AI is truly eating software. … this is the end of code in many ways.”
Environmental Sparks
Seth Dobrin, Ph.D., director of Digital Strategies at Monsanto Co., presented next. He described Monsanto as being more of a genetics and genomics company than an agricultural company, with “one of the most advanced genetics and genomics pipelines,” adding that “to support those things, we need to have data science tools and we need to leverage data science.”
Part of his presentation focused on the imbalance of resources devoted to feeding people who consume animal protein instead of vegetables, as well as how, along with “pivoting to become a digital agricultural company,” Monsanto is “selling digital service to our growers and helping them manage their farms better.” One of the biggest utilities aiding Monsanto in achieving these goals was its use of Spark to simplify huge amounts of data by breaking them down into smaller blocks and thereby generating environmental classifications on data tables, as well as spatial and temporal data science.
Carrying on from that theme, Robbie Strickland, VP of Software Engineering at The Weather Company, an IBM Business, discussed the acquisition of The Weather Company by IBM, along with how the company “leverages Spark at The Weather Company to actually scale our data-processing and analytics workloads” to handle the “petabytes of data generated every single day.”
These processes at The Weather Channel’s data center impact energy trading, airplane routing, insurance and more, with its aggregated data converted to MapReduce and that output then used to inform other apps. Strickland also shared some of the company’s focus on and continued improvement of tools to lead development from prototype to functional implementation without costly revisions.
A wealth of thought
The closing panel of data scientists featured a number of field luminaries, with Nick Pentreath, principal engineer at the IBM Spark Technology Center; John Akred, founder and CTO of Silicon Valley Data Science, LLC; Todd Holloway, director of Content Science and Algorithms at Netflix; Matthew Conley, data scientist at Tesla Motors; Dr. Eitel J.M. Lauria, professor and director of Graduate Programs at the School of Computer Science and Mathematics at Marist College; and Siddha Ganju, data scientist and Mozilla Science Lab Member.
While the panel covered several topics during their fielding of questions from Gunnar and the audience, some points in particular were highlighted by the group, with Holloway noting that data science finally has its own categorization, instead of being ambiguously lumped in with other sciences.
Akred also made a heartfelt plea for enterprises to “please test your models before you put them into production!” He noted that the failure to do so regularly resulted in enormous wastes of money and other resources, as well as creating potential for physical harm in some instances.
On the topic of how people can become effective data scientists, Pentreath felt that the key was to be found in people’s own drive and enthusiasm but that the community around their work was also an important aspect. “You start with where you have some knowledge,” he said. “And open-source in general is a welcoming community.”
Giving what served as a summation of all of the speakers’ involvement, Holloway stated, “What Spark does for us is allows us to be experimental on a platform that we’re already using, and that’s really exciting.”
Stay tuned for the full video presentation, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of the Apache Spark Maker Community 2016.
Photo by SiliconANGLE