UPDATED 13:41 EST / SEPTEMBER 29 2016


Collaboration and machine learning with Spark | #BigDataNYC

by Brittany Greaner

Open-source technology is paving the way to the future of affordable and flexible IT, and the Apache Spark open-source processing engine is no exception. There is even a Spark components page where users can share useful tools and technology.

“Even vendors are sharing,” said Holden Karau, principal software engineer at IBM. This enables much wider collaboration throughout the community, as well as narrower collaboration between friends and colleagues. “If you have a notebook and share it with your friend, you can work together more collaboratively. A lot of companies are building notebook solutions,” added Karau.

Karau was interviewed by Dave Vellante (@dvellante) and Peter Burris (@plburris), hosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016 in New York, NY.

Spark’s range of complexity

Another standout feature of Spark is its range of complexity. Users who have little knowledge of Python or Java can still build what they need without that coding ability.

At the same time, users who do know how to code can put that knowledge to work to create exactly what they want. “I think Spark does a good job of being user friendly. With Spark it’s much simpler and exposed in ways people are already used to working with their data,” said Karau.

Machine learning

Another area Spark and other platforms are beginning to dive into is machine learning. The traditional method is to down-sample the data, but that is neither the most efficient nor the most thorough approach. Machine learning allows a broader, more agile way of working.

“When you move people to a laptop, you can train an algorithm to recommend datasets to people,” said Karau. “The combination of notebooks and Spark means data scientists can directly apply data during the exploration phase.” This speeds up the process by eliminating the need to consult coworkers for their datasets or run manual searches. That is very powerful, with strong implications for the future, Karau concluded.


Photo by SiliconANGLE
