

Open-source technology is paving the way to the future of affordable and flexible IT, and the Apache Spark open-source processing engine is no exception. There is even a Spark components page where users can share useful tools and technology.
“Even vendors are sharing,” said Holden Karau, principal software engineer at IBM. This enables much wider collaboration throughout the community, as well as narrower collaboration between friends and colleagues. “If you have a notebook and share it with your friend, you can work together more collaboratively. A lot of companies are building notebook solutions,” added Karau.
Karau was interviewed by Dave Vellante (@dvellante) and Peter Burris (@plburris), hosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016 in New York, NY.
Another standout feature of Spark is its range of complexity. It allows users who may not have much knowledge of Python or Java to be able to build what they need without that coding ability.
At the same time, if users of Spark do understand coding, they can also use that knowledge to their benefit to create exactly what they want. “I think Spark does a good job of being user friendly. With Spark it’s much simpler and exposed in ways people are already used to working with their data,” said Karau.
Another area Spark and other platforms are beginning to dive into is machine learning. The traditional method is to down-sample, but this isn’t the most efficient or thorough method. Machine learning allows a much wider, agile way of doing things.
“When you move people to a laptop, you can train an algorithm to recommend datasets to people,” said Karau. “The combination of notebooks and Spark means data scientists can directly apply data during the exploration phase.” It speeds up the process by eliminating the need to consult coworkers for their data sets or do manual searches. And that is very powerful, with strong implications for the future, Karau concluded.
Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of BigDataNYC 2016.
THANK YOU