

As the tech world pivots on game-changing applications, data scientists rise to the occasion. Such is the case with Holden Karau, principal software engineer of Big Data at IBM and coauthor of Learning Spark.
When asked about the current renovations within Spark, Karau said she sees this time as an “opportunity to get rid of dead weight” by streamlining certain processes. For example, she cited getting functional and relational queries to talk to each other within Spark.
Two areas of expansion are sequencing and machine learning. Karau also noted a “massive expansion” in getting other applications to run on top of Spark. She made the remarks in an interview with John Furrier (@furrier) and George Gilbert (@ggilbert41), cohosts of theCUBE from the SiliconANGLE Media team, at the BigDataSV 2016 event in San Jose, California, where theCUBE is celebrating #BigDataWeek, including news and events from the #StrataHadoop conference.
The three self-described tech geeks discussed the advances with Spark since the bandwagon effect has kicked in. Karau predicted that machine learning on machine learning software will arrive sooner than Gilbert’s conservative five-year estimate. While she didn’t give a specific time frame, Karau stated emphatically that it is “closer than five years.”
Karau conferred with Furrier and Gilbert about several aspects of data science and how it is changing software dynamics. One side project in particular stood out: Karau is working on a Spark validator that will help with “policing quality” with regard to algorithms within pipeline models. Pipeline models make it challenging to work at large scale while still being able to work with Big Data interactively. When asked about getting data science to work on data science, Karau said the tech was “there-ish.”
In addition, Karau is working with her coauthor, Rachel Warren, on a new book called High Performance Spark. Karau spoke candidly about the frustrations of working with Spark pipelines, asking, “How do I save this damn thing?” Still, when it comes to Spark, Karau literally wrote the book.
Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of BigDataSV 2016. And make sure to weigh in during theCUBE’s live coverage at the event by joining in on CrowdChat.