UPDATED 18:57 EDT / MARCH 31 2016


Machine learning on machine learning software: It’s closer than you think | #BigDataSV

by Amber Johnson

As the tech world pivots on game-changing applications, data scientists rise to the occasion. Such is the case with Holden Karau, principal software engineer of Big Data at IBM and coauthor of Learning Spark.

When asked about the current renovations within Spark, Karau said she sees this time as an “opportunity to get rid of dead weight” by streamlining certain processes. As an example, she cited getting functional and relational queries to talk to each other within Spark.

Two areas of expansion are sequencing and machine learning. Karau also noted another “massive expansion” in getting other applications to run on top of Spark. She spoke with John Furrier (@furrier) and George Gilbert (@ggilbert41), cohosts of theCUBE from the SiliconANGLE Media team, during the BigDataSV 2016 event in San Jose, California, where theCUBE is celebrating #BigDataWeek, including news and events from the #StrataHadoop conference.

The three self-described tech geeks discussed the advances with Spark since the bandwagon effect has kicked in. Karau predicted that machine learning on machine learning software will arrive sooner than Gilbert’s conservative five-year estimate. While she didn’t give a specific time frame, Karau stated emphatically that it is “closer than five years.”

How data science is changing software dynamics

Karau conferred with Furrier and Gilbert about several aspects of data science and how it is changing software dynamics. One side project in particular stood out. Karau is working on a Spark validator that will help with “policing quality” with regard to algorithms within pipeline models. Pipeline models are challenging because they must work at large scale while still letting users work with Big Data interactively. When asked about getting data science to work on data science, Karau said the tech was “there-ish.”

In addition, Karau is working with her coauthor, Rachel Warren, on a new book called High Performance Spark. Karau spoke eloquently and candidly about sources of frustration in working with Spark pipelines, saying, “How do I save this damn thing?” However, when it comes to Spark, Karau literally wrote the book.

Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of BigDataSV 2016. And make sure to weigh in during theCUBE’s live coverage at the event by joining in on CrowdChat.

Photo by SiliconANGLE
