UPDATED 18:57 EDT / MARCH 31 2016

Machine learning on machine learning software: It’s closer than you think | #BigDataSV

As the tech world pivots on game-changing applications, data scientists rise to the occasion. Such is the case with Holden Karau, principal software engineer of Big Data at IBM and coauthor of Learning Spark.

When asked about the current renovations within Spark, Karau said she sees this time as an “opportunity to get rid of dead weight” by streamlining certain processes. For example, she cited getting functional and relational queries to talk to each other within Spark.

Two areas of expansion are sequencing and machine learning. Karau also noted a “massive expansion” in getting other applications to run on top of Spark. She made the remarks during an interview with John Furrier (@furrier) and George Gilbert (@ggilbert41), cohosts of theCUBE from the SiliconANGLE Media team, at the BigDataSV 2016 event in San Jose, California, where theCUBE is celebrating #BigDataWeek, including news and events from the #StrataHadoop conference.

The three self-described tech geeks discussed the advances with Spark since the bandwagon effect has kicked in. Karau predicted that machine learning on machine learning software will arrive sooner than Gilbert’s conservative five-year estimate. While she didn’t give a specific time frame, Karau stated emphatically that it is “closer than five years.”

How data science is changing software dynamics

Karau conferred with Furrier and Gilbert about several aspects of data science and how it is changing software dynamics. One side project in particular stood out: Karau is working on a Spark validator that will help with “policing quality” of the algorithms within pipeline models. Pipeline models make it hard to work at large scale while still working with Big Data interactively. When asked about getting data science to work on data science, Karau said the tech was “there-ish.”

In addition, Karau is working with her coauthor, Rachel Warren, on a new book called High Performance Spark. Karau spoke eloquently and candidly about the frustrations of working with Spark pipelines, asking, “How do I save this damn thing?” Still, when it comes to Spark, Karau literally wrote the book.

Photo by SiliconANGLE
