

“Needle in a haystack” is one of the most commonly used phrases in discussions about Big Data. Finding that needle is a big, messy job, made all the more frustrating because with a vast data lake, you know the answer is in there, if only you could figure out how to get at it. Write more code? Combine operations? Analyze across segments? It would certainly simplify our work if we could just tell our system what we want and let it figure out the how.
Michael Armbrust, lead developer of Spark SQL at Databricks, Inc., agrees and has devoted much of his recent work to making it happen.
Armbrust spoke to George Gilbert, host of theCUBE, from the SiliconANGLE Media team, about the approach to data operations he calls declarative programming. “You’re telling the system what you want to do, but not necessarily how to do it. Once you have that language for saying what you want to do, now the system has all of this new flexibility for kind of trying different ways to accomplish that and can actually explore the space and find the best way to do it,” he said.
He explained that this approach has its roots in SQL and has evolved to a higher level with DataFrames and Spark 2.0.
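To make the idea concrete, here is a minimal sketch of a declarative query written against Spark’s DataFrame API in Scala. The input path /data/events and the level and service fields are hypothetical, chosen only for illustration; the point is that the code states what result is wanted and leaves the choice of execution strategy to Spark SQL’s Catalyst optimizer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DeclarativeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("declarative-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical events dataset: we declare *what* we want
    // (error counts per service), not *how* to compute them.
    val events = spark.read.json("/data/events") // assumed path

    val errorCounts = events
      .filter($"level" === "ERROR")
      .groupBy($"service")
      .agg(count("*").as("errors"))

    // Catalyst picks the physical plan: predicate pushdown,
    // aggregation strategy, whole-stage code generation, etc.
    errorCounts.explain()
    errorCounts.show()

    spark.stop()
  }
}
```

Because the query describes only the desired result, the same three lines of logic can be optimized differently depending on data size, layout and available statistics, with no change to the user’s code.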
Armbrust also spoke about Spark 2.0’s ability to cut down on coding by making batch code convertible into continuous streaming applications.
“Instead of having to rewrite it and rethink it in terms of streaming, you take exactly the same code you wrote for your batch job, and you can turn it into a continuous application that incrementally computes the answer as new data arrives,” he said.
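For readers who want to see what that looks like in practice, below is a minimal Scala sketch of the pattern Armbrust describes, using Spark 2.0’s Structured Streaming. The JSON source path and field names are assumptions for illustration; the key point is that the aggregation logic is written once and runs unchanged as either a batch job or a continuous query that incrementally updates its answer as new data arrives.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BatchToStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-to-stream")
      .getOrCreate()
    import spark.implicits._

    // Batch version: read everything that exists today.
    val batch = spark.read.json("/data/events") // assumed path

    // Streaming version: only the source line changes; we reuse
    // the schema inferred from the batch read.
    val stream = spark.readStream
      .schema(batch.schema)
      .json("/data/events")

    // The business logic is identical for both modes.
    def errorCounts(df: DataFrame): DataFrame =
      df.filter($"level" === "ERROR")
        .groupBy($"service")
        .count()

    // One-shot answer over the batch data.
    errorCounts(batch).show()

    // Continuous application: the same computation, incrementally
    // recomputed as new files land in the directory.
    val query = errorCounts(stream).writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

The design choice worth noting is that only the read and write boundaries differ between the two modes; the transformation in the middle is exactly the batch code, which is what removes the need to “rewrite it and rethink it in terms of streaming.”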
Watch the complete video interviews below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Innovation Day at Databricks.