

As Spark pulls ahead as the data tool of choice for businesses, many software vendors will be dissecting its latest releases to discover what it has done differently. How is the open-source tool able to handle more data tasks with greater speed and accuracy? Companies out to mimic Spark will find they need to put software concerns aside and go deeper into the hardware.
Patrick Wendell, cofounder and VP of engineering at Databricks, Inc., said that Spark's attack on Big Data problems has as much to do with silicon as software. Speaking with George Gilbert, host of theCUBE, from the SiliconANGLE Media team, he explained how Spark leveraged hardware to improve its data tools through its Tungsten Project.
“The whole Tungsten initiative is about getting closer to the hardware in some sense,” Wendell said. “Spark is written on the JVM [Java Virtual Machine] — that’s a slightly higher abstraction — and what we’re trying to do is eke every bit of performance out of the underlying hardware,” he said, adding that the way in which cores are utilized was an area of innovation.
He said that optimizing for the hardware gives the software more to work with up the stack. A software logical optimizer, he stated, “can sit there and say, ‘Hold on, I’ve analyzed this query,’ and we can actually avoid reading a whole segment of the data, speed up this query by 100x just by using some kind of deductive reasoning about what’s happening inside the query.”
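The kind of deductive pruning Wendell describes can be sketched in plain Python. This is a hypothetical illustration using made-up segment statistics, not Spark's actual optimizer: each segment of data carries precomputed min/max values, so a filter query can skip segments that cannot possibly contain matches before reading any rows.

```python
# Hypothetical sketch of segment pruning: each data segment keeps
# min/max statistics, so a filter such as "value > 90" can skip
# segments whose value range cannot possibly match -- the same style
# of deductive reasoning a query optimizer applies before reading data.
from dataclasses import dataclass

@dataclass
class Segment:
    rows: list     # the actual data (only scanned if the segment survives pruning)
    min_val: int   # precomputed statistics for this segment
    max_val: int

def query_greater_than(segments, threshold):
    """Return rows > threshold, counting how many segments were actually read."""
    results, segments_read = [], 0
    for seg in segments:
        if seg.max_val <= threshold:
            continue                 # pruned: no row in this segment can match
        segments_read += 1           # only now do we "read" the segment's data
        results.extend(r for r in seg.rows if r > threshold)
    return results, segments_read

# Example: three segments, only one of which can contain values > 90.
data = [
    Segment(rows=[1, 5, 9], min_val=1, max_val=9),
    Segment(rows=[40, 55, 60], min_val=40, max_val=60),
    Segment(rows=[85, 95, 120], min_val=85, max_val=120),
]
rows, read = query_greater_than(data, 90)
print(rows, read)  # [95, 120] 1 -- two of the three segments were never read
```

Systems like Spark apply this idea at scale (for example, via column statistics in storage formats), where skipping entire segments is what makes the 100x-style speedups Wendell mentions possible.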
Watch the complete video interviews below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Innovation Day at Databricks.