UPDATED 16:20 EST / JULY 25 2017

BIG DATA

Could Apache Spark become a universal computation engine?

Spark Summit keynotes are known for their surprises, and this year the stand-out changes were in data streaming, with sub-millisecond latencies predicted for some workloads. With multiple avenues open for potential success, the community is watching as Spark matures to fulfill the promise of what it could be. But does that promise include becoming a database?

Exploring the gap between theoretical possibilities and reality, Matthew Hunt, technologist at Bloomberg LP, discussed the maturation of Spark with George Gilbert (@ggilbert41) and David Goad (@davidgoad), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during this year’s Spark Summit event in San Francisco, California.

As a pioneer of streaming media, Bloomberg has a long history of developing apps for news and finance and has built its own relational database, Comdb2. “Everyone needs a database,” Hunt said, adding that most companies do not have the resources to develop their own. This leads to the question: Can Spark become a database?

Hunt believes that Spark has the promise to become a universal computation engine. Describing a universal system as having a distributed file store, a database with transactional semantics, extensible analytics and the ability to stream data in, he asked, “How close can you come to that?”

Maturity means real-world use

Although the dream might be a universal system, the more practical question is how to make Spark and existing databases work well together.

“If you have to master 5,000 skills and 200 different products, that’s a huge impediment for real-world usage,” said Hunt, who sees practical usage coalescing around a smaller set of options.

Hunt predicted that Apache Arrow, which powers columnar in-memory analytics, is about to explode because “it lets you connect these systems radically more efficiently in a standardized way.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Video by SiliconANGLE
