UPDATED 19:57 EST / JULY 20 2017

BIG DATA

Can big data DevOps see what abstraction is hiding?

DevOps practices that have worked well in other areas of information technology can’t hack it in big data, according to Ash Munshi (pictured), chief executive officer of Pepperdata Inc.

“While agile and all that still works, the tools don’t work,” Munshi said in an interview at this year’s Spark Summit in San Francisco, California.

Tightening the loop leading from development to operations is trickier due to the mass of data in the middle and the number of machines computing it, Munshi told David Goad (@davidgoad) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio. (* Disclosure below.)

There could be thousands of machines working to solve one problem. This obviously points to infrastructure abstraction and virtualization as possible fixes, but, by themselves, they’re half-baked solutions, Munshi stated.

Apache Spark’s data engine abstracts the paradigm developers write against, which is wonderful, Munshi explained, since it simplifies the code-writing process.

“The problem when you abstract is, what does that abstraction do down in the hardware, and where am I losing performance?” he said.

Spark’s user interface provides some information about processor and memory resource consumption and the state of the garbage collector, Munshi stated. “What it doesn’t do is give you a time-series view of what’s going on,” he said.

Visibility ties up loose ends?

With blow-by-blow visibility, Pepperdata’s recently announced Code Analyzer for Apache Spark allows users to pinpoint performance issues in their code. A second, complementary Pepperdata release is the Application Profiler. This tool analyzes all data from completed applications in the Spark History Server. If it discovers faulty executors, it highlights them so developers can click on them for explanations and suggested cures.
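For readers curious what mining the Spark History Server for troubled executors might look like, here is a minimal Python sketch. It is not Pepperdata's method. It assumes a History Server reachable at a hypothetical local address, reads executor summaries from Spark's monitoring REST API, and applies two illustrative rules chosen for this example: any failed tasks, or garbage collection taking more than an assumed 10 percent of task time. Applications with several attempts may need the attempt ID in the URL path.

```python
import json
from urllib.request import urlopen

# Assumption: a Spark History Server is reachable here (18080 is its default port).
HISTORY_SERVER = "http://localhost:18080"


def fetch_executors(app_id, base_url=HISTORY_SERVER):
    """Fetch executor summaries (active and dead) for one application."""
    url = f"{base_url}/api/v1/applications/{app_id}/allexecutors"
    with urlopen(url) as resp:
        return json.load(resp)


def flag_executors(executors, max_gc_fraction=0.10):
    """Return (executor id, reasons) pairs for executors that look unhealthy.

    The rules and the 10 percent threshold are illustrative assumptions,
    not the logic of any vendor tool.
    """
    flagged = []
    for ex in executors:
        if ex.get("id") == "driver":
            continue  # the driver appears in the list but runs no tasks
        reasons = []
        failed = ex.get("failedTasks", 0)
        if failed > 0:
            reasons.append(f"{failed} failed task(s)")
        duration = ex.get("totalDuration", 0)  # milliseconds
        gc_time = ex.get("totalGCTime", 0)     # milliseconds
        if duration > 0 and gc_time / duration > max_gc_fraction:
            reasons.append(f"GC took {gc_time / duration:.0%} of task time")
        if reasons:
            flagged.append((ex["id"], reasons))
    return flagged
```

Calling `flag_executors(fetch_executors(app_id))` with a real application ID would list the executors that trip either rule. The flagging step is kept separate from the network call so it can be tried on saved JSON.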

Pepperdata customers have discovered that this is a useful prognosticator as well, Munshi stated.

“If the Application Profiler comes back and says, ‘Everything is green; there’s no critical issues there,’ then they’re saying, ‘OK, fine. Put it on the production cluster,’” he said.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: DataBricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither DataBricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
