

The DevOps tools that have worked well in other areas of information technology can’t hack it in big data, according to Ash Munshi, chief executive officer of Pepperdata Inc.
“While agile and all that still works, the tools don’t work,” Munshi said in an interview at this year’s Spark Summit in San Francisco, California.
Tightening the loop from development to operations is trickier because of the mass of data in the middle and the number of machines processing it, Munshi told David Goad (@davidgoad) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio. (* Disclosure below.)
There could be thousands of machines working to solve one problem. This obviously points to infrastructure abstraction and virtualization as possible fixes, but, by themselves, they’re half-baked solutions, Munshi stated.
Apache Spark’s data engine gives developers an abstraction to write against, which Munshi called wonderful because it simplifies the code-writing process.
“The problem when you abstract is, what does that abstraction do down in the hardware, and where am I losing performance?” he said.
Spark’s user interface provides some information about processor and memory resource consumption and the state of the garbage collector, Munshi stated. “What it doesn’t do is give you a time-series view of what’s going on,” he said.
Pepperdata’s recently announced Code Analyzer for Apache Spark provides that blow-by-blow visibility, allowing users to pinpoint performance issues in their code. A second, complementary Pepperdata release is the Application Profiler. This tool analyzes all data from completed applications in the Spark History Server. If it discovers faulty executors, it highlights them so developers can click on them for explanations and suggested fixes.
Pepperdata customers have found that the Application Profiler is also a useful predictor of whether an application is ready for production, Munshi stated.
“If the Application Profiler comes back and says, ‘Everything is green; there’s no critical issues there,’ then they’re saying, ‘OK, fine. Put it on the production cluster,’” he said.
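The Spark History Server exposes completed applications through Spark’s monitoring REST API, so the general idea of scanning executors after a run can be sketched in a few lines. This is a minimal, hypothetical sketch, not Pepperdata’s method: it assumes a History Server at `http://localhost:18080` (the default port), a Spark version that serves the `/api/v1/applications/[app-id]/allexecutors` endpoint, and two made-up red flags, garbage collection taking more than 10% of task time or any failed tasks. The helper names `fetch_executors` and `flag_executors` are illustrative only.

```python
import json
from urllib.request import urlopen

# Assumption: a History Server on its default port; adjust for your cluster.
HISTORY_SERVER = "http://localhost:18080"


def fetch_executors(app_id, base_url=HISTORY_SERVER):
    """Fetch executor summaries (active and dead) for one application
    from Spark's monitoring REST API."""
    url = f"{base_url}/api/v1/applications/{app_id}/allexecutors"
    with urlopen(url) as resp:
        return json.load(resp)


def flag_executors(executors, gc_ratio_limit=0.1):
    """Return (executor id, reasons) pairs for executors that look unhealthy.

    Hypothetical heuristics for illustration: GC time above gc_ratio_limit
    of total task time, or at least one failed task."""
    flagged = []
    for ex in executors:
        reasons = []
        duration = ex.get("totalDuration", 0)  # total task time, in ms
        gc_time = ex.get("totalGCTime", 0)     # JVM GC time, in ms
        if duration > 0 and gc_time / duration > gc_ratio_limit:
            reasons.append(f"GC took {gc_time / duration:.0%} of task time")
        failed = ex.get("failedTasks", 0)
        if failed > 0:
            reasons.append(f"{failed} failed task(s)")
        if reasons:
            flagged.append((ex["id"], reasons))
    return flagged


if __name__ == "__main__":
    # Made-up executor summaries, shaped like the REST API's output.
    sample = [
        {"id": "1", "totalDuration": 100000, "totalGCTime": 25000, "failedTasks": 0},
        {"id": "2", "totalDuration": 100000, "totalGCTime": 2000, "failedTasks": 3},
        {"id": "3", "totalDuration": 100000, "totalGCTime": 1000, "failedTasks": 0},
    ]
    for executor_id, reasons in flag_executors(sample):
        print(executor_id, "; ".join(reasons))
```

An empty result from `flag_executors` is the rough analogue of the “everything is green” outcome Munshi describes, though any real tool would weigh far more signals than these two.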
Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)