UPDATED 21:50 EDT / OCTOBER 16 2020

AI

Q&A: How to train your data at exascale speed

Having data and having insights are two very different things. Transforming data into information that can actually drive better decisions and scientific breakthroughs is an active undertaking. And a daunting one. So what steps do data scientists recommend to turn that stagnant data lake into a sparkling flow of insights?

“Step back from the data questions; the infrastructure questions; all of these technical questions that can seem very challenging to navigate,” said Arti Garg (pictured), head of AI solutions and technologies at Hewlett Packard Enterprise. “And first ask: What problems am I trying to solve? It’s really no different than any other type of decision you might make in an organization.”

Garg spoke with Jeff Frick, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during Exascale Day 2020. They discussed the challenges facing modern data scientists, the moral implications of giving artificial intelligence autonomous control and how exascale computing will change the field of data science. Questions and answers have been condensed for clarity. (* Disclosure below.)

It’s all about asking the right questions. You’ve got to shape the data to the question, and then you’ve got to start building the algorithm to answer that question. How should people think when they’re actually building and training algorithms?

Garg: I like to think about AI solutions, as they get deployed, as part of a workflow. And the workflow has multiple stages associated with it. The first stage is generating your data. Then starting to prepare and explore your data. Then building models for your data. But what we don’t always think about are the next two phases. First is deploying whatever model or AI solution you’ve developed. What will that really take? Is it going to live in a secure and compliant ecosystem? Or, as we’re seeing with more applications on the edge, is it actually going to live in an outdoor environment?

Then, finally, who’s going to use it and how are they going to drive value from it? Because it could be that your AI solution doesn’t work because you don’t have the right dashboard that highlights and visualizes the data for the decision-maker who will benefit from it. I think it’s important to sort of think through all of these stages upfront. Think through what some of the biggest challenges you might encounter are so that you’re prepared when you meet them, and you can refine and iterate along the way and even upfront tweak the question you’re asking.
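To make those stages concrete, here is a minimal sketch of such a workflow in Python, using synthetic data and scikit-learn. The stage boundaries mirror the ones Garg lists, but the function names, the toy data and the simple classifier are illustrative assumptions, not any specific HPE tooling.

```python
# Minimal sketch of the workflow stages described above, on synthetic data.
# Function names (generate_data, prepare, build_model, deploy) are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def generate_data(n=1000, seed=0):
    # Stage 1: generate (here, synthesize) raw data.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    return X, y

def prepare(X, y):
    # Stage 2: prepare and explore, e.g. standardize features and split.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return train_test_split(X, y, test_size=0.2, random_state=0)

def build_model(X_train, y_train):
    # Stage 3: build and fit a model.
    return LogisticRegression().fit(X_train, y_train)

def deploy(model, X_test, y_test):
    # Stage 4: "deploy" -- here, just score held-out data; in production this
    # stage is where security, compliance and edge constraints come in.
    return accuracy_score(y_test, model.predict(X_test))

if __name__ == "__main__":
    X, y = generate_data()
    X_train, X_test, y_train, y_test = prepare(X, y)
    model = build_model(X_train, y_train)
    # Stage 5: surface the result for the decision-maker
    # (a dashboard in practice; a printed metric here).
    print(f"held-out accuracy: {deploy(model, X_test, y_test):.2f}")
```

The sketch only marks where the later stages belong in the flow; in practice, as Garg notes, deployment and reporting carry their own security, compliance and visualization work.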

Oct. 18 is Exascale Day, celebrating high-performance computing making the leap from petascale quadrillions (10^15) to exascale quintillions (10^18) of floating-point operations per second. Can you share your thoughts on being a data scientist and suddenly having all this massive compute power at your disposal?

Garg: Only time will tell exactly what we’ll be able to unlock from these new massive computing capabilities. But one thing I’m very excited about is that, in addition to these very large investments in exascale supercomputers, we’re also seeing investment in other types of scientific instruments that drive pharmaceutical drug discovery.

I’m talking about what they call light sources, which shoot X-rays at molecules and allow you to really understand their structure. Historically, you would take your molecule to one of these light sources, shoot X-rays at it, and generate just masses and masses of data, terabytes with each shot. Understanding what you were looking at was a long process of getting computing time and analyzing the data. [With exascale computing,] we’re on the precipice of being able to do that, if not in real time, then much closer to real time.

And I don’t really know what happens if instead of coming up with a few molecules, taking them, studying them, and then saying maybe I need to do something different, I can do it while I’m still running my instrument. It’s very exciting from the perspective of someone who’s got a scientific background who likes using large data sets.

With AI you build the algorithm, it’s in a box, it runs, and it kicks out an answer. And one of the things that people talk about is having explainable AI — the idea that we should be able to go in and pull that algorithm apart to know why the AI came out with the answer that it did. But that’s not simple. Is explainable AI even possible?

Garg: It’s obviously a question that’s on a lot of people’s minds these days. Really thinking about what we mean by explainable AI also requires us to think about what we mean by AI. These days AI is often used synonymously with deep learning, which is a particular type of algorithm that is not very analytical at its core. What I mean by that is, other types of statistical machine learning models have some underlying theory of the population of data you’re studying. Deep learning doesn’t; it just learns whatever pattern is sitting in front of it.

So there is a sense in which if you look at other types of algorithms, they are inherently explainable because you’re choosing your algorithm based on what you think is the ground truth about the population you’re studying. I think [the question is] are we going to get to explainable deep learning? And it’s challenging because you’re always going to be in a position where deep learning is designed to just be as flexible as possible.
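The distinction Garg draws shows up even in a toy comparison: a linear model encodes an explicit assumption about the population (a weighted sum of features), so its fitted coefficients can be read directly, whereas a deep network’s many weights carry no comparable individual meaning. The synthetic data and model sizes below are assumptions chosen purely for illustration.

```python
# Toy contrast: an inherently interpretable linear model vs. a flexible MLP.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
# Ground truth: y depends on the first two features only, plus small noise.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

linear = LinearRegression().fit(X, y)
# The coefficients recover the assumed structure, roughly [2.0, -1.0, 0.0].
print("linear coefficients:", np.round(linear.coef_, 2))

mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(X, y)
n_params = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)
# Over a thousand weights, none of which individually explains the prediction.
print("MLP parameter count:", n_params)
```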

I don’t want to say I know what’s going to happen 50 years from now, but I think it’ll take a little while to get to the point where you don’t have to apply some subject matter understanding and some human judgment to what an algorithm is putting out.

Let’s talk about data science and ethics. There’s an inherent problem with data that’s collected for one purpose and then used for something else down the road. So can you share your top-level ethical take on how data scientists specifically, and ultimately business practitioners and other people who don’t carry that title, need to be thinking about ethics?

Garg: I think that the best we can do is take a very multifaceted and also vigilant approach to it. As you start to collect data or build solutions, try to think through who are all the people who might use it? And what are the possible ways in which it could be misused?

I also encourage people to think backward. What were the biases in place when the data were collected? Historical records reflect historical biases in our systems. There are limits to how much you can correct for previous biases, but there are some ways to do it. However, you can’t do it if you’re not thinking about it. So, that is important at the outset of developing solutions.

Equally important is putting in the systems to maintain vigilance around biases. Don’t move to autonomy before you know what potential new errors or new biases you might introduce into the world. And have systems in place to constantly ask these questions: “Am I perpetuating things I don’t want to perpetuate? Or how can I correct for them?” And be willing to scrap your system and start from scratch if you need to.
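One simple, hypothetical form that ongoing vigilance can take is an automated disparity check on a deployed model’s outputs, run before and after any step toward autonomy. The group labels, tolerance and stand-in predictions below are assumptions for illustration; real audits need domain-specific fairness metrics and human review.

```python
# Sketch of a recurring bias check: compare positive-prediction rates by group.
import numpy as np

def selection_rates(predictions, groups):
    # Positive-prediction rate for each group label.
    return {g: predictions[groups == g].mean() for g in np.unique(groups)}

rng = np.random.default_rng(42)
groups = rng.choice(["A", "B"], size=1000)
# Stand-in for model outputs; imagine these come from a deployed classifier
# whose positive rate happens to differ between the two groups.
predictions = (rng.random(1000) < np.where(groups == "A", 0.30, 0.22)).astype(float)

rates = selection_rates(predictions, groups)
print(rates)
# Flag any disparity beyond a chosen tolerance for human review.
if max(rates.values()) - min(rates.values()) > 0.05:
    print("Disparity exceeds tolerance: hold off on autonomy and investigate.")
```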

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Exascale Day 2020. (* Disclosure: TheCUBE is a paid media partner for Exascale Day 2020. Neither Hewlett Packard Enterprise, the sponsor for theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
