

The amount of labor that goes into machine learning is daunting. And despite the obstacles that open-source contributions have tackled, some of the most hyped machine learning frameworks merely skim the surface of the work to be done. Does a technology exist that can collapse the sprawling processes of machine learning, from data ingestion to training to edge inferencing?
Today, there is growing focus on choosing the right machine learning framework, according to David Aronchick (pictured), head of open-source machine learning strategy at Microsoft Corp. The considered frameworks include TensorFlow, Microsoft Cognitive Toolkit and Apache MXNet, to name a few. They’re far from useless — but they may not yet warrant all the attention they get.
“The reality is, when you look at the overall landscape, that’s just 5 percent of the work that the average data scientist goes through,” Aronchick said. The remaining 95 percent is a big pile of rusty nuts and bolts that should be abstracted away already, he added.
That is the aim of Kubeflow — an open-source project for deploying and managing a machine learning stack on Kubernetes, the open-source platform for orchestrating containers, a lightweight virtualization method for running distributed applications.
Aronchick spoke with John Furrier and Stu Miniman, co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the recent KubeCon + CloudNativeCon conference in Seattle. They discussed what’s cooking in open source and academia to shorten machine learning cycle times.
The grunt work we ask data scientists to do today would shock a lot of people in more abstracted areas of information technology. “We’re asking data scientists, ML engineers, to think about how to provision pods, how to work on drivers, how to do all these very, very low-level things,” Aronchick said.
Aronchick believes academic researchers will discover ways to reduce the amount of data and labor needed to train models. However, this may not solve all data-transport issues. Operations across multicloud environments call for Kubernetes’ abstraction layer, he added.
“The reality is, you can’t beat the speed of light,” he said. “If I have a petabyte of data here, it’s going to take a long time to move it over there. I think you’re ultimately going to have models and training and inference move to many, many different locations.”
Kubernetes and Kubeflow offer high-level abstraction, so a data scientist can work on a model, see how it works, hit a button, and provision it on all the machines necessary.
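In practice, that “single button” is a declarative manifest handed to Kubernetes. Below is a minimal sketch of what a Kubeflow training job can look like, using Kubeflow’s TFJob custom resource; the job name, container image, and replica counts here are illustrative assumptions, not from the interview:

```yaml
# Hypothetical Kubeflow TFJob: name, image and replica counts are illustrative.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: example-training-job
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 3              # scale out by changing one number
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: registry.example.com/my-model:latest  # assumed image path
              resources:
                limits:
                  nvidia.com/gpu: 1  # request one GPU per worker
```

Because the manifest is portable, applying the same file with `kubectl apply -f job.yaml` against any conformant cluster — whether it runs on Azure, Google Cloud Platform or AWS — is what makes the deploy-everywhere workflow Aronchick describes possible.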
No, Kubernetes doesn’t spread an application across Azure, Google Cloud Platform and Amazon Web Services Inc. like cream cheese. “What you really want to do is have isolated deployments to each place that enables you, in a single button, to deploy to all three of these locations,” Aronchick said.
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s extensive coverage of KubeCon + CloudNativeCon: