UPDATED 15:14 EDT / JANUARY 25 2019

Kubeflow to the rescue: ML toolkit offers hope for data science and deep learning

The machine learning community faces a growing skills gap, and it will take some serious technology to close it.

Google Cloud executive Rajen Sheth recently voiced his agreement with estimates that the number of machine learning engineers capable of moving deep learning from concept to production amounts to only a few thousand. But there are millions of data scientists and significantly more developers. How can the gap be closed?

The answer may lie, in large part, in the current activity among major cloud players and key figures in the open-source community around a relatively new yet vitally important project: Kubeflow.

The Kubeflow project, co-founded by David Aronchick (pictured) at Google LLC in 2017, provides a toolkit that lets data scientists run machine learning jobs on Kubernetes clusters without a lot of extra work and adaptation.

“When it gets to really complex apps, like machine learning, you’re able to do that at an even higher-level using constructs like Kubeflow,” said Aronchick, as he described how data scientists can quickly create a model. “When they’re done they hit a button, and it will provision out all the machines necessary, all of the drivers, spin it up, run that training job, bring it back, and shut everything down.”
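
To make that workflow concrete, the following is a minimal sketch of how such a one-click training run can be described with the Kubeflow Pipelines Python SDK (kfp, v1-style ContainerOp API). The container image, training script and parameters are hypothetical placeholders, and the interview does not prescribe this exact API.

```python
# A minimal sketch (not Aronchick's exact workflow) of a one-click training run
# described with the Kubeflow Pipelines SDK, v1-style ContainerOp API.
# The image, script and parameter below are hypothetical placeholders.
import kfp
from kfp import dsl


@dsl.pipeline(
    name="train-and-tear-down",
    description="Provision resources, run a training job, then shut everything down.",
)
def training_pipeline(epochs: int = 10):
    # Each step runs in its own container on the Kubernetes cluster; Kubeflow
    # schedules the pod, and the cluster reclaims the resources when it finishes.
    train = dsl.ContainerOp(
        name="train-model",
        image="example.com/ml/trainer:latest",   # hypothetical training image
        command=["python", "train.py"],          # hypothetical training script
        arguments=["--epochs", epochs],
    )
    # Ask for an accelerator; device drivers come from the cluster's device
    # plugin rather than from the data scientist's laptop.
    train.set_gpu_limit(1)


if __name__ == "__main__":
    # Compile the pipeline into a package that can be uploaded to the
    # Kubeflow Pipelines UI and launched with a single click.
    kfp.compiler.Compiler().compile(training_pipeline, "training_pipeline.tar.gz")
```

Once compiled, the package can be uploaded to the Kubeflow Pipelines user interface and launched with a single click, which mirrors the “hit a button” experience Aronchick describes.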

Aronchick, who became head of open-source machine learning strategy at Microsoft Corp. in November, spoke with John Furrier (@furrier) and Stu Miniman (@stu), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the recent KubeCon + CloudNativeCon conference in Seattle. They discussed the impact of Kubeflow on workload portability, recent commercial contributions to support machine learning deployment, the importance of executing data training models at the edge, a growing need for improving data efficiency, and how corporate and open-source contributions are bringing Kubernetes to a new level of maturity.

This week, theCUBE features David Aronchick as its Guest of the Week.

Reaching portability and scalability

Kubeflow is a natural outgrowth of the Kubernetes movement, in which the popular container orchestration tool has made it easier to manage distributed workloads across the enterprise. Kubeflow deploys on top of the Kubernetes stack with the goal of making machine learning workloads portable and scalable across multiple nodes, as sketched below.
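
As an illustration of that idea, here is a hedged sketch of a distributed training job expressed as a Kubeflow TFJob custom resource and submitted with the standard Kubernetes Python client. The image name and replica count are hypothetical, and early Kubeflow releases exposed beta API versions rather than the v1 group shown here.

```python
# A minimal sketch of a distributed training job expressed as a Kubeflow TFJob
# custom resource and submitted with the standard Kubernetes Python client.
# The image and replica count are hypothetical; early Kubeflow releases used
# beta API versions (e.g. kubeflow.org/v1beta1) rather than the v1 shown here.
from kubernetes import client, config

config.load_kube_config()  # use the current kubectl context

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "distributed-train", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            # Two worker replicas: the Kubeflow training operator schedules one
            # pod per replica and wires up the distributed-training environment.
            "Worker": {
                "replicas": 2,
                "restartPolicy": "OnFailure",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",
                            "image": "example.com/ml/trainer:latest",  # hypothetical
                            "command": ["python", "train.py"],
                        }]
                    }
                },
            }
        }
    },
}

# Submit the custom resource; the operator running in the cluster does the rest.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="kubeflow",
    plural="tfjobs", body=tfjob,
)
```

Because the job is just a Kubernetes object, the same definition can be applied on premises or on any public cloud’s managed Kubernetes service, which is the portability argument Aronchick makes.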

Workload portability is an essential ingredient in a world where enterprises are moving jobs between multiple clouds, and machine learning could help navigate an increasingly complicated environment. A survey of cloud computing trends conducted by RightScale Inc. last year found that 81 percent of enterprises had a multicloud strategy.

“I can’t overstate how valuable that portability is,” Aronchick said. “Kubernetes lets you compose these highly complex pipelines together that lets you do real training anywhere.”

New platform from Intel

The pace of innovation to facilitate precisely the kind of portable model that Aronchick describes is beginning to pick up. This month, Intel Corp. released a new platform — Nauta — designed to facilitate deep learning at scale for data scientists and developers.

Nauta will support both batch and streaming inference for model testing, while facilitating the use of Kubernetes to manage orchestration of machine learning pipelines in hybrid environments. Intel is a major code contributor to Kubeflow, and Nauta is built on the machine learning toolkit, according to statements from company executives during an artificial intelligence gathering in Munich, Germany, this month.

The latest news highlighted the need to execute data training models in a variety of locales. “You’re ultimately going to have models and training and inference move to many different locations,” Aronchick explained. “So you’ll do inference at the edge on my phone or on a little Bluetooth device in the corner of my house, saying whether it’s too hot or too cold. We’re going to need that kind of intelligence, and we’re going to do that kind of training and collection at the edge.”

Coping with data avalanche

While Intel’s latest announcement provides another boost for data scientists seeking to deploy machine learning using Kubeflow-based tools, engineers are still struggling with issues involving the sheer amount of data that must be processed and analyzed.

The solution may lie in academic and commercial research that is advancing artificial intelligence applications. If machines can discern anomalies faster than humans can, the potential grows for training models with less data rather than more.

Computational researchers at Vicarious Inc. have developed a model that trains computers to decipher CAPTCHAs, the jumbles of letters and numbers many websites use to determine whether a user is actually human. Their approach reached 67 percent accuracy while using less training data, according to a recent report in the “Harvard Business Review,” a success rate that may already surpass the ability of humans to decipher the often-confounding images.

Aronchick still holds out hope that needing less data will help the cause of machine learning and data scientists, since today’s scale has made problems difficult to troubleshoot. “It’s not a matter of whether you’re able to process it; you are,” Aronchick said. “But it’s so easy to get lost, to get caught in little anomalies. If you have a petabyte of data, and a megabit of it is causing your model to go sideways, that’s really hard to detect.”

Dual role for cloud providers

Aronchick’s new role with Microsoft represents a return for the open-source technology veteran. The Dartmouth graduate originally worked for the software giant for six years, starting in 2001, handling a variety of project roles.

That was followed by positions at Amazon and Google, where he played key roles in Kubernetes and Kubeflow. The experience has given Aronchick a perspective on the dual role that is evolving among major cloud providers, such as Microsoft Azure and Google Cloud, in providing commercial and open-source contributed tools.

“Much like Kubernetes has both a commercial offering and an open-source offering, I think that all of the major cloud providers will have that kind of duality,” Aronchick said. “They’ll work in open-source and you can measure how many contributions and the number of open-source projects. But then they’ll also have hosted other versions that make it easier for customers to migrate their data and adopt some of these new solutions.”

In December, Sourced Technologies, S.L., a company specializing in machine learning for large-scale code analysis, released a report that documented signs of maturity in the four-year-old Kubernetes project. These included reaching 2 million lines of code and stabilization of the API.

With maturity comes power, and ancillary tools like Kubeflow are only expanding the opportunities driven by Kubernetes, in machine learning deployment and beyond.

“You’re seeing Kubernetes become boring, and that is incredibly powerful,” Aronchick said. “People are building enormous businesses on top of it.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s extensive coverage of KubeCon + CloudNativeCon:

Photo: SiliconANGLE
