UPDATED 00:01 EDT / AUGUST 08 2017

EMERGING TECH

IBM says its latest breakthrough could dramatically improve deep learning speed

Processor technology has advanced so much in recent years that a device the size of a thumb drive can now be used to power a neural network. But companies often struggle to take full advantage of the computational power at their disposal because of a fundamental challenge in implementing large-scale artificial intelligence models.

The issue has to do with scalability, which IBM Corp. is tackling with a software library called Distributed Deep Learning, or DDL, that it’s unveiling Tuesday. Deep learning is a subset of machine learning that attempts to teach computers to learn in roughly the same way humans do. For example, people don’t recognize a dog by parsing the fact that a creature has four legs, a snout and a tail. Once they know what a dog looks like, they can tell it apart from a cat at a glance. Deep learning attempts to duplicate that approach in software.

Most deep learning frameworks can scale a large model across multiple servers, and many now support graphics processing units, but the way they gather and synchronize findings leaves much to be desired, said Hillery Hunter (pictured), a director at the company’s research group.

A synchronized workflow

A deep learning model running atop a cluster of computers enhanced by graphics processing unit chips has millions of distributed and interconnected processing elements whose role is roughly analogous to neurons in the human brain. These artificial neurons work together to process information just like their organic counterparts, with each one handling a small portion of the data. When a node completes a calculation, the results are synchronized across the rest of the neural network to help coordinate the work.
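As a rough illustration of that synchronization step, the sketch below uses a toy linear model and made-up worker shards (the `local_gradient` helper and worker count are hypothetical, not how any particular framework implements this). Each “worker” computes a gradient on its own slice of the data, and the gradients are averaged before every weight update so all replicas stay in lockstep:

```python
import numpy as np

def local_gradient(weights, data_batch, labels):
    """Hypothetical stand-in for the gradient each worker computes on its own
    shard of the training data (here: linear model with mean-squared-error loss)."""
    predictions = data_batch @ weights
    return 2 * data_batch.T @ (predictions - labels) / len(labels)

def synchronized_step(weights, shards, learning_rate=0.01):
    """One data-parallel update: every worker's gradient is gathered and
    averaged, so all replicas apply the same update and stay in sync."""
    grads = [local_gradient(weights, x, y) for x, y in shards]  # one per "GPU"
    avg_grad = np.mean(grads, axis=0)                           # the synchronization step
    return weights - learning_rate * avg_grad

# Toy run: 4 hypothetical workers, each holding its own slice of the data.
rng = np.random.default_rng(0)
weights = np.zeros(8)
shards = [(rng.normal(size=(16, 8)), rng.normal(size=16)) for _ in range(4)]
for _ in range(100):
    weights = synchronized_step(weights, shards)
```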

That’s where the bottleneck is, according to IBM. The faster the GPU on which an artificial neuron runs, the quicker it completes its calculations and the more often the results have to be synced. Because of the way AI clusters are built, the same applies as the number of chips in an environment grows. But deep learning frameworks can only sync data so often.

As a result, processing speed is limited by the rate at which data can travel between GPUs. DDL uses a so-called multiring communications algorithm to change the balance. The library modifies the network paths through which information is sent to achieve an “optimal” balance between latency and bandwidth, making communications much less of a bottleneck.
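IBM hasn’t detailed the internals of its multiring algorithm, but the general idea behind ring-style communication can be illustrated with a standard single-ring allreduce, simulated below in plain Python with NumPy. This is a sketch for intuition only, not IBM’s implementation: each of the N workers sends and receives just 1/N of the gradient per step, so traffic is spread evenly around the ring instead of funneling through a single node.

```python
import numpy as np

def ring_allreduce(worker_grads):
    """Simplified single-ring allreduce, simulated in one process.

    Each of the n workers starts with its own gradient, split into n chunks.
    In n-1 reduce-scatter steps every chunk travels once around the ring and
    accumulates each worker's contribution; n-1 all-gather steps then circulate
    the finished chunks so every worker ends up holding the full sum."""
    n = len(worker_grads)
    chunks = [list(np.array_split(np.asarray(g, dtype=float), n))
              for g in worker_grads]

    # Reduce-scatter: pass partial sums around the ring and add them up.
    for step in range(n - 1):
        sends = [(rank, (rank - step) % n) for rank in range(n)]
        payloads = [chunks[rank][idx].copy() for rank, idx in sends]
        for (rank, idx), payload in zip(sends, payloads):
            chunks[(rank + 1) % n][idx] += payload

    # All-gather: circulate the fully reduced chunks so everyone has them all.
    for step in range(n - 1):
        sends = [(rank, (rank + 1 - step) % n) for rank in range(n)]
        payloads = [chunks[rank][idx].copy() for rank, idx in sends]
        for (rank, idx), payload in zip(sends, payloads):
            chunks[(rank + 1) % n][idx] = payload

    return [np.concatenate(c) for c in chunks]

# Toy check with 4 hypothetical workers and 12-element gradients.
rng = np.random.default_rng(1)
grads = [rng.normal(size=12) for _ in range(4)]
summed = ring_allreduce(grads)
assert all(np.allclose(s, np.sum(grads, axis=0)) for s in summed)
```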

Record-breaking performance

In an internal test, IBM deployed DDL on a cluster with several hundred GPUs and set out to process 7.5 million images from a popular research data set, assigning each image to one or more of 22,000 categories. The model accurately recognized 33.8 percent of the objects after seven hours of training, handily beating the previous 29.8 percent record that was set by Microsoft Corp. following 10 days of training.
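Taking the published figures at face value, a quick back-of-the-envelope comparison (not a controlled benchmark, since the two runs used different hardware and software) looks like this:

```python
# Rough arithmetic based only on the figures quoted above.
ibm_hours = 7
microsoft_hours = 10 * 24                                          # 10 days of training
print(f"Training time ratio: {microsoft_hours / ibm_hours:.1f}x")  # ~34.3x
print(f"Accuracy gain: {33.8 - 29.8:.1f} percentage points")       # 4.0
```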

If a 4-percentage-point improvement sounds small, and the overall success rate low, it’s because the model is meant to be far more complex than one that would be encountered in the real world, said Sumit Gupta, vice president of high-performance computing and artificial intelligence at IBM. That makes progress incremental, he said, noting that Microsoft’s previous record was only 0.8 percentage point better than the record before it. “The benchmark is designed to stress deep learning software in order to prove that researchers really have built something better,” he said.

DDL is particularly useful in the training phase of AI development, which is one of the single biggest time sinks in the entire project lifecycle. A model must sometimes spend weeks or months processing sample data before it becomes accurate enough for production use. IBM claims that its library can shorten the process to just a few hours in some cases. “If it takes 16 days to train a model how to recognize a new credit card, that’s 16 days that you’re losing money,” Gupta said.

Deep learning is also useful in medical scenarios such as tissue analysis, where long training times can be a matter of life or death, he said. There are other benefits as well. If a deep learning model can be trained in hours instead of weeks, a company’s AI infrastructure is freed up for other projects sooner and more work gets done.

In another demonstration, IBM’s DDL achieved a scaling efficiency of 95 percent, compared with the 89 percent Facebook Inc. recorded during an earlier trial. Both tests used the same sample data.
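The article doesn’t say exactly how that figure is computed, but scaling efficiency is conventionally the speedup actually achieved divided by the ideal linear speedup. A minimal sketch, with made-up throughput numbers:

```python
def scaling_efficiency(single_gpu_throughput, cluster_throughput, num_gpus):
    """Conventional definition: achieved speedup / ideal (linear) speedup.
    Inputs are hypothetical, e.g. images processed per second."""
    achieved_speedup = cluster_throughput / single_gpu_throughput
    return achieved_speedup / num_gpus

# Example with made-up numbers: 256 GPUs delivering 243x the single-GPU rate.
print(f"{scaling_efficiency(100.0, 24300.0, 256):.0%}")  # -> 95%
```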

IBM said DDL lets companies train their models at a speed and scale that weren’t previously practical because of time constraints. It’s connecting the library to all popular deep learning frameworks, including TensorFlow, Caffe, Chainer, Torch and Theano, under an open-source license.

The company is also building the library into its own PowerAI deep learning toolkit platform, which is available in both free and paid enterprise editions, as well as on the Nimbix Minsky Power Cloud. “We’ve democratized it and brought it to everyone through PowerAI,” Gupta said.

Paul Gillin contributed to this story. 

Image: IBM
