China’s Baidu to open-source its deep learning AI platform
The Chinese Internet giant Baidu Inc. has been making big progress in applying deep learning neural networks to improve image recognition, language translation, search ranking and click prediction in advertising. Now, it’s going to give a lot of it away.
The company, often called “China’s Google,” will announce Thursday at the annual Baidu World conference in Beijing that it’s offering the artificial intelligence software that its own engineers have been using for years as open source. Available in an early version on GitHub with full availability Sept. 30, it’s code-named PaddlePaddle, for PArallel Distributed Deep LEarning.
Deep learning is the branch of machine learning that attempts to emulate the way neurons work in the human brain to find patterns in data representing sounds, images, and other inputs. Google, Facebook, Microsoft, IBM and other companies have also been making big breakthroughs thanks to the ability to pump massive amounts of data into these artificial neural networks.
The announcement follows the open-sourcing in the last two years of other machine intelligence and deep learning tools such as Torch and machine-vision technology from Facebook, TensorFlow from Google, Computational Network Toolkit (CNTK) from Microsoft and DSSTNE from Amazon.com, as well as independent open source frameworks such as Caffe. Baidu also has open-sourced other pieces of its AI code. But Xu Wei, the Baidu distinguished scientist who led PaddlePaddle’s development, said this software is intended for broader use even by programmers who aren’t experts in deep learning, which involves painstaking training of software models.
“You don’t need to be an expert to quickly apply this to your project,” Xu said in an interview. “You don’t worry about writing math formulas or how to handle data tasks.” (Indeed, the playful doubling of the original code-name is intended to convey that it’s easier to use than rival software.)
It clearly requires a certain base of knowledge, but Xu said PaddlePaddle requires significantly less code than some alternatives. For instance, a machine translation model built on it needs about a quarter of the specially written code that other AI platforms require, he claimed. And existing models can be applied to new problems without requiring complex equations. “We want to help the people actually working on a product,” rather than chiefly researchers, Xu said.
From a business point of view, it may seem odd to give away the keys to the deep learning kingdom. But there’s method behind this apparent madness, and it’s a little different from other open-source plays, which look to attract developers to create apps for a platform.
In the case of open-sourcing of AI algorithms, and specifically at Baidu, the aim is to draw more deep learning engineers, who are in very high demand today. “People will recognize Baidu as a leader, so it will attract more talent,” said Xu.
What’s more, the algorithms themselves, which are often already shared via academic papers in the relatively small community of deep learning researchers, aren’t really a competitive differentiator. Much more important from a competitive point of view is the data these companies collect. “The breakthroughs are much more in how you gather and use training data sets,” said Peter Christy, research director at 451 Research.
As a result, Baidu Chief Scientist Andrew Ng said in a recent interview at Baidu’s Silicon Valley headquarters in Sunnyvale, “Data is a more defensive barrier.”
AI is about to break out of dedicated corporate departments and spread throughout companies, said Ng, who headed the Google Brain AI project a few years ago, potentially producing more new AI talent in the process. PaddlePaddle looks to be aimed at spreading AI talent in much the same way that corporate “electricity departments” in the early days of that technology gave way to electricity permeating every part of organizations.
Open-sourcing AI software is a potential way to form a community to help spread it to more people, Christy said. “Sharing code is a great and cost-effective way of enabling a discussion about the broader topic, and the value may well accrue at that higher level, not just the code.”
Like other AI systems, PaddlePaddle runs on large clusters of computers, including those that use graphics processing units (GPUs), whose parallel processing is especially adept at running deep learning algorithms.
Photo from Baidu