China’s Baidu to open-source its deep learning AI platform
The Chinese Internet giant Baidu Inc. has been making big progress in applying deep learning neural networks to improve image recognition, language translation, search ranking and click prediction in advertising. Now, it’s going to give a lot of it away.
The company, often called “China’s Google,” will announce Thursday at the annual Baidu World conference in Beijing that it’s offering the artificial intelligence software that its own engineers have been using for years as open source. Available in an early version on GitHub with full availability Sept. 30, it’s code-named PaddlePaddle, for PArallel Distributed Deep LEarning.
Deep learning is the branch of machine learning that attempts to emulate the way neurons work in the human brain to find patterns in sounds, images and other data. Google, Facebook, Microsoft, IBM and other companies have also been making big breakthroughs thanks to the ability to pump massive amounts of data into these artificial neural networks.
The announcement follows the open-sourcing in the last two years of other machine intelligence and deep learning tools such as Torch and machine-vision technology from Facebook, TensorFlow from Google, Computational Network Toolkit (CNTK) from Microsoft and DSSTNE from Amazon.com, as well as independent open source frameworks such as Caffe. Baidu also has open-sourced other pieces of its AI code. But Xu Wei, the Baidu distinguished scientist who led PaddlePaddle’s development, said this software is intended for broader use, even by programmers who aren’t experts in deep learning, which involves painstaking training of software models.
“You don’t need to be an expert to quickly apply this to your project,” Xu said in an interview. “You don’t worry about writing math formulas or how to handle data tasks.” (Indeed, the playful doubling of the original code-name is intended to convey that it’s easier to use than rival software.)
It clearly requires a certain base of knowledge, but Xu said PaddlePaddle requires significantly less code than some alternatives. For instance, a machine translation model built on it needs about a quarter of the specially written code that other AI platforms require, he claimed. And existing models can be applied to new problems without requiring complex equations. “We want to help the people actually working on a product,” rather than chiefly researchers, Xu said.
From a business point of view, it may seem odd to give away the keys to the deep learning kingdom. But there’s method behind this apparent madness, and it’s a little different from other open-source plays, which look to attract developers to create apps for a platform.
In the case of open-sourcing AI algorithms, and specifically at Baidu, the aim is to draw more deep learning engineers, who are in very high demand today. “People will recognize Baidu as a leader, so it will attract more talent,” said Xu.
What’s more, the algorithms themselves, which are often already shared via academic papers in the relatively small community of deep learning researchers, aren’t really a competitive differentiator. Much more important from a competitive point of view is the data these companies collect. “The breakthroughs are much more in how you gather and use training data sets,” said Peter Christy, research director at 451 Research.
As a result, Baidu Chief Scientist Andrew Ng said in a recent interview at Baidu’s Silicon Valley headquarters in Sunnyvale, “Data is a more defensive barrier.”
AI is about to break out of dedicated corporate departments and spread throughout companies, said Ng, who headed the Google Brain AI project a few years ago, potentially producing more new AI talent in the process. PaddlePaddle looks to be aimed at helping that talent spread, in the same way that corporate “electricity departments” in the early days of that technology gave way to electricity permeating every part of organizations.
Open-sourcing AI software is a potential way to form a community to help spread it to more people, Christy said. “Sharing code is a great and cost-effective way of enabling a discussion about the broader topic, and the value may well accrue at that higher level, not just the code.”
Like other AI systems, PaddlePaddle runs on large clusters of computers, including those that use graphics processing units (GPUs), whose parallel processing is especially adept at running deep learning algorithms.
Photo from Baidu