Amazon debuts Inferentia, a custom machine learning prediction chip
In another sign of Amazon.com Inc.’s broad ambitions in cloud computing, the company’s cloud unit today debuted a new processor chip designed for machine learning.
The chip, called Inferentia, will be available via Amazon Web Services Inc.’s EC2 computing service as well as its SageMaker AI service and Amazon Elastic Inference, a new service also announced today. It’s designed to speed the process of inference, or predictions, carried out by machine learning models, helping power services such as Amazon’s Alexa and self-driving cars.
Designed by Annapurna Labs, the chip design firm Amazon bought a few years ago, it’s claimed to be low-latency and cost-effective compared with graphics processing units, the chips, mostly from Nvidia Corp., that have been the go-to choice for machine learning in recent years.
Inferentia is expected to be available next year. AWS Chief Executive Andy Jassy (pictured) announced it briefly during his keynote this morning at the company’s re:Invent conference in Las Vegas, but he provided few details on the design or specifications. He did say it will work with multiple data types and all the major frameworks, such as PyTorch, TensorFlow and MXNet. Each chip will provide hundreds of tera operations per second, or TOPS, and multiple chips can be used together to drive thousands of TOPS.
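To make the TOPS claim concrete, here is a rough back-of-the-envelope sketch in Python. AWS gave no per-chip figures beyond "hundreds" of TOPS, so the numbers passed in below are hypothetical, and the `scaling_efficiency` parameter is an assumption standing in for whatever overhead comes with combining chips.

```python
import math


def chips_needed(target_tops: float, tops_per_chip: float) -> int:
    """Return how many chips are needed to reach a target throughput.

    Assumes ideal linear scaling, which real multi-chip systems rarely achieve.
    """
    if target_tops <= 0 or tops_per_chip <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_tops / tops_per_chip)


def aggregate_tops(num_chips: int, tops_per_chip: float,
                   scaling_efficiency: float = 1.0) -> float:
    """Estimate combined throughput of several chips.

    scaling_efficiency is a hypothetical fudge factor between 0 and 1.
    """
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return num_chips * tops_per_chip * scaling_efficiency


# Hypothetical figures only: 300 TOPS per chip, 2,000 TOPS target.
n = chips_needed(target_tops=2000, tops_per_chip=300)
print(n, aggregate_tops(n, 300))
```

Under those made-up figures, a handful of chips would be enough to cross into the "thousands of TOPS" range, provided throughput scales close to linearly.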
The chip is AWS’ second announced in as many days. On Monday evening, the company announced a processor called Graviton that’s available to its cloud customers through AWS’ EC2 cloud compute service. It’s based on the Arm architecture used in smartphones, network routers and a wide variety of other devices, and it’s gradually finding its way into computer servers such as the ones AWS designs for use in its massive data centers.
“AWS’s announcement that it will be developing its own ML inference chip that supports many frameworks is huge,” said Patrick Moorhead, president and principal analyst at Moor Insights & Strategy. “Unlike Google Cloud, the AWS service will be widely available and will be elastic. For inference, AWS now offers CPUs, GPUs, FPGAs and now its own ASIC.”
The past few years have seen a flurry of new chips optimized for certain applications, in particular machine learning and AI. Google LLC, for instance, offers cloud access to its custom Tensor Processing Unit chip. One reason for this resurgence of chip design is the need for so-called hyperscaler companies with large numbers of huge data centers to squeeze every last bit of efficiency from their hardware.
All this has left data center leader Intel Corp. on the defensive, and it has bought a number of companies such as Altera Inc. and Movidius Inc. to add new chip designs and expertise to its core x86 lines. It has also tweaked x86 chips such as its current Xeon line to handle machine learning and other tasks better.
Amazon also announced Elastic Inference, a deep learning inference acceleration service powered by GPUs. Jassy said it can cut the cost of doing inference by up to 75 percent by letting customers provision only as much AWS compute capacity as they need.
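The arithmetic behind a savings claim like that is simple to sketch. The hourly prices below are hypothetical placeholders, not AWS pricing, and the function is only an illustration of how a right-sizing saving would be computed.

```python
def savings_fraction(full_cost_per_hour: float,
                     right_sized_cost_per_hour: float) -> float:
    """Return the fraction of cost saved by provisioning only what is needed."""
    if full_cost_per_hour <= 0:
        raise ValueError("full_cost_per_hour must be positive")
    if right_sized_cost_per_hour < 0:
        raise ValueError("right_sized_cost_per_hour must not be negative")
    return 1 - right_sized_cost_per_hour / full_cost_per_hour


# Hypothetical prices: a full GPU instance at $4.00 an hour versus a smaller
# instance plus just enough attached acceleration at $1.00 an hour.
saving = savings_fraction(4.00, 1.00)
print(f"{saving:.0%}")
```

With those placeholder prices, the saving works out to 75 percent, the upper bound Jassy cited. Actual savings would depend on how much of a full GPU a given model really uses.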
In addition, AWS debuted a number of AI-related services and products, including DeepRacer (above), an autonomous model car intended to be used by developers to learn about a branch of machine learning called reinforcement learning. It’s available for pre-order at $249.
Photo: Robert Hof/SiliconANGLE