UPDATED 13:06 EST / NOVEMBER 28 2018

Andy Jassy keynote, re:Invent 2018 CLOUD

Amazon debuts Inferentia, a custom machine learning prediction chip

In another sign of Amazon.com Inc.’s broad ambitions in cloud computing, the company’s cloud company today debuted a new processor chip designed for machine learning.

The chip, called Inferentia, will be available via Amazon Web Service Inc.’s EC2 computing service as well as its SageMaker AI service and Amazon Elastic Inference, a new service also announced today. It’s designed to speed the process of inference, or predictions, carried out by machine learning models, helping power services such as Amazon’s Alexa and self-driving cars.

Designed by Annapurna Labs, the chip design firm Amazon bought a few years ago, it’s claimed to be low-latency and cost-effective compared with graphics processing units, the chips mostly from Nvidia Corp. that have been the go-to chips for machine learning in recent years.

Inferentia is expected to be available next year. It was announced briefly by AWS Chief Executive Andy Jassy (pictured) during his keynote this morning at the company’s re:Invent conference in Las Vegas, but he provided few details on the design or specifications, though it will work with multiple data types and all the major frameworks such as PyTorch and TensorFlow and MXNet. It also will provide hundreds of tera operations per second or TOPS and can be used together to drive thousands of TOPS.

The chip is AWS’ second announced in as many days. On Monday evening, the company announced a processor called Graviton that’s available to its cloud customers through AWS’ EC2 cloud compute service. It’s based on the Arm architecture used in smartphones, network routers and a wide variety of other devices, and it’s gradually finding its way into computer servers such as the ones AWS designs for use in its massive data centers.

“AWS’s announcement that it will be developing its own ML inference chip that supports many frameworks is huge,” said Patrick Moorhead, president and principal analyst at Moor Insights & Strategy. “Unlike Google Cloud, the AWS service will be widely available and will be elastic. For inference, AWS now offers CPUs, GPUs, FPGAs and now its own ASIC.”

dr_3The past few years has seen a flurry of new chips optimized for certain applications, in particular machine learning and AI. Google LLC, for instance, offers cloud access to its custom Tensor Processing Unit chip. One reason for this resurgence of chip design is the need for so-called hyperscaler companies with large numbers of huge data centers to tweak every last bit of efficiency from their hardware.

All this has left data center leader Intel Corp. on the defensive, and it has bought a number of companies such as Altera Inc. and Movidius Inc. to add new chip designs and expertise to its core X86 lines. It has also tweaked X86 chips such as its current Xeon line to handle machine learning and other tasks better.

Amazon also announced Elastic Inference, which is a deep learning inference acceleration service powered by GPUs. Jassy said it can save up to 75 percent of the cost of doing inference by offering the option to provision only as much of AWS compute instances as are needed.

In addition, AWS debuted a number of AI-related services and products, including DeepRacer (above), an autonomous model car intended to be used by developers to learn about a branch of machine learning called reinforcement learning. It’s available for pre-order at $249.

Photo: Robert Hof/SiliconANGLE

Since you’re here …

Show your support for our mission with our one-click subscription to our YouTube channel (below). The more subscribers we have, the more YouTube will suggest relevant enterprise and emerging technology content to you. Thanks!

Support our mission:    >>>>>>  SUBSCRIBE NOW >>>>>>  to our YouTube channel.

… We’d also like to tell you about our mission and how you can help us fulfill it. SiliconANGLE Media Inc.’s business model is based on the intrinsic value of the content, not advertising. Unlike many online publications, we don’t have a paywall or run banner advertising, because we want to keep our journalism open, without influence or the need to chase traffic.The journalism, reporting and commentary on SiliconANGLE — along with live, unscripted video from our Silicon Valley studio and globe-trotting video teams at theCUBE — take a lot of hard work, time and money. Keeping the quality high requires the support of sponsors who are aligned with our vision of ad-free journalism content.

If you like the reporting, video interviews and other ad-free content here, please take a moment to check out a sample of the video content supported by our sponsors, tweet your support, and keep coming back to SiliconANGLE.