UPDATED 13:06 EDT / NOVEMBER 28 2018

Amazon debuts Inferentia, a custom machine learning prediction chip

SPECIAL REPORT: THE CLOUD COMES OF AGE by Robert Hof

In another sign of Amazon.com Inc.’s broad ambitions in cloud computing, the company’s cloud unit today debuted a new processor chip designed for machine learning.

The chip, called Inferentia, will be available via Amazon Web Services Inc.’s EC2 computing service as well as its SageMaker AI service and Amazon Elastic Inference, a new service also announced today. It’s designed to speed the process of inference, or predictions, carried out by machine learning models, helping power services such as Amazon’s Alexa and self-driving cars.

Designed by Annapurna Labs, the chip design firm Amazon bought a few years ago, it’s claimed to be low-latency and cost-effective compared with graphics processing units, the chips mostly from Nvidia Corp. that have been the go-to chips for machine learning in recent years.

Inferentia is expected to be available next year. AWS Chief Executive Andy Jassy (pictured) announced it briefly during his keynote this morning at the company’s re:Invent conference in Las Vegas but provided few details on the design or specifications. He did say it will work with multiple data types and all the major frameworks, such as PyTorch, TensorFlow and MXNet. Each chip will provide hundreds of tera operations per second, or TOPS, and multiple chips can be used together to drive thousands of TOPS.

The chip is AWS’ second announced in as many days. On Monday evening, the company announced a processor called Graviton that’s available to its cloud customers through AWS’ EC2 cloud compute service. It’s based on the Arm architecture used in smartphones, network routers and a wide variety of other devices, and it’s gradually finding its way into computer servers such as the ones AWS designs for use in its massive data centers.

“AWS’s announcement that it will be developing its own ML inference chip that supports many frameworks is huge,” said Patrick Moorhead, president and principal analyst at Moor Insights & Strategy. “Unlike Google Cloud, the AWS service will be widely available and will be elastic. For inference, AWS now offers CPUs, GPUs, FPGAs and now its own ASIC.”

The past few years have seen a flurry of new chips optimized for certain applications, in particular machine learning and AI. Google LLC, for instance, offers cloud access to its custom Tensor Processing Unit chip. One reason for this resurgence of chip design is the need for so-called hyperscaler companies with large numbers of huge data centers to wring every last bit of efficiency from their hardware.

All this has left data center leader Intel Corp. on the defensive, and it has bought a number of companies such as Altera Inc. and Movidius Inc. to add new chip designs and expertise to its core x86 lines. It has also tweaked x86 chips such as its current Xeon line to handle machine learning and other tasks better.

Amazon also announced Elastic Inference, a deep learning inference acceleration service powered by GPUs. Jassy said it can cut the cost of inference by up to 75 percent by letting customers provision only as much AWS compute capacity as they need.

In addition, AWS debuted a number of AI-related services and products, including DeepRacer (above), an autonomous model car intended to be used by developers to learn about a branch of machine learning called reinforcement learning. It’s available for pre-order at $249.

Photo: Robert Hof/SiliconANGLE

