UPDATED 13:06 EDT / NOVEMBER 28 2018

CLOUD

Amazon debuts Inferentia, a custom machine learning prediction chip

In another sign of Amazon.com Inc.’s broad ambitions in cloud computing, the company’s cloud unit today debuted a new processor designed for machine learning.

The chip, called Inferentia, will be available via Amazon Web Services Inc.’s EC2 computing service as well as its SageMaker AI service and Amazon Elastic Inference, a new service also announced today. It’s designed to speed up inference, the process by which trained machine learning models make predictions, helping power services such as Amazon’s Alexa and self-driving cars.
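Inference is just the forward, prediction-only pass of a model whose parameters were already learned during training. A minimal sketch in plain Python (the toy model and its weights are hypothetical, not anything from AWS) illustrates the step that chips like Inferentia are built to accelerate at scale:

```python
# Illustrative only: "inference" means applying an already-trained model's
# fixed parameters to new inputs, as opposed to training, which learns them.
# The model and weights below are made up for demonstration.

def predict(weights, bias, features):
    """Score one input with a trained linear model (a single dot product)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0  # e.g., "wake word heard" vs. "not heard"

# Pretend these parameters were produced earlier by a training run.
trained_weights = [0.8, -0.3, 0.5]
trained_bias = -0.2

print(predict(trained_weights, trained_bias, [1.0, 0.0, 1.0]))  # -> 1
```

In production the dot product becomes billions of multiply-accumulate operations per request, which is why dedicated silicon measured in TOPS pays off.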

Designed by Annapurna Labs, the chip design firm Amazon bought a few years ago, Inferentia is claimed to offer lower latency and better cost-effectiveness than graphics processing units, the chips mostly from Nvidia Corp. that have been the go-to hardware for machine learning in recent years.

Inferentia is expected to be available next year. It was announced briefly by AWS Chief Executive Andy Jassy (pictured) during his keynote this morning at the company’s re:Invent conference in Las Vegas. He provided few details on the design or specifications, though he said it will work with multiple data types and all the major frameworks, such as PyTorch, TensorFlow and MXNet. Each chip will provide hundreds of tera operations per second, or TOPS, and multiple chips can be used together to reach thousands of TOPS.

The chip is AWS’ second announced in as many days. On Monday evening, the company announced a processor called Graviton that’s available to its cloud customers through AWS’ EC2 cloud compute service. It’s based on the Arm architecture used in smartphones, network routers and a wide variety of other devices, and it’s gradually finding its way into computer servers such as the ones AWS designs for use in its massive data centers.

“AWS’s announcement that it will be developing its own ML inference chip that supports many frameworks is huge,” said Patrick Moorhead, president and principal analyst at Moor Insights & Strategy. “Unlike Google Cloud, the AWS service will be widely available and will be elastic. For inference, AWS now offers CPUs, GPUs, FPGAs and now its own ASIC.”

The past few years have seen a flurry of new chips optimized for certain applications, in particular machine learning and AI. Google LLC, for instance, offers cloud access to its custom Tensor Processing Unit chip. One reason for this resurgence of chip design is the need for so-called hyperscaler companies, with their large numbers of huge data centers, to squeeze every last bit of efficiency from their hardware.

All this has left data center leader Intel Corp. on the defensive, and it has bought a number of companies such as Altera Inc. and Movidius Inc. to add new chip designs and expertise to its core X86 lines. It has also tweaked X86 chips such as its current Xeon line to handle machine learning and other tasks better.

Amazon also announced Elastic Inference, a deep learning inference acceleration service powered by GPUs. Jassy said it can cut the cost of inference by up to 75 percent by letting customers provision only as much acceleration capacity for their AWS compute instances as they actually need.

In addition, AWS debuted a number of AI-related services and products, including DeepRacer (above), an autonomous model car intended to be used by developers to learn about a branch of machine learning called reinforcement learning. It’s available for pre-order at $249.

Photo: Robert Hof/SiliconANGLE
