UPDATED 19:29 EDT / MARCH 18 2020


Amazon Elastic Inference adds support for PyTorch machine learning models

Amazon Web Services Inc. announced today that it’s adding support for PyTorch models to its Amazon Elastic Inference service, which it said will help developers reduce the cost of deep learning inference by as much as 75% in some cases.

Amazon Elastic Inference is a service launched in late 2018 that enables customers to attach graphics processing unit-powered inference acceleration to a standard Amazon EC2 instance. Inference refers to the process of making predictions using a trained deep learning model.

PyTorch is an open-source machine learning library that was first developed by Facebook Inc. It’s used primarily for applications such as computer vision and natural language processing. In recent years it has grown in popularity within the machine learning community thanks to its use of dynamic computational graphs. They enable new deep learning models to be developed easily with imperative and idiomatic Python code.

Enhanced PyTorch libraries for EI are available by default in Amazon SageMaker, AWS Deep Learning AMIs and AWS Deep Learning Containers, allowing developers to deploy PyTorch models in production with minimal code changes.
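As an illustration of how small that change can be, the sketch below uses the SageMaker Python SDK to attach an Elastic Inference accelerator at deployment time. The model artifact path, IAM role, inference script and accelerator size are hypothetical placeholders, not values from Amazon's announcement.

# Sketch: deploying a PyTorch model with an Elastic Inference accelerator
# attached, using the SageMaker Python SDK. All names and sizes below are
# illustrative placeholders.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # hypothetical model artifact
    role="MySageMakerRole",                    # hypothetical IAM role
    entry_point="inference.py",                # hypothetical inference script
    framework_version="1.3.1",
)

# The endpoint itself runs on a CPU instance; the attached accelerator
# supplies the GPU-powered inference acceleration.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.medium",
)

In this setup the ordinary CPU instance handles the application logic while the attached eia2 accelerator does the heavy lifting at inference time, which is the cost-saving pattern Amazon describes.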

In a blog post, Amazon explained that inference tends to account for around 90% of the compute costs of a typical deep learning workload running on PyTorch. But selecting the right kind of instance for inference workloads is a tricky business, Amazon said, because each deep learning model has its own specific requirements for the mix of GPU, central processing unit and memory resources it needs.

“Optimizing for one of these resources on a standalone GPU instance usually leads to under-utilization of other resources,” the company said. “Therefore, you might pay for unused resources.”

Amazon Elastic Inference remedies this by allowing users to attach just the right amount of GPU-powered inference acceleration to Amazon EC2, Amazon ECS and Amazon SageMaker instances.

“You can choose any CPU instance in AWS that is best suited to your application’s overall compute and memory needs, and separately attach the right amount of GPU-powered inference acceleration needed to satisfy your application’s latency requirements,” Amazon said. “This allows you to use resources more efficiently and lowers inference costs.”
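The same pairing works at the EC2 level when launching an instance. The boto3 sketch below assumes the RunInstances API's ElasticInferenceAccelerators parameter; the AMI ID, region and instance sizes are placeholders chosen for illustration rather than anything from Amazon's post.

# Sketch: launching a CPU instance with an Elastic Inference accelerator
# attached, using boto3. The AMI ID and sizes are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical Deep Learning AMI
    InstanceType="c5.large",          # CPU instance sized for the app's compute needs
    MinCount=1,
    MaxCount=1,
    ElasticInferenceAccelerators=[
        {"Type": "eia2.medium", "Count": 1}  # acceleration sized for latency needs
    ],
)
print(response["Instances"][0]["InstanceId"])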

Photo: Robert Hof/SiliconANGLE
