UPDATED 20:44 EDT / SEPTEMBER 16 2019

AI

Nvidia’s TensorRT deep learning inference platform breaks new ground in conversational AI

Nvidia Corp. is upping its artificial intelligence game with the release of a new version of its TensorRT software platform for high-performance deep learning inference.

TensorRT is a platform that combines a high-performance deep learning inference optimizer with a runtime: the optimizer takes a trained model and tunes it for the target GPU, and the runtime then executes the optimized model with low latency and high throughput.
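As a rough illustration of that two-step workflow, here is a minimal sketch using TensorRT's Python API, assuming a trained model already exported to ONNX (the "model.onnx" path is a placeholder):

```python
# Minimal sketch of the TensorRT workflow: an offline "build" step that
# optimizes a trained model, and a runtime step that executes the result.
# Assumes TensorRT's Python bindings are installed; "model.onnx" is a
# placeholder path to a trained model exported to ONNX.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# --- Build step: the optimizer turns the trained network into an engine ---
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GB of scratch space for tactic search
engine = builder.build_engine(network, config)

# --- Runtime step: serialize the engine once; the application can then
# deserialize and execute it repeatedly with low latency ---
with open("model.plan", "wb") as f:
    f.write(engine.serialize())
```

The expensive optimization work happens once at build time; the serialized engine is what gets deployed and run.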

Inference is an important aspect of AI. Whereas training develops a model's ability to recognize patterns in a data set, inference is the act of applying the trained model to new data to answer specific queries.
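In code, the split looks like this toy sketch, which uses scikit-learn purely as an illustration (it has no connection to TensorRT):

```python
# Toy illustration of the training/inference split, using scikit-learn
# as an assumed stand-in library; the data is made up.
from sklearn.linear_model import LogisticRegression

X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

model = LogisticRegression().fit(X_train, y_train)  # training: learn from data
print(model.predict([[1.8]]))                       # inference: answer a new query
```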

The latest version brings with it some dramatic improvements on the performance side. These include a significant reduction in inference times on one of the most advanced AI language models, called "Bidirectional Encoder Representations from Transformers-Large." BERT-Large, as it's known, is a method for pretraining natural language processing models: a general-purpose language understanding model is first trained on a large text corpus such as Wikipedia, and then fine-tuned for downstream NLP tasks, such as answering people's questions.
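That pretrain-then-fine-tune pattern is easy to see in practice. The following sketch is not part of Nvidia's announcement; it simply loads a publicly available BERT-Large checkpoint, already fine-tuned for question answering, through the open-source transformers library:

```python
# Illustrative only: a pretrained BERT-Large model fine-tuned for question
# answering, loaded via the open-source "transformers" library. The
# checkpoint name is a public Hugging Face model, not an Nvidia artifact.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad")

answer = qa(
    question="What does TensorRT combine?",
    context="TensorRT combines a deep learning inference optimizer "
            "with a runtime for low-latency, high-throughput inference.")
print(answer["answer"])
```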

Nvidia said TensorRT 6 comes with new optimizations that cut BERT's inference time on T4 graphics processing units to just 5.8 milliseconds, down from the previous threshold of 10 milliseconds.

Nvidia said this improved performance is fast enough that BERT is now practical for enterprises to deploy in production for the first time. Conventional wisdom has it that NLP models need to be executed in less than 10 milliseconds to provide a natural and engaging experience.
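The article doesn't describe Nvidia's benchmark setup, but checking a deployed model against a latency budget like that typically looks something like the following sketch, where run_inference is a hypothetical stand-in for whatever executes the model (for example, a TensorRT execution context):

```python
# Hypothetical latency check against a 10 ms conversational budget.
# "run_inference" is an assumed placeholder, not an API from the article.
import time

LATENCY_BUDGET_MS = 10.0
WARMUP, ITERATIONS = 10, 100

def measure_latency_ms(run_inference, batch):
    for _ in range(WARMUP):          # warm up caches and GPU clocks first
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    return elapsed / ITERATIONS * 1000.0  # average milliseconds per query

# latency = measure_latency_ms(my_model, my_batch)
# print("within budget:", latency <= LATENCY_BUDGET_MS)
```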

The platform has also been optimized to accelerate inference on tasks relating to speech recognition, 3D image segmentation for medical applications, and image-based applications in industrial automation, Nvidia said.

TensorRT 6 also adds support for dynamic input batch sizes, which should speed up AI applications with fluctuating compute needs, such as online services, Nvidia said. The TensorRT Open Source Repository has also grown, with new training samples aimed at accelerating inference in language- and image-based applications.
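In TensorRT's Python API, varying input sizes are configured through optimization profiles, which tell the builder the minimum, typical and maximum dimensions to plan for. A minimal sketch, where the input name "input_ids" and the (batch, sequence) dimensions are placeholders:

```python
# Sketch of TensorRT's optimization-profile API for dynamic input sizes.
# The input name "input_ids" and its dimensions are placeholders; real
# values depend on the network being built.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

profile = builder.create_optimization_profile()
# Allow batch sizes from 1 to 32, tuned for a typical batch of 8,
# at a fixed sequence length of 128 tokens.
profile.set_shape("input_ids", min=(1, 128), opt=(8, 128), max=(32, 128))
config.add_optimization_profile(profile)
# An engine built with this config accepts any batch size in [1, 32].
```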

Constellation Research Inc. analyst Holger Mueller said today's improvements are timely, since the race to build conversational AI platforms is in full swing.

“But Nvidia still needs to address the on-premises deployment of next-generation applications, unless it manages to get the TensorRT platform into public clouds,” Mueller said. “Nvidia has a good track record with this, but it takes time to happen.”

Nvidia said the TensorRT 6 platform is available to download starting today from its product page.

Image: Nvidia
