Nvidia’s TensorRT deep learning inference platform breaks new ground in conversational AI
Nvidia Corp. is upping its artificial intelligence game with the release of a new version of its TensorRT software platform for high-performance deep learning inference.
TensorRT is a platform that combines a high-performance deep learning inference optimizer with a runtime that delivers low-latency, high-throughput inference for AI applications.
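Under the hood, that split corresponds to a build step and a run step. A minimal sketch of the build flow using TensorRT's Python API might look like the following; the ONNX model file name is a placeholder, and FP16 mode is just one example of the optimizations the builder can apply:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Import a trained model; "model.onnx" is a placeholder file name.
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

# The optimizer half: TensorRT fuses layers, auto-tunes kernels and can
# lower precision while building a deployable engine.
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30        # scratch space for tactic selection
config.set_flag(trt.BuilderFlag.FP16)      # allow reduced precision where safe
engine = builder.build_engine(network, config)  # the runtime half executes this
```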
Inference is an important aspect of AI. Whereas training develops a model's ability to understand a data set, inference refers to the trained model's ability to act on new data to infer answers to specific queries.
The latest version brings some dramatic performance improvements, including a significant reduction in inference times on one of the most advanced AI language models, Bidirectional Encoder Representations from Transformers-Large. BERT-Large, as it's known, is a pretraining method for natural language processing: a general-purpose language understanding model is trained on a large text corpus such as Wikipedia, then used as the base for downstream NLP tasks, such as answering people's questions.
Nvidia said TensorRT 6 includes new optimizations that cut BERT-Large inference time on T4 graphics processing units to just 5.8 milliseconds, down from the previous threshold of 10 milliseconds.
Nvidia said this improved performance is fast enough that BERT is now practical for enterprises to deploy in production for the first time. Conventional wisdom has it that NLP models need to be executed in less than 10 milliseconds to provide a natural and engaging experience.
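To put those latency figures in context, a rough way to time a built engine with the Python API and PyCUDA might look like the following. This continues the build sketch above and is illustrative only: it assumes a static-shape engine and float32 bindings, whereas real BERT engines typically take int32 token IDs.

```python
import time
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

context = engine.create_execution_context()

# Allocate one device buffer per binding (inputs and outputs).
# float32 is an illustrative assumption about the tensor data types.
bindings = []
for i in range(engine.num_bindings):
    nbytes = trt.volume(engine.get_binding_shape(i)) * np.dtype(np.float32).itemsize
    bindings.append(int(cuda.mem_alloc(nbytes)))

start = time.perf_counter()
context.execute_v2(bindings)  # synchronous inference on the GPU
print(f"latency: {(time.perf_counter() - start) * 1000:.1f} ms")
```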
The platform has also been optimized to accelerate inference on tasks relating to speech recognition, 3D image segmentation for medical applications, and image-based applications in industrial automation, Nvidia said.
TensorRT 6 also adds support for dynamic input batch sizes, which should help to speed up AI applications such as online services that have fluctuating compute needs, Nvidia said. The TensorRT Open Source Repository has also grown, with new training samples that should help to speed up inference with applications based on language and images.
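Dynamic input sizes work through optimization profiles: at build time the engine is told the minimum, optimum and maximum shapes it must handle, and at run time the application picks the actual shape per request. A hedged sketch, continuing the build example above, with an illustrative input name and (batch, sequence length) bounds:

```python
# Build time: declare the range of batch sizes the engine must handle.
# "input_ids" and the shape bounds are illustrative assumptions.
profile = builder.create_optimization_profile()
profile.set_shape("input_ids",
                  min=(1, 128),   # smallest accepted shape
                  opt=(8, 128),   # shape TensorRT tunes its kernels for
                  max=(32, 128))  # largest accepted shape
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)

# Run time: pick the actual batch size per request before executing.
context = engine.create_execution_context()
context.set_binding_shape(0, (4, 128))  # e.g., a batch of four queries
```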
Constellation Research Inc. analyst Holger Mueller said the improvements are timely, since the race to build conversational AI platforms is in full swing.
“But Nvidia still needs to address the on-premises deployment of next-generation applications, unless it manages to get the TensorRT platform into public clouds,” Mueller said. “Nvidia has a good track record with this, but it takes time to happen.”
Nvidia said the TensorRT 6 platform is available to download now from its product page.
Image: Nvidia