UPDATED 15:15 EST / JULY 28 2021

OpenAI debuts new AI programming language for creating neural networks

Prominent artificial intelligence research lab OpenAI LLC today released Triton, a specialized programming language that it says will enable developers to create high-speed machine learning algorithms more easily.

The first version of Triton was presented two years ago in an academic paper by OpenAI scientist Philippe Tillet. As part of today’s launch, OpenAI released a significantly upgraded edition dubbed Triton 1.0 with optimizations that lend themselves to enterprise machine learning projects.

The vast majority of enterprise AI models run on Nvidia Corp. graphics processing units, and developers use software supplied by Nvidia to build those models. One of the most important of those tools is the CUDA framework, which provides the foundational software building blocks that AI applications use to carry out computations with GPUs.

The issue OpenAI is tackling with Triton is that the CUDA framework is considered quite challenging to use. The main difficulty is maximizing an AI model's performance so that it processes data as fast as possible: for developer teams working in CUDA, that requires complicated, fine-grained code optimizations that are difficult to implement even with years of experience.

Enter OpenAI’s Triton programming language. According to the lab, the language performs many AI code optimizations automatically to save time for developers. 

OpenAI is promising two main benefits for software teams. The first is that Triton can speed up AI projects, since developers have to spend less time optimizing their code. The other, according to OpenAI, is that Triton's relative simplicity can enable software teams without extensive CUDA programming experience to create more efficient algorithms than they otherwise could.

“Triton makes it possible to reach peak hardware performance with relatively little effort,” OpenAI’s Tillet explained in a blog post today. “For example, it can be used to write FP16 matrix multiplication kernels that match the performance of cuBLAS — something that many GPU programmers can’t do — in under 25 lines of code.” Matrix multiplication kernels are the core software routines that machine learning algorithms rely on to perform most of their calculations.
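To give a sense of what such Triton code looks like, below is a minimal matrix-multiplication kernel sketched in the style of OpenAI's published tutorials rather than copied from them. The kernel name, the block sizes and the simplifying assumption that every matrix dimension divides evenly by its block size are illustrative, not from the article:

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(
    a_ptr, b_ptr, c_ptr,              # pointers to A, B and the output C
    M, N, K,                          # C[M, N] = A[M, K] @ B[K, N]
    stride_am, stride_ak,             # element spacing of rows/columns in memory
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    # Each "program" computes one BLOCK_M x BLOCK_N tile of C.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn

    # Accumulate in FP32 for accuracy even though the inputs are FP16.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)           # Triton stages these tiles in fast memory
        b = tl.load(b_ptrs)           # and coalesces the DRAM traffic itself
        acc += tl.dot(a, b)           # tensor-core multiply-accumulate
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk

    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)
```

In hand-written CUDA, the commented lines in the inner loop would typically correspond to dozens of lines of explicit shared-memory management and scheduling code.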

Triton improves AI performance by optimizing three core steps of the workflow with which a machine learning algorithm running on an Nvidia chip processes data.

The first step is the task of moving data between a GPU’s DRAM and SRAM memory circuits. GPUs store information in DRAM when it’s not actively used and transfer it to the SRAM memory to carry out computations. The faster data can be transferred between the two components, the faster machine learning algorithms run, which is why developers prioritize optimizing this aspect of the computing workflow as part of AI projects. 

The optimization process consists of merging the small blocks of data moving from DRAM to SRAM into large units of information, a technique known as memory coalescing. Triton performs the task automatically, OpenAI says, thereby saving time for developers.
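The canonical vector-addition kernel from Triton's tutorials makes the contrast concrete; the sketch below uses illustrative names and an illustrative block size. The developer writes one blocked tl.load per operand, and the compiler takes responsibility for turning each into wide, coalesced DRAM transactions:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)   # this program's slice
    mask = offsets < n_elements                   # guard the ragged tail
    # Each blocked load below is compiled into coalesced DRAM transactions;
    # in CUDA, the programmer arranges that access pattern thread by thread.
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(98432, device='cuda')
y = torch.rand(98432, device='cuda')
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)            # one program per block
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
```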

The second computational step Triton optimizes is the task of distributing the incoming data blocks across the GPU’s SRAM circuits in a way that makes it possible to analyze them as fast as possible. 

One of the main challenges involved in this step is avoiding so-called memory bank conflicts. That's the term for a situation where two computing threads try to access the same segment, or bank, of SRAM at the same time. Memory bank conflicts hold up calculations until they're resolved, which means that by reducing how often such conflicts occur, developers can speed up the performance of their AI algorithms.

“Data must be manually stashed to SRAM prior to being re-used, and managed so as to minimize shared memory bank conflicts upon retrieval,” Tillet explained. 
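In Triton, that stashing and managing happens behind the scenes. The fused-softmax kernel below, again sketched after the project's tutorials with illustrative names and sizes, loads each matrix row from DRAM exactly once; the compiler keeps the row in on-chip memory, in a layout chosen to sidestep bank conflicts, across every pass over the data:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK: tl.constexpr):
    # One program handles one row of a contiguous matrix.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    # The row is read from DRAM once; the compiler keeps it on-chip
    # for the reductions and elementwise math that reuse it below.
    x = tl.load(in_ptr + row * n_cols + cols, mask=mask, other=-float('inf'))
    x = x - tl.max(x, axis=0)       # subtract the row max for stability
    num = tl.exp(x)                 # masked lanes are exp(-inf) = 0
    tl.store(out_ptr + row * n_cols + cols, num / tl.sum(num, axis=0), mask=mask)

x = torch.randn(128, 1000, device='cuda')
out = torch.empty_like(x)
# BLOCK must be a power of two at least as large as the row length.
softmax_kernel[(x.shape[0],)](out, x, x.shape[1], BLOCK=1024)
```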

The third and final task Triton helps automate involves not GPUs’ memory cells but rather their CUDA cores, the computing circuits responsible for carrying out calculations on data stored in memory. A single Nvidia data center GPU has thousands of such circuits. They allow the chip to perform a large number of calculations at the same time.

To maximize the performance of an AI model, developers must configure it to spread out calculations across multiple CUDA cores so they can be done at the same time rather than one after another. Triton automates this chore as well, though only partly. It doesn't automate the entire workflow because OpenAI sought to give developers the flexibility to manually customize the process for their projects as needed.
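That division of labor is visible at the launch site. The hypothetical snippet below reuses the matmul_kernel sketched earlier: the developer decides how the output matrix is partitioned into tiles, one program per tile, while Triton's compiler handles the parallelization and instruction scheduling inside each tile:

```python
import torch
import triton

M, N, K = 512, 512, 512
a = torch.randn((M, K), device='cuda', dtype=torch.float16)
b = torch.randn((K, N), device='cuda', dtype=torch.float16)
c = torch.empty((M, N), device='cuda', dtype=torch.float32)

# The developer's knob: how the work is split across programs.
BLOCK_M, BLOCK_N, BLOCK_K = 64, 64, 32
grid = (triton.cdiv(M, BLOCK_M), triton.cdiv(N, BLOCK_N))

matmul_kernel[grid](
    a, b, c, M, N, K,
    a.stride(0), a.stride(1),
    b.stride(0), b.stride(1),
    c.stride(0), c.stride(1),
    BLOCK_M=BLOCK_M, BLOCK_N=BLOCK_N, BLOCK_K=BLOCK_K,
)
assert torch.allclose(c, (a @ b).float(), rtol=1e-2, atol=1e-2)
```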

Triton is available on GitHub.
