UPDATED 18:44 EST / JULY 04 2024


Meta open-sources new ‘multi-token prediction’ language models

Meta Platforms Inc. has open-sourced four language models that implement an emerging machine learning approach known as multi-token prediction.

VentureBeat reported the release of the models today. Meta made them available on Hugging Face, a popular platform for hosting artificial intelligence projects.

Large language models generate the text or code they output one token at a time. A token is a unit of data that typically corresponds to a few characters. Meta’s new open-source models, in contrast, generate four tokens at a time. They do so using a processing technique known as multi-token prediction that the company believes can make LLMs both faster and more accurate.
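The speed implication of that difference can be seen in the decoding loop itself. The toy sketch below is not Meta’s implementation; the "model" is a stand-in that returns placeholder token IDs, and it exists only to show that emitting four tokens per forward pass cuts the number of passes needed by a factor of four.

```python
# Toy sketch (not Meta's implementation): contrast the decoding loop of a
# standard next-token model with one that emits 4 tokens per forward pass.
# The fake "model" functions just return placeholder token IDs.

def fake_forward_single(context):
    """Stand-in for one forward pass predicting a single next token."""
    return len(context)  # placeholder token ID

def fake_forward_multi(context, n=4):
    """Stand-in for one forward pass predicting n tokens at once."""
    return [len(context) + i for i in range(n)]

def generate(target_len, multi=False):
    """Run the decoding loop and count how many forward passes it takes."""
    tokens, passes = [], 0
    while len(tokens) < target_len:
        passes += 1
        if multi:
            tokens.extend(fake_forward_multi(tokens))
        else:
            tokens.append(fake_forward_single(tokens))
    return tokens[:target_len], passes

_, single_passes = generate(100, multi=False)  # one pass per token
_, multi_passes = generate(100, multi=True)    # one pass per 4 tokens
```

With a real model, each forward pass is the expensive step, so fewer passes per sequence translates directly into faster generation.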

Meta’s four new models are geared toward code generation tasks and feature 7 billion parameters each. Two were trained on 200 billion tokens’ worth of code samples while the other pair received 1 trillion tokens apiece. In a paper accompanying the models, Meta detailed that it also developed a yet-unreleased fifth LLM with 13 billion parameters.

Under the hood, each model comprises two main components. The first is a shared trunk that performs the initial computations involved in generating a code snippet. According to Meta, the subsequent steps of the workflow are carried out by a set of output heads. Each of the four heads generates one token per step, which is what enables the models to produce four tokens at once.
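The trunk-plus-heads layout can be sketched in a few lines of numpy. The dimensions, the single-matrix "trunk" and the argmax decoding below are illustrative stand-ins, not Meta’s actual architecture: the point is only that one shared computation feeds four independent projections, each of which proposes one token.

```python
# Minimal numpy sketch of the shared-trunk-plus-four-heads layout.
# Sizes and layers are illustrative, not Meta's real architecture.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 512

trunk_w = rng.normal(size=(d_model, d_model))                    # shared trunk
head_ws = [rng.normal(size=(d_model, vocab)) for _ in range(4)]  # 4 output heads

def forward(x):
    """x: (d_model,) embedding of the current context."""
    h = np.tanh(x @ trunk_w)                    # one shared trunk computation
    logits = [h @ w for w in head_ws]           # one set of logits per head
    return [int(np.argmax(l)) for l in logits]  # four predicted token IDs

tokens = forward(rng.normal(size=d_model))      # list of 4 token IDs
```

Because the trunk runs once per step regardless of how many heads hang off it, the extra heads add relatively little compute compared with running the whole model four times.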

It’s currently unclear why that approach produces higher-quality code than traditional LLM designs. In their paper, Meta’s researchers argue that the reason may have to do with the way language models are built.

Developers commonly train LLMs using a technique known as teacher forcing. During training, the model predicts each token of a sequence, such as a piece of code, while being fed the correct preceding tokens from the training data rather than its own earlier predictions. This approach streamlines training but can limit the accuracy of the LLM being trained.
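In code, the defining feature of teacher forcing is which prefix the model is conditioned on. The sketch below uses a toy scoring function in place of a real model, purely to make the conditioning explicit: at every step the prefix comes from the ground-truth sequence, never from the model’s own output.

```python
# Hedged sketch of teacher forcing: at each training step the model is
# conditioned on the ground-truth prefix, never on its own earlier
# predictions. toy_logprob() is a made-up stand-in for a real model's
# per-token log-probability.
target = [3, 1, 4, 1, 5]  # ground-truth token sequence from training data

def toy_logprob(prefix, next_token):
    """Pretend log-probability; a real model would compute this."""
    return -1.0 if (sum(prefix) + next_token) % 2 == 0 else -2.0

loss = 0.0
for t in range(1, len(target)):
    prefix = target[:t]                 # ground-truth prefix (teacher forcing)
    loss += -toy_logprob(prefix, target[t])
loss /= len(target) - 1                 # average negative log-likelihood
```

Because the model never sees its own mistakes during training, it is optimized for one-step-ahead accuracy, which is the short-term bias Meta’s researchers describe below.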

According to Meta’s researchers, it’s possible that generating output four tokens at a time mitigates the limitations of the teacher-forcing approach. “Teacher-forcing, we argue, encourages models to focus on predicting well in the very short term, at the potential expense of ignoring longer-term dependencies in the overall structure of the generated sequence,” the researchers explained.

Meta tested the accuracy of its multi-token prediction models using the MBPP and HumanEval benchmark tests. MBPP contains about 1,000 Python coding tasks. HumanEval, in turn, provides a smaller but more challenging set of Python coding tasks.
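Both benchmarks pair a short natural-language prompt with hidden test assertions, and a model passes a task only if its generated code satisfies them. The example below is written in that style for illustration; it is not an actual entry from either benchmark.

```python
# A representative MBPP-style task (illustrative, not an actual benchmark
# entry): a prompt, a candidate solution the model would be asked to
# generate, and the assertions used to grade it.
prompt = "Write a function to find the maximum of two numbers."

def max_of_two(a, b):
    """Candidate solution a model would be expected to produce."""
    return a if a >= b else b

# Benchmark-style checks bundled with the task:
assert max_of_two(3, 7) == 7
assert max_of_two(10, -2) == 10
```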

Meta says that its models performed 17% and 12% better on MBPP and HumanEval, respectively, than comparable LLMs that generate tokens one at a time. Moreover, the models generated output three times faster.

Photo: Wikimedia Commons
