UPDATED 18:44 EDT / JULY 04 2024

AI

Meta open-sources new ‘multi-token prediction’ language models

Meta Platforms Inc. has open-sourced four language models that implement an emerging machine learning approach known as multi-token prediction.

VentureBeat reported on the release today. Meta made the models’ code available on Hugging Face, a popular platform for hosting artificial intelligence projects.

Large language models generate the text or code they output one token at a time. A token is a unit of data that corresponds to a few characters. Meta’s new open-source models, in contrast, generate four tokens at a time. They do so using a processing technique known as multi-token prediction that the company believes can make LLMs both faster and more accurate.
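
To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two decoding loops. The toy_model function is a stand-in for an LLM forward pass and is not part of Meta’s release; only the loop structure is the point.

# Minimal sketch (not Meta's code): standard one-token-at-a-time decoding
# versus a multi-token scheme that emits k tokens per forward pass.

def toy_model(tokens, k=1):
    """Pretend forward pass: returns k dummy 'predicted' token ids."""
    return [len(tokens) + i for i in range(k)]

def decode_one_at_a_time(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):            # one forward pass per generated token
        tokens += toy_model(tokens, k=1)
    return tokens

def decode_multi_token(prompt, steps, k=4):
    tokens = list(prompt)
    for _ in range(steps // k):       # one forward pass per k generated tokens
        tokens += toy_model(tokens, k=k)
    return tokens

print(decode_one_at_a_time([1, 2, 3], steps=8))   # 8 forward passes
print(decode_multi_token([1, 2, 3], steps=8))     # 2 forward passes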

Meta’s four new models are geared toward code generation tasks and feature 7 billion parameters each. Two were trained on 200 billion tokens’ worth of code samples, while the other pair received 1 trillion tokens apiece. In a paper accompanying the models, Meta detailed that it also developed a fifth, as-yet-unreleased LLM with 13 billion parameters.

Under the hood, each of the models comprises two main components. The first is a shared trunk that performs the initial computations involved in generating a code snippet. According to Meta, the subsequent steps of the code generation workflow are carried out by a set of output heads. Each of the four output heads generates one token per step, which is what enables Meta’s models to produce four tokens at once.
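
As a rough sketch of that trunk-plus-heads layout, the PyTorch snippet below wires a single shared encoder to four linear output heads. The layer sizes, the class name and the plain encoder without a causal attention mask are illustrative assumptions for brevity, not Meta’s released architecture.

import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Hedged sketch: one shared trunk feeding four output heads."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads_out=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # "Shared trunk": does the bulk of the computation once per step.
        # (A real LM would also apply a causal attention mask, omitted here.)
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6,
        )
        # Four independent output heads, one per predicted token offset.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_heads_out)]
        )

    def forward(self, token_ids):
        hidden = self.trunk(self.embed(token_ids))  # shared representation [batch, seq, d_model]
        # Head i produces, at every position t, logits for the token at t+i.
        return [head(hidden) for head in self.heads]

model = MultiTokenPredictor()
logits_per_head = model(torch.randint(0, 32000, (1, 16)))
print([l.shape for l in logits_per_head])  # four tensors of shape [1, 16, 32000]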

It’s currently unclear why that approach produces higher-quality code than traditional LLM designs. In their paper, Meta’s researchers argue that the reason may have to do with the way language models are typically trained.

Developers commonly train LLMs using a technique known as teacher-forcing. During training, the model is asked to predict the next token of a sample, such as a piece of code, while being fed the correct preceding tokens at each step rather than its own earlier predictions. This approach helps streamline the development workflow but can limit the accuracy of the LLM being trained.

According to Meta’s researchers, it’s possible that generating output four tokens at a time mitigates the limitations of the teacher-forcing approach. “Teacher-forcing, we argue, encourages models to focus on predicting well in the very short term, at the potential expense of ignoring longer-term dependencies in the overall structure of the generated sequence,” the researchers explained.
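
One way to picture the difference is in how the training loss is computed: instead of scoring only the immediate next token, head i is scored against the token i positions ahead. The sketch below assumes per-head logits shaped like those returned by the earlier snippet and is illustrative only, not Meta’s actual training code.

import torch
import torch.nn.functional as F

def multi_token_loss(logits_per_head, token_ids):
    """logits_per_head: list of 4 tensors [batch, seq, vocab];
    token_ids: [batch, seq] ground-truth token sequence."""
    losses = []
    for i, logits in enumerate(logits_per_head, start=1):
        # Head i predicts the token i positions ahead, so shift targets by i.
        preds = logits[:, :-i, :]        # drop positions that have no target
        targets = token_ids[:, i:]       # ground truth i steps ahead
        losses.append(F.cross_entropy(preds.reshape(-1, preds.size(-1)),
                                      targets.reshape(-1)))
    return sum(losses) / len(losses)

# Toy usage: batch of 2 sequences, length 16, vocabulary of 100 tokens.
vocab, batch, seq = 100, 2, 16
logits = [torch.randn(batch, seq, vocab) for _ in range(4)]
tokens = torch.randint(0, vocab, (batch, seq))
print(multi_token_loss(logits, tokens))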

Meta tested the accuracy of its multi-token prediction models using the MBPP and HumanEval benchmark tests. MBPP contains about 1,000 Python coding tasks. HumanEval, in turn, provides a smaller set of 164 hand-written Python programming problems.
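
For a sense of how such benchmarks work, the snippet below sketches an MBPP-style item: a natural-language prompt paired with assert-based test cases that a generated solution must pass. The task and candidate completion are made up for illustration and are not actual benchmark content.

# Illustrative only: a made-up MBPP-style task and how a completion is checked.
task = {
    "prompt": "Write a function that returns the sum of the squares of a list of numbers.",
    "tests": [
        "assert sum_of_squares([1, 2, 3]) == 14",
        "assert sum_of_squares([]) == 0",
    ],
}

# A candidate completion produced by a code model:
candidate = "def sum_of_squares(nums):\n    return sum(n * n for n in nums)"

namespace = {}
exec(candidate, namespace)        # run the generated code
for test in task["tests"]:
    exec(test, namespace)         # each assert must pass for the task to count
print("all tests passed")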

Meta says that its models performed 17% and 12% better on MBPP and HumanEval, respectively, than comparable LLMs that generate tokens one at a time. Moreover, the models generated output three times faster.

Photo: Wikimedia Commons
