

Meta Platforms Inc. today introduced Llama 3.3 70B, the latest addition to its Llama line of open-source large language models.
The new model delivers output quality comparable to that of Llama 3.1 405B, the most advanced LLM in the series, while using a fraction of the hardware. The result is a significant drop in infrastructure expenses: Meta says that Llama 3.3 70B generates prompt responses nearly five times more cost-efficiently.
The model is based on an optimized version of the Transformer architecture, the neural network design that underpins most cutting-edge LLMs. When analyzing a set of data points, Transformer-based models use a so-called attention mechanism to determine which data points are most relevant to the task at hand. Meta replaced the default attention mechanism with an improved implementation that lowers inference costs.
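The announcement doesn't name the technique, but Meta's published Llama model documentation points to grouped-query attention, in which several query heads share a single key-value head to shrink the memory footprint at inference time. The PyTorch sketch below illustrates the general idea only; the dimensions, weights and function name are illustrative assumptions, not Meta's implementation.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy grouped-query attention: n_q_heads query heads share
    n_kv_heads key/value heads, cutting the KV cache size.
    All dimensions here are illustrative, not Meta's configuration."""
    batch, seq, dim = x.shape
    head_dim = dim // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head

    q = (x @ wq).view(batch, seq, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)

    # Repeat each KV head so every query head has a matching key/value.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    attn = F.softmax(scores, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(batch, seq, dim)

# Example: 8 query heads backed by only 2 key-value heads.
dim, n_q, n_kv = 64, 8, 2
head_dim = dim // n_q
x = torch.randn(1, 10, dim)
wq = torch.randn(dim, n_q * head_dim)
wk = torch.randn(dim, n_kv * head_dim)
wv = torch.randn(dim, n_kv * head_dim)
out = grouped_query_attention(x, wq, wk, wv, n_q, n_kv)
print(out.shape)  # torch.Size([1, 10, 64])
```

Because the key-value tensors are four times smaller than in standard multi-head attention here, the cache a server must hold per request shrinks accordingly, which is where the inference savings come from.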
The company’s engineers trained Llama 3.3 70B on a cluster of H100-80GB chips from Nvidia Corp. The chips’ TDP, or thermal design power, a rating of the maximum power a processor is designed to draw, was set to the 700-watt maximum. Meta says that the LLM took 39.3 million graphics card-hours to train.
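Those two figures imply a rough upper bound on the training run's energy use, sketched below. The calculation assumes every GPU drew its full 700-watt TDP for the entire run, which real workloads rarely sustain, so the true figure would be lower.

```python
# Back-of-the-envelope energy estimate from Meta's reported figures.
# Assumes sustained draw at the full 700 W TDP, an upper bound.
gpu_hours = 39.3e6   # graphics card-hours reported by Meta
tdp_watts = 700      # H100-80GB power cap used for the run

energy_gwh = gpu_hours * tdp_watts / 1e9  # watt-hours -> gigawatt-hours
print(f"~{energy_gwh:.1f} GWh upper bound")  # ~27.5 GWh
```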
The training dataset includes about 15 trillion tokens, units of data that each correspond to a few characters of text. Meta used information from the public web, as well as more than 25 million synthetic examples. Those are AI-generated data points created specifically for LLM development purposes.
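For a sense of what a token is in practice, the snippet below counts tokens with Hugging Face's transformers library. The model identifier is Llama 3.3's Hugging Face listing, and access to it is gated, so this assumes credentials; any compatible tokenizer can stand in.

```python
from transformers import AutoTokenizer

# Llama 3.3's gated Hugging Face listing; requires accepting Meta's
# license. Any compatible tokenizer works for illustration.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

text = "Meta trained Llama 3.3 70B on roughly 15 trillion tokens."
ids = tok.encode(text)
print(len(ids), "tokens for", len(text), "characters")
```

English text typically works out to roughly three to four characters per token, which is why token counts are smaller than character counts.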
After Meta completed the initial training process, it refined Llama 3.3 70B with several methods.
One of the techniques the company used is known as supervised fine-tuning. It involves providing a freshly trained LLM with additional labeled datasets that it didn’t access during the initial training. Those datasets pair prompts with reference answers, which makes it easier for the LLM to find useful patterns.
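In code terms, supervised fine-tuning typically means penalizing the model when its next-token predictions diverge from the reference answer, while leaving the prompt tokens out of the loss. Below is a minimal PyTorch sketch of that loss under those standard assumptions; it is not Meta's pipeline, and the stand-in model is purely illustrative.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, response_ids):
    """Supervised fine-tuning loss: cross-entropy on the reference
    response tokens only, with prompt tokens masked out."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=1)
    logits = model(input_ids)  # (batch, seq, vocab)

    # Predict token t+1 from tokens up to t.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()

    # Mask the prompt portion so only response tokens drive the update.
    shift_labels[:, : prompt_ids.size(1) - 1] = -100

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Stand-in model: any causal LM mapping token IDs to logits works here.
vocab = 100
toy = torch.nn.Sequential(torch.nn.Embedding(vocab, 32),
                          torch.nn.Linear(32, vocab))
loss = sft_loss(toy, torch.randint(0, vocab, (1, 5)),
                torch.randint(0, vocab, (1, 7)))
print(loss.item())
```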
Meta also used another AI method known as reinforcement learning from human feedback, or RLHF. While an LLM is being trained this way, it receives pointers from an algorithm on how to improve the quality of its output. RLHF combines those automatically generated pointers with feedback from humans: annotators rank sample outputs, and their preferences shape the reward signal that guides the model.
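The component that turns human rankings into an automatic signal is a reward model trained on preference pairs: annotators pick the better of two responses, and the reward model learns to score the preferred one higher. The sketch below shows the standard pairwise (Bradley-Terry) loss for that step, a common formulation rather than a detail Meta has published for this release; the stand-in reward model is illustrative.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Pairwise reward-model loss: push the score of the human-preferred
    response above the score of the rejected one (Bradley-Terry)."""
    r_chosen = reward_model(chosen_ids)      # (batch,) scalar scores
    r_rejected = reward_model(rejected_ids)  # (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Stand-in reward model: mean-pools embeddings into a scalar score.
vocab = 100
emb = torch.nn.Embedding(vocab, 32)
head = torch.nn.Linear(32, 1)
toy_rm = lambda ids: head(emb(ids).mean(dim=1)).squeeze(-1)

loss = preference_loss(toy_rm,
                       torch.randint(0, vocab, (4, 6)),   # preferred
                       torch.randint(0, vocab, (4, 6)))   # rejected
print(loss.item())
```

Once trained, the reward model scores the LLM's candidate outputs during reinforcement learning, standing in for a human judge at scale.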
After completing the development process, Meta compared Llama 3.3 70B with Llama 3.1 405B using 10 AI benchmarks. Llama 3.3 70B trailed its larger namesake by less than 2% in six of the tests and achieved higher scores in three. It also mostly outperformed OpenAI’s GPT-4o.
According to Meta, processing 1 million input tokens with Llama 3.1 405B costs $1, while generating 1 million output tokens requires $1.80 worth of compute capacity. Llama 3.3 70B can manage the same tasks with 10 cents’ and 40 cents’ worth of infrastructure, respectively.
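The quick check below reproduces the savings those list prices imply. The blended figure depends on a workload's input-to-output mix, so the roughly fivefold headline claim lines up with the output side of the ledger.

```python
# Per-million-token prices cited by Meta (U.S. dollars).
llama_405b = {"input": 1.00, "output": 1.80}
llama_70b = {"input": 0.10, "output": 0.40}

for kind in ("input", "output"):
    ratio = llama_405b[kind] / llama_70b[kind]
    print(f"{kind}: {ratio:.1f}x cheaper")  # input: 10.0x, output: 4.5x
```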
Meta has made Llama 3.3 70B’s weights and code available on Hugging Face.