UPDATED 17:21 EST / JANUARY 08 2025

Microsoft open-sources its Phi-4 small language model

Microsoft Corp. today released the weights for Phi-4, a small language model that can generate text and solve math problems.

The company first detailed the model last month. Initially, Phi-4 was only accessible through Microsoft’s Azure AI Foundry artificial intelligence development service. The model is now downloadable on Hugging Face, a popular website for hosting open-source AI projects.
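For readers who want to try the model, the download can be scripted with the Hugging Face transformers library. The sketch below assumes the repository identifier "microsoft/phi-4" and that the accelerate package and sufficient GPU memory are available; those details are assumptions, not part of Microsoft’s announcement.

# Minimal sketch: load Phi-4 from Hugging Face and generate text.
# The repository id "microsoft/phi-4" is an assumption; device_map="auto"
# requires the accelerate package and enough GPU memory for a 14B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve: what is the derivative of x**3 + 2*x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))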

Phi-4 is the fourth iteration of a small language model series that Microsoft introduced in 2023. It features 14 billion parameters, the internal values that a neural network learns during training and uses to process data. Microsoft researchers trained it on a cluster of 1,920 H100 graphics processing units from Nvidia Corp. over the course of 21 days.

The model is based on the industry-standard Transformer architecture that underpins most large language models. When they receive a user prompt, Transformer models break the input into tokens, typically words or word fragments, and determine the meaning of each token by analyzing the surrounding text. Moreover, they prioritize the parts of the surrounding text that are deemed most relevant, a mechanism known as attention.
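How attention weighs the surrounding text can be sketched in a few lines of Python. The scaled dot-product attention example below uses toy random matrices, not anything from Phi-4, and is illustrative only.

# Illustrative sketch of scaled dot-product attention, the mechanism that
# lets a Transformer weight the most relevant surrounding tokens.
# The toy dimensions below are assumptions for demonstration only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: relevance weights
    return weights @ V  # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional embeddings
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)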

Phi-4 implements a so-called decoder-only variant of the Transformer architecture. A standard Transformer encoder analyzes the text both before and after a word to determine its meaning. Decoder-only models focus solely on the text that precedes a word, which reduces the amount of data they have to process and thereby lowers inference costs.
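That look-behind-only restriction is typically implemented as a causal mask applied to the attention scores. A minimal sketch, again with toy values rather than anything from Phi-4:

# Sketch of the causal mask used by decoder-only models: each token may
# attend to itself and earlier tokens, never to later ones.
import numpy as np

seq_len = 4
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

scores = np.random.default_rng(1).normal(size=(seq_len, seq_len))
scores[mask] = -np.inf  # future tokens get zero weight after the softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))  # the upper triangle is all zeros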

In a research paper, Microsoft detailed that it honed Phi-4’s output quality using two post-training optimization techniques: supervised fine-tuning and direct preference optimization. The former trains a model on example prompt-response pairs, while the latter trains it on pairs of candidate responses in which one is marked as preferable to the other.
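Microsoft has not published its training code, but the standard direct preference optimization loss from the original DPO paper can be sketched as follows. All tensor values below are toy stand-ins.

# Sketch of the standard direct preference optimization (DPO) loss.
# Inputs are log-probabilities of a preferred ("chosen") and a less-preferred
# ("rejected") response under the model being tuned and under a frozen
# reference model. This follows the published DPO formulation, not
# Microsoft's own code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the tuned model prefers each answer than the
    # reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Maximize the margin between the two ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy stand-in log-probabilities.
loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                torch.tensor([-5.5]), torch.tensor([-6.5]))
print(loss.item())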

In an internal evaluation, Microsoft compared Phi-4 against Llama 3.3 70B, an LLM with five times as many parameters. The company says that Phi-4 delivered better performance across the popular GPQA and MATH benchmarks, two test datasets that contain graduate-level science questions and competition math problems, respectively.

Phi-4 joins the growing list of small language models that have been open-sourced by major tech firms over the past year.

Last February, Google LLC introduced a series of small language models called Gemma. The algorithms in the series have between 2 billion and 27 billion parameters. According to Google, the version with 27 billion parameters can outperform models more than twice its size.

More recently, Meta Platforms Inc. released two Llama 3.2 models with under five billion parameters. The company followed up the release by open-sourcing even more efficient versions of those models that implement a machine learning technique called quantization. The technique reduces the precision of the numbers a neural network stores and processes, which lowers the amount of hardware necessary to run it.
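At its simplest, quantization maps 32-bit floating-point weights to 8-bit integers plus a scale factor. The sketch below is a toy illustration of that idea, not Meta’s actual scheme, which is more involved.

# Toy sketch of symmetric 8-bit weight quantization: store int8 values plus
# one float scale instead of 32-bit floats, roughly a 4x memory reduction.
# Illustrative only; Llama 3.2's quantization scheme is more sophisticated.
import numpy as np

weights = np.random.default_rng(2).normal(size=1000).astype(np.float32)

scale = np.abs(weights).max() / 127.0  # map the largest weight to 127
q_weights = np.round(weights / scale).astype(np.int8)

dequantized = q_weights.astype(np.float32) * scale
print("max error:", np.abs(weights - dequantized).max())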

Photo: Microsoft
