

Microsoft Corp. today expanded its Phi line of open-source language models with two new additions optimized for multimodal processing and hardware efficiency.
The first addition is the text-only Phi-4-mini. The second new model, Phi-4-multimodal, is an upgraded version of Phi-4-mini that can also process visual and audio input. Microsoft says that both models significantly outperform comparably sized alternatives at certain tasks.
Phi-4-mini, the text-only model, features 3.8 billion parameters. That makes it compact enough to run on mobile devices. It’s based on the ubiquitous transformer neural network architecture that underpins most LLMs.
A standard transformer model analyzes the text before and after a word to understand its meaning. According to Microsoft, Phi-4-mini is based on a variation of the architecture called a decoder-only transformer, which takes a different approach: it analyzes only the text that precedes a word when determining its meaning. That reduces hardware usage and speeds up processing.
Phi-4-mini also uses a second performance optimization technique called grouped query attention, or GQA, which reduces the hardware usage of the model’s attention mechanism. A language model’s attention mechanism helps it determine which data points are most relevant to a given processing task. With GQA, groups of query heads share a single set of key and value heads, which shrinks the amount of memory the mechanism needs during inference.
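To make the two ideas concrete, here is a minimal PyTorch sketch, not Microsoft’s code: a causal mask restricts each token to the text that precedes it, as in a decoder-only transformer, and query heads share key and value heads in groups, as in GQA. The head counts and dimensions are illustrative assumptions rather than Phi-4-mini’s actual configuration.

```python
# Minimal sketch of causal, grouped query attention; illustrative sizes only.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, num_q_heads, num_kv_heads):
    """x: (batch, seq, dim). Query heads share key/value heads in groups."""
    batch, seq, dim = x.shape
    head_dim = dim // num_q_heads

    # Project inputs; K and V use fewer heads than Q, which is the GQA saving.
    q = (x @ wq).view(batch, seq, num_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(batch, seq, num_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(batch, seq, num_kv_heads, head_dim).transpose(1, 2)

    # Each group of query heads reuses the same key/value head.
    group_size = num_q_heads // num_kv_heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)

    # Causal (decoder-only) masking: each token attends only to earlier tokens.
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(batch, seq, dim)

# Example: 8 query heads sharing 2 key/value heads (hypothetical sizes).
dim, n_q, n_kv = 64, 8, 2
x = torch.randn(1, 10, dim)
wq = torch.randn(dim, dim)
wk = torch.randn(dim, dim * n_kv // n_q)
wv = torch.randn(dim, dim * n_kv // n_q)
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # (1, 10, 64)
```

Because the key and value projections are smaller, the cache a decoder keeps during generation shrinks accordingly, which is where most of the memory savings come from.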
Phi-4-mini can generate text, translate existing documents and take actions in external applications. According to Microsoft, it’s particularly adept at math and coding tasks that require “complex reasoning.” In a series of internal benchmark tests, the company determined that Phi-4-mini completes such tasks with “significantly” better accuracy than several similarly sized language models.
The second new model that Microsoft released today, Phi-4-multimodal, is an upgraded version of Phi-4-mini with 5.6 billion parameters. It can process not only text but also images, audio and video. Microsoft trained the model using a new technique it dubs Mixture of LoRAs.
Adapting an AI model to a new task usually requires changing its weights, the configuration settings that determine how it crunches data. This process can be costly and time-consuming. As a result, researchers often use a different approach known as low-rank adaptation, or LoRA. Instead of modifying existing weights, LoRA teaches a model to perform an unfamiliar task by adding a small number of new weights optimized for that task.
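The snippet below is a minimal illustration of the general LoRA idea, not code from the Phi project: the pretrained weight matrix stays frozen, and only a pair of small low-rank matrices is trained for the new task.

```python
# LoRA sketch: freeze the base layer, learn a small low-rank correction.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights are left untouched
        # Only these two small matrices are trained for the new task.
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        # Output = frozen base projection + low-rank update.
        return self.base(x) + (x @ self.lora_a) @ self.lora_b

layer = LoRALinear(nn.Linear(512, 512), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8,192 trainable values vs. 262,656 in the frozen layer
print(layer(torch.randn(2, 512)).shape)  # (2, 512)
```

In this toy example, the adapter adds about 3% as many parameters as the frozen layer it sits on top of, which is why the approach is cheaper than retraining the original weights.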
Microsoft’s Mixture of LoRAs method applies the same concept to multimodal processing. To create Phi-4-multimodal, the company extended Phi-4-mini with adapter weights optimized to process audio and visual data. According to Microsoft, the technique mitigates some of the tradeoffs associated with other approaches to building multimodal models.
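How the adapters are wired together inside Phi-4-multimodal isn’t detailed here, but the sketch below shows one way the Mixture-of-LoRAs concept can be expressed: a frozen base layer paired with separate low-rank adapters for text, vision and audio, with the active adapter chosen by the input’s modality. All names and sizes are illustrative assumptions, not Microsoft’s implementation.

```python
# Hedged sketch of the Mixture-of-LoRAs concept: one frozen backbone layer,
# one small low-rank adapter per modality.
import torch
import torch.nn as nn

class MixtureOfLoRAs(nn.Module):
    def __init__(self, dim: int = 512, rank: int = 8,
                 modalities=("text", "vision", "audio")):
        super().__init__()
        self.base = nn.Linear(dim, dim)   # stands in for a frozen language-model layer
        self.base.requires_grad_(False)
        # One adapter pair per modality; only these are trained for the new inputs.
        self.lora_a = nn.ParameterDict(
            {m: nn.Parameter(torch.randn(dim, rank) * 0.01) for m in modalities})
        self.lora_b = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(rank, dim)) for m in modalities})

    def forward(self, x, modality: str):
        # The same frozen backbone runs for every modality; only the low-rank
        # correction differs, so adding a modality leaves the text weights intact.
        return self.base(x) + (x @ self.lora_a[modality]) @ self.lora_b[modality]

layer = MixtureOfLoRAs()
text_features = torch.randn(1, 512)
audio_features = torch.randn(1, 512)  # assumes an upstream audio encoder produced these
print(layer(text_features, "text").shape, layer(audio_features, "audio").shape)
```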
The company tested Phi-4-multimodal’s capabilities using more than a half-dozen visual data processing benchmarks. The model achieved an average score of 72, trailing OpenAI’s GPT-4 by less than one point. Google LLC’s Gemini 2.0 Flash, a cutting-edge large language model that debuted in December, scored 74.3.
Phi-4-multimodal achieved even better results in a set of benchmark tests that involved both visual and audio input. According to Microsoft, the model outperformed Gemini 2.0 Flash “by a large margin.” Phi-4-multimodal also bested InternOmni, an open-source LLM that is built specifically to process multimodal data and has a higher parameter count.
Microsoft will make Phi-4-multimodal and Phi-4-mini available on Hugging Face under an MIT license, which permits commercial use.
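For readers who want to try the models, the snippet below shows a typical Hugging Face transformers workflow. The repository ID is an assumption based on Microsoft’s naming pattern, so check the company’s Hugging Face page for the exact name before running it.

```python
# Hedged usage sketch with the Hugging Face transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-instruct"  # assumed repository name; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Solve step by step: what is 12 * 17?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```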