UPDATED 12:00 EDT / MAY 19 2025

Microsoft debuts Windows AI Foundry for local model development on AI PCs

Microsoft Corp. said today it’s advancing the local artificial intelligence development capabilities of Windows, part of an effort to help developers build, experiment and reach new users with sophisticated AI experiences.

At Microsoft Build 2025, running this week in Seattle, the company said it has evolved Windows Copilot Runtime into a new service called Windows AI Foundry, which adds a number of new features for integrating AI into applications.

The service integrates large language models from Foundry Local and other model catalogs, such as Ollama and Nvidia NIMs, providing developers with easy access to a range of ready-to-use open-source models. They’re optimized across multiple types of hardware so they can be instantly deployed, the company said.

According to Microsoft, Foundry Local will instantly detect the hardware developers are using, be it a central processing unit, graphics processing unit or neural processing unit, and list all of the models compatible with that chipset. Developers can then use the Foundry Local SDK to integrate Foundry Local directly within their applications.
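
That workflow maps closely to the quickstart pattern Microsoft has published for the Foundry Local Python SDK. The sketch below follows that pattern; the “phi-3.5-mini” alias and the prompt are illustrative assumptions, and available aliases depend on the catalog and the detected hardware.

```python
# Minimal sketch following the Foundry Local Python SDK quickstart pattern.
# The "phi-3.5-mini" alias is an illustrative assumption.
from foundry_local import FoundryLocalManager
from openai import OpenAI

alias = "phi-3.5-mini"

# Starts the Foundry Local service if needed and downloads a variant of the
# model matched to the detected hardware (CPU, GPU or NPU).
manager = FoundryLocalManager(alias)

# Foundry Local exposes an OpenAI-compatible endpoint, so the standard
# OpenAI client works against the local model.
client = OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What does an NPU do?"}],
)
print(response.choices[0].message.content)
```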

Windows AI Foundry will also support the many developers building their own LLMs. Microsoft said Windows ML will act as the built-in AI inference runtime to streamline model deployment across CPUs, GPUs and NPUs.

Windows ML is a high-performance local runtime that’s built directly into the Windows operating system, in order to simplify the task of shipping production applications or proprietary models, including Microsoft’s own Copilot+ PC experiences. It represents an evolution of the DirectML runtime, and incorporates feedback from silicon partners including Intel Corp., Advanced Micro Devices Inc., Nvidia Corp. and Qualcomm Inc.

Microsoft’s corporate vice president of Windows + Devices Pavan Davuluri said in a blog post that Windows ML provides a number of benefits to developers, beginning with simplified application deployment. He said developers will be able to ship production applications without needing to package machine learning runtimes, drivers or execution providers with their apps. Instead, Windows ML detects the hardware on the client’s device and chooses the most appropriate execution provider based on the app’s configuration.
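
Windows ML is an evolution of the same execution-provider pattern that ONNX Runtime already exposes in its standalone onnxruntime package. As a minimal sketch of that pattern, not the Windows ML API itself, with “model.onnx” and the input shape as placeholder assumptions:

```python
# Illustration of the execution-provider pattern Windows ML builds on, using
# the standalone onnxruntime package (not the Windows ML API itself).
import numpy as np
import onnxruntime as ort

# Ask the runtime which hardware backends exist on this machine.
available = ort.get_available_providers()

# Prefer a GPU provider when present, falling back to CPU; Windows ML makes
# this choice automatically based on the app's configuration.
preferred = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider")
             if p in available]

session = ort.InferenceSession("model.onnx", providers=preferred)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: np.zeros((1, 3, 224, 224), np.float32)})
```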

Davuluri added that Windows ML can also adapt automatically to new AI hardware. So as new CPUs and GPUs become available, it will keep all of the required dependencies up to date and adapt to the new silicon, ensuring model accuracy while maintaining full compatibility with the underlying hardware.

Finally, Windows ML will come with a range of tools included within the AI Toolkit for VS Code to simplify tasks such as model conversion, model quantization and model optimization, all in one place.
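
As a rough standalone illustration of the kind of quantization step the toolkit automates, ONNX Runtime’s own quantization utilities can shrink a model’s weights to 8-bit integers; the file paths below are placeholders:

```python
# Standalone example of the kind of quantization the AI Toolkit automates,
# using onnxruntime's quantization utilities. File paths are placeholders.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # full-precision source model
    model_output="model.int8.onnx",  # smaller, faster quantized copy
    weight_type=QuantType.QInt8,     # store weights as 8-bit integers
)
```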

Davuluri stressed that Microsoft has worked very closely with its silicon partners to integrate their third-party execution providers seamlessly with Windows ML, providing the best model performance for whatever hardware is available on the developer’s machine.

New APIs and enhanced model fine-tuning

To simplify key AI tasks such as text intelligence and image processing within applications, Microsoft has created a number of ready-to-use application programming interfaces. These include text APIs for summarization and rewriting, plus vision APIs for text recognition, image description and image super resolution, all available now in Windows App SDK 1.7.2.

According to Davuluri, the new APIs are meant to eliminate the overhead associated with model building and development. The APIs run locally on developers’ devices, ensuring compliance, privacy and security, and they’re fully optimized for the NPUs in Copilot+ PCs.
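
The Windows App SDK APIs themselves are exposed to WinRT languages such as C#, but the “one call, no model building” idea they embody is easy to show with an open-source stand-in. Below is a sketch of local summarization using Hugging Face’s transformers pipeline, plainly a substitute for, and not the actual, Windows summarization API:

```python
# Open-source stand-in for a ready-made summarization call, using Hugging
# Face's transformers pipeline; NOT the Windows App SDK API, just an analog
# of calling a prebuilt text-intelligence task in one line.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = (
    "Windows AI Foundry integrates models from Foundry Local and other "
    "catalogs, detects the available CPU, GPU or NPU, and lets developers "
    "run optimized open-source models directly on their machines."
)
print(summarizer(text, max_length=30, min_length=10)[0]["summary_text"])
```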

In addition, Microsoft is catering to developers who need to fine-tune open-source LLMs with custom data by launching LoRA support for Phi Silica.

LoRA, or low-rank adaptation, for Phi Silica is in public preview now on Snapdragon X Series NPUs. It’s aimed at making model fine-tuning more efficient: rather than retraining the whole model, it updates only a small subset of its parameters using the developer’s custom data. That can increase a model’s performance on a specific task without affecting its broader capabilities, Davuluri said.
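
The mechanics are easy to see with Hugging Face’s peft library, which implements the same low-rank adapter idea in the open-source world. This is a generic sketch, not the Phi Silica workflow; the base model and target modules are assumptions:

```python
# Generic LoRA sketch with Hugging Face's peft library (not the Phi Silica
# workflow): small low-rank adapter matrices are trained while the base
# model's weights stay frozen. Model and target modules are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling applied to adapter updates
    target_modules=["q_proj", "v_proj"],  # attention layers to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# Typically well under 1% of the parameters end up trainable.
model.print_trainable_parameters()
```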

To get started, developers can access LoRA training for Phi Silica in the AI Toolkit for VS Code, under the Fine Tuning Tool menu. They select the Phi Silica model, configure the project and kick off training on Azure using their custom dataset. Once that’s complete, they can download the LoRA adapter, apply it on top of the Phi Silica API and start experimenting to see how the model’s responses differ.

Elsewhere, Microsoft announced a set of new Semantic Search APIs that let developers create more powerful search experiences in their applications using their own data. The APIs enable both semantic and lexical search, allowing users to search by meaning as well as by exact words, making it easier to find exactly what they need.

The search APIs will run locally on all device types, Davuluri said. Beyond traditional search, they also support retrieval-augmented generation, giving developers a simple way to ground their model’s outputs in their own custom data. They’re available now in private preview on all Copilot+ PCs, Microsoft said.
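
Microsoft hasn’t published the API surface here, but the semantic-plus-lexical idea itself is straightforward: score each document by embedding similarity and by exact-word overlap, then blend the two. Here’s a conceptual sketch with the sentence-transformers library; the model choice and the 70/30 weighting are arbitrary assumptions:

```python
# Conceptual hybrid-search sketch (not the Windows Semantic Search API):
# blends embedding similarity with exact-word overlap. The model and the
# 0.7/0.3 weighting are arbitrary assumptions.
from sentence_transformers import SentenceTransformer, util

docs = [
    "Reset your password from the account settings page.",
    "The quarterly report covers revenue and operating costs.",
    "NPUs accelerate on-device machine learning workloads.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, convert_to_tensor=True)

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query words that appear verbatim in the document."""
    words = set(query.lower().split())
    return len(words & set(doc.lower().split())) / max(len(words), 1)

def search(query: str) -> str:
    q_vec = model.encode(query, convert_to_tensor=True)
    semantic = util.cos_sim(q_vec, doc_vecs)[0]
    scores = [0.7 * float(semantic[i]) + 0.3 * lexical_score(query, d)
              for i, d in enumerate(docs)]
    return docs[max(range(len(docs)), key=scores.__getitem__)]

# Matches on meaning even though the query uses different words.
print(search("how do I change my login credentials"))
```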

Azure AI Foundry gets simpler model selection and AI agents

For developers who prefer to build their applications directly in the cloud, the Azure AI Foundry is getting plenty of updates too. Azure AI Foundry is a comprehensive platform for designing, building, customizing and managing AI applications and agents. It provides everything from models and agents to development tools and observability within a single portal.

With the latest version of Azure AI Foundry, Microsoft said it’s streamlining model selection, customization and monitoring. In addition, the Foundry Agent Service has become generally available, paving the way for teams to customize, deploy and run multiagent apps at scale.

To make model choice easier, Microsoft has created a new model leaderboard that ranks hundreds of LLMs based on their quality, cost and throughput. It has also introduced a smart model router that can automatically select the most appropriate model for each request, based on the developer’s latency and budget constraints. According to Microsoft’s Yina Arenas, early adopters saw cost savings of up to 60% by using it to optimize model selection.
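
Microsoft hasn’t detailed the router’s internals, but the basic contract, picking the cheapest model that meets a request’s quality and latency constraints, can be sketched in a few lines. Everything below (model names, prices, latency figures) is invented for illustration:

```python
# Purely illustrative rule-based router, not Microsoft's model router.
# Model names, prices and latency figures are invented for the example.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float  # dollars
    p95_latency_ms: int
    quality_tier: int          # higher is better

CATALOG = [
    Model("small-fast", 0.10, 300, 1),
    Model("mid-tier", 0.50, 800, 2),
    Model("frontier", 2.00, 2000, 3),
]

def route(min_quality: int, max_latency_ms: int) -> Model:
    """Return the cheapest model meeting the quality and latency constraints."""
    candidates = [m for m in CATALOG
                  if m.quality_tier >= min_quality
                  and m.p95_latency_ms <= max_latency_ms]
    if not candidates:
        raise ValueError("no model satisfies the constraints")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route(min_quality=2, max_latency_ms=1000).name)  # -> mid-tier
```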

The platform’s fine-tuning tools have been upgraded too: developers can now fine-tune models such as GPT-4.1 nano, o4-mini and Llama 4 in Foundry Models, including with reinforcement learning techniques. There’s also a new, low-cost Developer Tier that eliminates hosting fees during experimentation.

As for Foundry Agent Service, it’s now generally available, making it simple for developers to host a single AI agent or groups of them, and expose them using the agent-to-agent, or A2A, protocol.

Arenas said the service is meant to simplify AI agent development and deployment, integrating with data sources such as Microsoft Bing, SharePoint, Azure AI Search and Microsoft Fabric, while supporting task automation via tools like Azure Logic Apps and Azure Functions, plus third-party tools that use the open-source Model Context Protocol.
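
Hands-on, agent creation goes through the Azure AI Foundry project SDK. The sketch below is based on a preview of the azure-ai-projects Python package, so method names and signatures may differ between versions; the connection string and model deployment name are placeholders:

```python
# Hedged sketch based on a preview of the azure-ai-projects Python package;
# method names and signatures may differ between preview versions. The
# connection string and model deployment name are placeholders.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project = AIProjectClient.from_connection_string(
    conn_str="<your-foundry-project-connection-string>",
    credential=DefaultAzureCredential(),
)

# Declare a hosted agent; the service manages threads, runs and tool calls.
agent = project.agents.create_agent(
    model="gpt-4o-mini",  # example model deployment name
    name="docs-agent",
    instructions="Answer questions about the team's documentation.",
)

# Start a conversation thread and process one user message.
thread = project.agents.create_thread()
project.agents.create_message(
    thread_id=thread.id, role="user", content="What changed in the last release?"
)
run = project.agents.create_and_process_run(
    thread_id=thread.id, agent_id=agent.id
)
```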

Other new capabilities include a new software development kit that merges the old Semantic Kernel and AutoGen SDKs, creating a single, composable API for defining and deploying AI agents with identical behavior, locally or in the cloud.

Image: SiliconANGLE/Dreamina
