UPDATED 19:18 EST / JUNE 17 2025


Google updates Gemini 2.5 LLM series with new entry-level model, pricing changes

Google LLC today introduced a new large language model, Gemini 2.5 Flash-Lite, that can process prompts faster and more cost-efficiently than its predecessor.

The algorithm is rolling out as part of a broader update to the company's flagship Gemini 2.5 LLM series. The two existing models in the lineup, Gemini 2.5 Flash and Gemini 2.5 Pro, have moved from preview to general availability. Gemini 2.5 Flash also received several pricing changes.

Gemini 2.5 made its original debut in March. The LLMs in the series are based on a mixture-of-experts architecture, which means that they each comprise multiple neural networks. When a user submits a prompt, Gemini 2.5 activates only a subset of those neural networks rather than all of them, which lowers hardware usage.

The LLM series is the first that Google trained using its internally developed TPUv5p AI chip. According to the company, the training process involved multiple server clusters that each contained 8,960 TPUv5p chips. Google's researchers equipped the clusters with new software that can automatically mitigate some technical issues.

Gemini 2.5 models are multimodal with support for up to 1 million tokens per prompt. Google describes the flagship algorithm in the series, Gemini 2.5 Pro, as its most capable LLM to date. During internal tests, it outperformed OpenAI’s o3-mini across a range of math and coding benchmarks. 

Gemini 2.5 Flash, the model that moved into general availability today together with Gemini 2.5 Pro, trades off some performance for efficiency. It responds to prompts faster and incurs lower inference costs. Gemini 2.5 Flash-Lite, the model that Google debuted today, is more efficient still and is positioned as the series' new entry-level option.

“2.5 Flash Lite has all-around higher quality than 2.0 Flash-Lite on coding, math, science, reasoning and multimodal benchmarks,” Tulsee Doshi, senior director of product management for Gemini, detailed in a blog post. “It excels at high-volume, latency-sensitive tasks like translation and classification, with lower latency than 2.0 Flash-Lite and 2.0 Flash on a broad sample of prompts.”

Gemini 2.5 Flash-Lite is billed at a rate of 10 cents per 1 million input tokens when developers submit prompts that contain text, images or video. That's less than one-tenth the cost of Gemini 2.5 Pro. The price per 1 million tokens of output, in turn, is 40 cents compared with $10 for Gemini 2.5 Pro.

Google is changing the pricing of its mid-range Gemini 2.5 Flash model as part of the update. The company will now charge 30 cents per 1 million input tokens and $2.50 per 1 million output tokens, compared with the previous rates of 15 cents and $3.50, respectively. Additionally, there is no longer separate pricing for tokens that the model processes in "thinking mode." The mode allows the LLM to boost output quality by increasing the amount of time and compute resources that it uses to generate prompt responses.
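To put the per-million-token rates above in concrete terms, the following sketch estimates monthly API spend for a sample workload. The rates come from the figures quoted in this article; the workload volumes (50 million input tokens, 10 million output tokens) are hypothetical placeholders chosen purely for illustration.

```python
# Sketch: compare spend under the per-million-token rates quoted above.
# USD per 1 million tokens; "flash-old" is Gemini 2.5 Flash's previous pricing.
RATES = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-2.5-flash":      {"input": 0.30, "output": 2.50},
    "gemini-2.5-flash-old":  {"input": 0.15, "output": 3.50},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the billed cost in dollars for a given token volume."""
    r = RATES[model]
    return (input_tokens / 1_000_000) * r["input"] + \
           (output_tokens / 1_000_000) * r["output"]

# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
for model in RATES:
    print(f"{model}: ${cost_usd(model, 50_000_000, 10_000_000):.2f}")
```

Note that because input got more expensive while output got cheaper, whether the new Gemini 2.5 Flash pricing saves money depends on a workload's ratio of input to output tokens.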

