UPDATED 15:00 EDT / APRIL 17 2025

Google launches Gemini 2.5 Flash in preview for developers

Google LLC today rolled out Gemini 2.5 Flash in preview through its developer platforms so that artificial intelligence engineers and users can get a head start with the AI model.

Gemini 2.5 Flash builds on the foundation of 2.0 Flash, the company’s existing low-latency, high-performance model designed to power AI agents. Google said the new model has enhanced reasoning capabilities and is a “thinking” model, meaning it can break down complex tasks into step-by-step plans before responding.

The new model is available starting today via the Gemini application programming interface on Google AI Studio and on Vertex AI, Google Cloud’s fully managed machine learning platform for building, training and deploying AI models.

“Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off,” Google said in the announcement. “The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost and latency.”

Google acknowledged that the thinking capability consumes tokens, the units used for processing information, which can increase response time and cost. To give developers flexibility in how the model operates, the company lets them cap the maximum number of tokens the model will spend thinking. A higher budget can improve quality but slows the model's responses; a smaller budget makes it respond faster.

The model is also trained to automatically set a budget based on the complexity of a given prompt. For example, simple questions such as “How do you say ‘thank you’ in Spanish?” or “How many provinces does Canada have?” don’t require much reasoning, as the answers probably exist in the model’s general training data or can be found in one step after an internet search.

Medium-level reasoning might involve tasks such as asking the model to build a daily schedule for a user based on a set of calendar events or to calculate the probability of a given outcome when rolling a pair of dice. High-level reasoning would be asking the AI to code an entire function in Python that computes complex math. Some users have previously asked Gemini to help them code entire web games, with mixed results.

Google said setting the thinking budget to 0 will result in the lowest cost and latency.
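As a rough illustration of the three modes described above (automatic budget, a developer-set cap, and thinking turned off), the sketch below builds a JSON request body without sending it. The field names `generationConfig`, `thinkingConfig` and `thinkingBudget` reflect the Gemini REST API as commonly documented and should be treated as assumptions to verify against Google's current reference; the helper `build_request` is hypothetical and exists only for this example.

```python
import json
from typing import Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a generateContent-style request body (hypothetical helper).

    thinking_budget=None -> omit the setting so the model picks its own budget
    thinking_budget=0    -> thinking off (lowest cost and latency, per Google)
    thinking_budget=N>0  -> cap thinking at N tokens
    """
    if thinking_budget is not None and thinking_budget < 0:
        raise ValueError("thinking_budget must be zero or positive")

    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; confirm against the current API reference.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    # A simple lookup question, so thinking is switched off entirely.
    request = build_request("How many provinces does Canada have?", 0)
    print(json.dumps(request, indent=2))
```

Leaving the budget unset keeps the request minimal and defers the decision to the model, which matches the automatic behavior Google describes.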

Gemini 2.5 Flash costs 15 cents per million input tokens and 60 cents per million output tokens without reasoning. With thinking active, the output cost goes up to $3.50 per million tokens.
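To make the pricing concrete, here is a minimal cost estimator using the preview rates quoted above. It assumes the $3.50 rate applies to output tokens when thinking is on and that thinking tokens are counted as output; the function name `estimate_cost` and the token counts in the usage example are illustrative, not taken from Google.

```python
# Preview prices in US dollars per million tokens, as reported above.
INPUT_PER_M = 0.15
OUTPUT_PER_M = 0.60
# Assumption: this rate replaces the output rate when thinking is active.
OUTPUT_THINKING_PER_M = 3.50


def estimate_cost(input_tokens: int, output_tokens: int, thinking: bool = False) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    output_rate = OUTPUT_THINKING_PER_M if thinking else OUTPUT_PER_M
    return (input_tokens * INPUT_PER_M + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 2,000 input tokens and 500 output tokens.
    for mode in (False, True):
        cost = estimate_cost(2_000, 500, thinking=mode)
        print(f"thinking={mode}: ${cost:.6f}")
```

Under these assumptions, the output side of the bill is nearly six times higher with thinking on, which is why capping the budget matters for high-volume workloads.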

According to Google, 2.5 Flash has proven to be a significant upgrade over 2.0 Flash, especially in its reasoning capability. With reasoning active, its ability to break down complex tasks that require multiple steps, such as solving mathematical problems and research questions, has been greatly enhanced.

Gemini 2.5 Flash scored 12.1% on Humanity’s Last Exam compared with 2.0 Flash at 5.1%. This benchmark is designed to test AI systems using the most challenging questions humans can create in fields such as mathematics, humanities and natural sciences.

Google said 2.5 Flash continues to lead as the model with the best price-to-performance ratio on the market. It also performs strongly on Hard Prompts in LMArena, the chatbot evaluation leaderboard, second only to Gemini 2.5 Pro, released last month.

Image: Google
