UPDATED 15:00 EDT / APRIL 17 2025

Google launches Gemini 2.5 Flash in preview for developers

Google LLC today rolled out Gemini 2.5 Flash in preview through its developer platforms so that artificial intelligence engineers and users can get a head start with the AI model.

Gemini 2.5 Flash builds on the foundation of 2.0 Flash, the company’s existing low-latency, high-performance model designed to power AI agents. Google said the new model has enhanced reasoning capabilities and is a “thinking” model, meaning it can break down complex tasks into step-by-step plans before responding.

The new model is available starting today via the Gemini application programming interface on Google AI Studio and on Vertex AI, Google Cloud’s fully managed machine learning platform for building, training and deploying AI models.

“Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off,” Google said in the announcement. “The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost and latency.”

Google acknowledged that the thinking capability consumes tokens, the units used for processing information, which can increase time and cost. To give developers flexibility in how the model operates, the company lets them cap the maximum number of tokens the model will spend thinking. A higher budget will improve quality but slow the model down; a smaller budget will make it respond faster.

The model is also trained to automatically set a budget based on the complexity of a given prompt. For example, simple questions such as “How do you say ‘Thank you’ in Spanish?” or “How many provinces does Canada have?” don’t require much reasoning, as the answers probably exist in the model’s general training or can be discovered in one step after an internet search.

Medium-level reasoning might involve tasks such as asking the model to build a daily schedule for a user based on a set of calendar events or to determine the probability of a particular outcome when rolling a pair of dice. High-level reasoning would be asking the AI to code an entire function in Python that computes complex math. Some users have asked Gemini to help them code entire web games before, with mixed results.

Google said setting the thinking budget to 0 will result in the lowest cost and latency.
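
For developers, a request that caps the thinking budget might look something like the sketch below. It only builds the request body and does not call any service. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect one reading of the Gemini REST API and should be checked against Google's current documentation; the helper `build_request` and the budget value of 1,024 are hypothetical and chosen purely for illustration.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request body that caps how many tokens the model may spend thinking.

    Field names are an assumption based on the Gemini REST API; verify them
    against the official documentation before use.
    """
    if thinking_budget < 0:
        raise ValueError("thinking_budget must be zero or a positive integer")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # A budget of 0 turns thinking off, per Google's description.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


# A simple lookup question: thinking switched off for lowest cost and latency.
fast = build_request("How many provinces does Canada have?", thinking_budget=0)

# A multi-step task: allow a modest (illustrative) thinking budget.
careful = build_request(
    "Build a daily schedule from these calendar events.", thinking_budget=1024
)

print(json.dumps(fast, indent=2))
```

Keeping the budget as an explicit argument makes the quality, cost and latency tradeoff visible at each call site rather than hiding it in a global default.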

Gemini 2.5 Flash costs 15 cents per million input tokens and 60 cents per million output tokens without reasoning. With thinking active, the output cost goes up to $3.50 per million tokens.
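
To make the price difference concrete, the following sketch estimates the bill for a workload at the preview prices quoted above. It assumes the $3.50 rate applies to all output tokens, including those spent thinking, whenever thinking is on; that billing detail and the helper name `estimate_cost` are assumptions, not something Google's announcement spells out here.

```python
# Preview prices in U.S. dollars per million tokens, as quoted in the article.
INPUT_PRICE = 0.15
OUTPUT_PRICE = 0.60
OUTPUT_PRICE_THINKING = 3.50  # assumed to cover all output tokens when thinking is on


def estimate_cost(input_tokens: int, output_tokens: int, thinking: bool = False) -> float:
    """Return an estimated cost in dollars for a given token volume."""
    output_rate = OUTPUT_PRICE_THINKING if thinking else OUTPUT_PRICE
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE
    output_cost = output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


# Example workload: 2 million input tokens and 500,000 output tokens.
without_thinking = estimate_cost(2_000_000, 500_000, thinking=False)
with_thinking = estimate_cost(2_000_000, 500_000, thinking=True)

print(f"Without thinking: ${without_thinking:.2f}")
print(f"With thinking:    ${with_thinking:.2f}")
```

For this example workload the estimate comes to about $0.60 without thinking and about $2.05 with it, which is why capping the budget matters for high-volume applications.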

According to Google, 2.5 Flash has proven to be a significant upgrade over 2.0 Flash, especially in its reasoning capability. With reasoning active, its ability to break down complex tasks that require multiple steps, such as solving mathematical problems and research questions, has been greatly enhanced.

Gemini 2.5 Flash scored 12.1% on Humanity’s Last Exam compared with 2.0 Flash at 5.1%. This benchmark is designed to test AI systems using the most challenging questions humans can create in fields such as mathematics, humanities and natural sciences.

Google said 2.5 Flash continues to lead the market as the model with the best price-to-performance ratio. It also performs strongly on Hard Prompts in LMArena, the chatbot evaluation leaderboard, second only to Gemini 2.5 Pro, released last month.

Image: Google
