

Google LLC said today it’s updating its flagship Gemini artificial intelligence model family by introducing an experimental Gemini 2.5 Pro version.
The company said it is its “most intelligent” model yet, with “thinking” capabilities built in. All upcoming Gemini 2.5 models will be thinking models, capable of breaking tasks down into multiple steps and reasoning through them before responding, which the company said results in enhanced performance and improved accuracy.
“In the field of AI, a system’s capacity for ‘reasoning’ refers to more than just classification and prediction,” Koray Kavukcuoglu, chief technology officer of Google DeepMind, the company’s research arm, explained in the announcement. “It refers to its ability to analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions.”
This thinking capability was first introduced by Google in its Gemini 2.0 Flash Thinking Experimental AI model, which was released in December. To create the model, the company explored AI-building practices including reinforcement learning and chain-of-thought prompting.
In the case of Gemini 2.0 Flash Thinking, users can activate the thinking capability by clicking a button when prompting the model, which then “thinks” through tasks. It also shows its reasoning, allowing the user to see the chain of thought the model took to reach its conclusion.
Going forward, Google is no longer adding the “Thinking” label to its models, since all Gemini 2.5 models will have the capability built in.
The company said with the new reasoning capability, Gemini 2.5 Pro Experimental has achieved a new level of performance above the base model due to post-training. It is the most advanced model for complex tasks and topped the LMArena leaderboard – which measures human preferences – by a significant margin.
It also led Humanity’s Last Exam – a dataset designed by hundreds of subject matter experts to capture the frontier of human knowledge and reasoning – with a score of 18.8%, compared with 14% for OpenAI’s o3-mini and 8.6% for DeepSeek R1. For context, o3-mini and R1 are both thinking models capable of complex reasoning in the same manner as Gemini 2.5 Pro Experimental.
“We’ve been focused on coding performance, and with Gemini 2.5 we’ve achieved a big leap over 2.0 — with more improvements to come,” said Kavukcuoglu.
To demonstrate the model’s new capabilities, Google researchers prompted it to generate an endless-runner-style dinosaur video game using HTML, CSS and JavaScript from a single prompt, and it did so successfully in one pass.
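A request like that one-pass demo could be issued through Google’s Gen AI SDK. The sketch below is illustrative only: the exact prompt Google used is not public, and the model identifier shown is an assumption based on Google’s naming at the time.

```python
# Illustrative sketch: the prompt wording and model name are assumptions,
# not Google's actual demo inputs.
MODEL_NAME = "gemini-2.5-pro-exp-03-25"  # assumed experimental model identifier


def build_game_prompt() -> str:
    """Compose a single self-contained prompt asking for a complete game."""
    return (
        "Create an endless-runner dinosaur game in a single HTML file. "
        "Use only HTML, CSS and vanilla JavaScript with no external libraries. "
        "Include jump controls, obstacles, a score counter and a game-over screen."
    )


def request_game(client) -> str:
    """Send the prompt with the google-genai SDK (client must hold an API key)."""
    response = client.models.generate_content(
        model=MODEL_NAME,
        contents=build_game_prompt(),
    )
    return response.text


# Usage (requires the `google-genai` package and a valid API key):
#   from google import genai
#   client = genai.Client(api_key="YOUR_KEY")
#   html = request_game(client)
```

Because the whole specification lives in one prompt, the model must plan the game loop, collision logic and rendering in a single response, which is the kind of multi-step task the thinking capability targets.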
The experimental Gemini 2.5 Pro model comes with a context window of 1 million tokens – roughly 750,000 words – which allows it to ingest extremely large documents, audio and video. Google said it intends to expand the window to 2 million tokens.
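To put the window size in perspective, one common rule of thumb is that a token corresponds to about 0.75 English words; the ratio below is that heuristic, not a property of Gemini’s actual tokenizer, and real ratios vary by language and content.

```python
# Back-of-envelope sizing for a 1M-token context window.
# WORDS_PER_TOKEN is a rough heuristic for English text, not Gemini's tokenizer.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)


def fits_in_window(word_count: int, window_tokens: int = 1_000_000) -> bool:
    """Estimate whether a document of `word_count` words fits in the window."""
    estimated_tokens = word_count / WORDS_PER_TOKEN
    return estimated_tokens <= window_tokens


print(approx_words(1_000_000))  # → 750000
print(fits_in_window(500_000))  # → True: a ~500k-word corpus fits comfortably
print(fits_in_window(900_000))  # → False: ~1.2M estimated tokens exceeds the window
```

By the same estimate, the planned 2-million-token window would roughly double the capacity to about 1.5 million words.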
With its large context window and high performance, Gemini 2.5 Pro provides a powerful foundation for AI agents. This enables them to process vast datasets and tackle complex problems more effectively. Because AI agents operate and plan autonomously, the model’s enhanced reasoning capability will significantly improve their ability to understand data and utilize tools to complete tasks.
Developers and enterprise users can start experimenting with Gemini 2.5 Pro in Google AI Studio now, and Gemini Advanced users can select it immediately from the dropdown on desktop and mobile. Users of Vertex AI, Google’s managed machine learning platform for building and deploying AI, will be able to experiment with the new model in the coming weeks.
In addition to the experimental Gemini 2.5 Pro, Google also announced TxGemma, a collection of open AI models designed to improve the efficiency of drug and therapy development using large language models.
The new models build on Gemma, Google DeepMind’s existing lightweight open-source models, specifically trained to understand and predict the properties of drugs and gene therapies throughout the entire process of discovery. This includes identifying promising entries and predicting clinical trial outcomes.
Google trained the TxGemma family of models from Gemma 2 using 7 million training examples. The models come in three sizes: 2 billion, 9 billion and 27 billion parameters.
Each size includes a “predict” version tailored for narrow tasks drawn from the Therapeutic Data Commons. Examples include classification, such as whether a drug can cross the blood-brain barrier; regression, such as predicting a drug’s binding affinity; and generation, such as deriving a reaction’s reactants from its products.
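Tasks like the blood-brain-barrier classification above are typically framed as text prompts over a molecule’s SMILES string. The template below is a hypothetical stand-in to show the shape of such an input, not TxGemma’s official prompt format; the example molecule is caffeine.

```python
# Illustrative only: this template is a hypothetical example of framing a
# therapeutic property-prediction task as text, not TxGemma's official format.
CAFFEINE_SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"  # caffeine


def bbb_prompt(smiles: str) -> str:
    """Frame a blood-brain-barrier classification task as a text prompt."""
    return (
        "Instructions: Answer the following question about drug properties.\n"
        "Question: Does the following molecule cross the blood-brain barrier?\n"
        f"Drug SMILES: {smiles}\n"
        "Answer (yes/no):"
    )


prompt = bbb_prompt(CAFFEINE_SMILES)
```

Framing property prediction as text is what lets a language model handle classification, regression and generation tasks with one interface.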
TxGemma 9B and 27B also include “chat” versions. These models explain their reasoning, answer questions and engage in conversation. As a result, researchers could ask TxGemma-Chat why it predicted a particular molecule might be toxic and delve into the molecule’s structure.
Just like every other model that Google builds, TxGemma is designed for integration into advanced agentic AI systems and includes tool use to tackle more complex research problems.
“Standard language models often struggle with tasks requiring up-to-date external knowledge or multi-step reasoning,” said Shekoofeh Azizi, a staff research scientist at Google. “To address this, we’ve developed Agentic-Tx, a therapeutics-focused agentic system powered by Gemini 2.0 Pro.”
Agentic-Tx is equipped with 18 tools that include TxGemma for multi-step reasoning; general search tools from PubMed, Wikipedia and the web; specific molecular tools; and gene and protein tools. This AI agent tool can be used to orchestrate therapeutic research design work and answer multi-step research questions for scientists and clinicians.
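Agentic-Tx’s internals are not public, but systems that route a model’s tool calls to concrete implementations commonly use a registry pattern. The sketch below shows that general pattern under that assumption; the tool names and placeholder bodies are hypothetical, not Agentic-Tx’s actual tools.

```python
# Hypothetical sketch of a tool registry, the common pattern behind agentic
# systems that expose named tools (search, molecular lookups, etc.) to a model.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a tool function to the registry under `name`."""
    def wrap(fn: Callable[[str], str]):
        TOOLS[name] = fn
        return fn
    return wrap


@register("pubmed_search")
def pubmed_search(query: str) -> str:
    # Placeholder: a real agent would query the PubMed API here.
    return f"[pubmed results for: {query}]"


@register("molecule_lookup")
def molecule_lookup(name: str) -> str:
    # Placeholder: a real tool might resolve a compound name to a structure.
    return f"[structure for: {name}]"


def dispatch(tool_name: str, argument: str) -> str:
    """Route a model-chosen tool call to the matching implementation."""
    return TOOLS[tool_name](argument)


print(dispatch("pubmed_search", "blood-brain barrier permeability"))
```

With 18 such tools registered, the orchestrating model can answer a multi-step research question by chaining calls: search the literature, look up a structure, then hand the result to TxGemma for prediction.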
TxGemma is available today on Vertex AI Model Garden and Hugging Face.