Google LLC has just announced a new version of its Gemini large language model that can navigate the web through a browser and interact with various websites, meaning it can perform tasks such as searching for information or buying things without human supervision.
The model, Gemini 2.5 Computer Use, combines visual understanding and reasoning to analyze a user’s request and carry out the task in the browser. It will complete all of the actions required to fulfill that task, such as clicking, typing, scrolling, manipulating dropdown menus, and filling out and submitting forms, just as a human would.
In a blog post, Google’s DeepMind research outfit said Gemini 2.5 Computer Use is based on the Gemini 2.5 Pro LLM. It explained that earlier versions of the model have powered agentic features it previously launched in tools such as AI Mode and Project Mariner. But this is the first time the complete model has been made available.
The company explained that each request kicks off a “loop” in which the model works through a series of steps until the task is considered complete. First, the user sends the model a request, which can also include screenshots of the website in question and a history of recent actions. Then, Gemini 2.5 Computer Use will analyze those inputs and generate a response, which will typically be a “function call representing one of the UI actions such as clicking or typing.”
Client-side code will then execute the required action, and after this is done, a new screenshot of the graphical user interface and the current website will be sent back to the model as a function response.
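The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google’s actual API: `FunctionCall`, `Step`, `run_loop`, the action names `click` and `type_text`, and the `max_steps` safety limit are all hypothetical, and the model is replaced by a scripted stub so the example runs without any network access.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FunctionCall:
    """A UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict


@dataclass
class Step:
    """One completed iteration: the action taken and the screenshot after it."""
    call: FunctionCall
    screenshot: bytes


def run_loop(
    request: str,
    propose_action: Callable[[str, bytes, List[Step]], Optional[FunctionCall]],
    execute_action: Callable[[FunctionCall], None],
    take_screenshot: Callable[[], bytes],
    max_steps: int = 20,  # assumption: a client-side cap to avoid endless loops
) -> List[Step]:
    history: List[Step] = []
    screenshot = take_screenshot()
    for _ in range(max_steps):
        # 1. Send the request, latest screenshot and action history to the model.
        call = propose_action(request, screenshot, history)
        if call is None:  # the model signals that the task is complete
            break
        # 2. Client-side code executes the proposed UI action.
        execute_action(call)
        # 3. A fresh screenshot is captured and fed into the next iteration.
        screenshot = take_screenshot()
        history.append(Step(call, screenshot))
    return history


def demo():
    """Drive the loop with a scripted stand-in for the model."""
    scripted = [
        FunctionCall("click", {"x": 120, "y": 48}),
        FunctionCall("type_text", {"text": "Bella"}),
    ]
    executed: List[str] = []

    def fake_model(request, screenshot, history):
        return scripted[len(history)] if len(history) < len(scripted) else None

    def fake_execute(call):
        executed.append(call.name)

    def fake_screenshot():
        return f"screen-{len(executed)}".encode()

    history = run_loop("add a guest", fake_model, fake_execute, fake_screenshot)
    return history, executed


if __name__ == "__main__":
    history, executed = demo()
    print(executed)
```

In a real client, the stubs would be swapped for a call to the model and for a browser automation layer that performs the action and captures the page.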
Google posted a few demonstration videos showing the computer use tool in action, noting that they are sped up by three times. The first video is based on the following prompt:
“From https://tinyurl.com/pet-care-signup, get all details for any pet with a California residency and add them as a guest in my spa CRM at https://pet-luxe-spa.web.app/. Then, set up a follow up visit appointment with the specialist Anima Lavar for October 10th anytime after 8am. The reason for the visit is the same as their requested treatment.”
Google is somewhat late to the party here. Just yesterday, OpenAI revealed a number of new applications for ChatGPT, enhancing the capabilities of its ChatGPT Agent feature, which is designed to complete various tasks on a user’s behalf using a computer. Anthropic PBC released a version of its flagship Claude AI model that can use a computer last year.
Not only is Google’s computer use model late, but it’s not as comprehensive. Unlike OpenAI’s and Anthropic’s tools, it’s only able to access a web browser, rather than the entire computer operating system. “It’s not yet optimized for desktop OS-level control, and currently supports 13 actions,” the company explained.
Still, DeepMind’s researchers say their focus on getting Gemini 2.5 Computer Use to work specifically in web browsers has paid off in terms of its performance. They claim that it “outperforms leading alternatives on multiple web and mobile benchmarks,” including Online-Mind2Web and WebVoyager. They noted that the model is primarily optimized for web browsers and performs best there, yet it also outperformed its peers on the AndroidWorld benchmark, which demonstrates “strong promise for mobile UI control tasks,” the researchers said.

In addition, they claimed that Gemini 2.5 Computer Use delivers the best browser control quality at the lowest latency, based on its performance on the Browserbase harness for Online-Mind2Web.
Here’s a second example of the model in action, using a different prompt:
“My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.”
DeepMind’s researchers are making Gemini 2.5 Computer Use available to developers through the Google AI Studio and Vertex AI, and pricing aligns pretty closely with the standard Gemini 2.5 Pro model. They follow the same token-based billing structure, with input tokens priced at $1.25 per one million tokens for prompts with under 200,000 tokens, rising to $2.50 per million tokens for longer prompts. Output tokens are priced similarly for both models, at $10 per million for shorter responses and $15 per million for longer ones.
The only real difference is that, though Gemini 2.5 Pro is available for free with an explicit token cap, Gemini 2.5 Computer Use does not offer any free tier, so users must pay from the outset to access it.
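To make the token-based pricing concrete, here is a rough cost estimate in Python using the rates quoted above. It is a sketch under assumptions: the function name `estimate_cost_usd` is hypothetical, a prompt of exactly 200,000 tokens is treated as a longer prompt, and the output rate is assumed to switch tiers at the same prompt-length threshold as the input rate.

```python
PROMPT_TIER_THRESHOLD = 200_000  # tokens; prompts under this get the lower rates

# US dollars per one million tokens, as quoted in the article.
INPUT_RATE_SHORT = 1.25
INPUT_RATE_LONG = 2.50
OUTPUT_RATE_SHORT = 10.00
OUTPUT_RATE_LONG = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: the prompt length alone decides which tier applies
    to both the input and the output tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    is_long_prompt = input_tokens >= PROMPT_TIER_THRESHOLD
    input_rate = INPUT_RATE_LONG if is_long_prompt else INPUT_RATE_SHORT
    output_rate = OUTPUT_RATE_LONG if is_long_prompt else OUTPUT_RATE_SHORT

    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # A 50,000-token prompt with a 2,000-token response.
    print(f"${estimate_cost_usd(50_000, 2_000):.4f}")
    # A 250,000-token prompt with the same response length.
    print(f"${estimate_cost_usd(250_000, 2_000):.4f}")
```

Because every step of the loop sends a new screenshot and the action history back to the model, the per-task cost is the sum of such estimates over all steps rather than a single request.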