Google is previewing a new Gemini AI model designed to navigate and interact with the web through a browser, allowing AI agents to do things inside interfaces designed for use by people, not robots. The model, called Gemini 2.5 Computer Use, uses “visual understanding and reasoning capabilities” to analyze a user’s request and complete a task, such as filling out and submitting a form.
It can be used for UI testing, or for navigating interfaces designed for people that don’t have an API or other direct connection available. Other versions of this model have been used for agentic functionality in AI Mode and Project Mariner, a research prototype that uses AI agents to perform tasks on their own in a browser, such as adding items to your cart based on a list of ingredients.
Google’s announcement comes a day after OpenAI revealed new apps for ChatGPT at its annual DevDay, where the company continued to build out its ChatGPT agent feature that can perform complex tasks on your behalf. Anthropic, meanwhile, released a version of its Claude AI model with “computer use” capabilities last year.
Google has released demo videos showing the tool in action, and notes that they are played at 3x speed.
Google says its Computer Use model “outperforms leading alternatives across multiple web and mobile benchmarks.” Unlike ChatGPT’s agent and Anthropic’s computer use tool, Google’s new AI model only has access to a browser, not an entire computing environment. Google notes that “it’s not yet optimized for desktop OS-level control,” and the model currently supports 13 actions, including opening a web browser, entering text, and dragging and dropping items.
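The workflow described above — the model looks at a screenshot of the page, proposes one of its supported actions, the client executes it in the browser, and the result is fed back — can be sketched as a simple loop. This is an illustrative sketch with a stubbed model and made-up action names; it is not Google’s actual API.

```python
# Illustrative sketch of a browser-control agent loop.
# The Action fields, action names, and stub_model are assumptions for
# demonstration only -- not the real Gemini 2.5 Computer Use interface.

from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                      # e.g. "open_browser", "type_text", "drag_and_drop"
    payload: dict = field(default_factory=dict)


def stub_model(task: str, screenshot: bytes) -> Action:
    # A real model would reason over the screenshot to pick the next
    # action; here we hard-code the first step of a form-filling task.
    return Action(kind="type_text", payload={"field": "name", "text": task})


def run_step(task: str, screenshot: bytes) -> Action:
    action = stub_model(task, screenshot)
    # In a full loop, the client would execute the action in the browser,
    # capture a fresh screenshot, and call the model again until the
    # task is complete.
    return action


print(run_step("Ada Lovelace", b"").kind)
```

The key design point the article highlights is that the model’s actions are confined to the browser: each step is a discrete, inspectable UI action rather than arbitrary OS-level control.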
The Gemini 2.5 Computer Use model is available to developers through Google AI Studio and Vertex AI, and there’s also a demo on Browserbase, where you can watch it perform tasks like “play a 2048 game” or “browse hacker news for trending debates.”