MIT Technology Review » Artificial Intelligence · April 18, 03:23
A Google Gemini model now has a “dial” to adjust how much it reasons
Google DeepMind's recently updated Gemini AI model introduces a control over how much the system "thinks," aimed at reducing costs for developers. It also reveals a problem with reasoning models: they are prone to overthinking, wasting compute and energy in the process. The article examines the trend of improving AI performance through reasoning and the challenges this approach brings, including high running costs and environmental impact. It also covers the competition posed by open-weight models, and how reasoning models are chosen and tuned for different applications.

🧠 Google DeepMind has added a control to its Gemini AI model that adjusts how much the system "thinks," in order to reduce costs for developers. The new feature is an acknowledgment that reasoning models can overthink, and that overthinking consumes large amounts of compute.

💰 Reasoning models solve problems by thinking for longer, which raises running costs: a single task can cost upwards of $200 to complete. Reasoning models perform better on complex tasks, such as analyzing code or gathering information from large sets of documents, but they can also waste resources.

⚙️ Overthinking doesn't just raise costs; it can also reduce efficiency. When a model spends too long on a problem only to produce a mediocre answer, it drives up developers' running costs and enlarges AI's environmental footprint.

💡 Open-weight models such as DeepSeek R1 are challenging proprietary models from Google and OpenAI. DeepSeek R1's internal settings, known as weights, are publicly available, which lets developers run the model on their own hardware and lower their costs.

Google DeepMind’s latest update to a top Gemini AI model includes a dial to control how much the system “thinks” through a response. The new feature is ostensibly designed to save money for developers, but it also concedes a problem: Reasoning models, the tech world’s new obsession, are prone to overthinking, burning money and energy in the process.

Since 2019, there have been a couple of tried-and-true ways to make an AI model more powerful. One was to make it bigger by using more training data, and the other was to give it better feedback on what constitutes a good answer. But toward the end of last year, Google DeepMind and other AI companies turned to a third method: reasoning.

“We’ve been really pushing on ‘thinking,’” says Jack Rae, a principal research scientist at DeepMind. Such models, which are built to work through problems logically and spend more time arriving at an answer, rose to prominence earlier this year with the launch of the DeepSeek R1 model. They’re attractive to AI companies because they can make an existing model better by training it to approach a problem pragmatically. That way, the companies can avoid having to build a new model from scratch. 

When the AI model dedicates more time (and energy) to a query, it costs more to run. Leaderboards of reasoning models show that one task can cost upwards of $200 to complete. The promise is that this extra time and money help reasoning models do better at handling challenging tasks, like analyzing code or gathering information from lots of documents. 

“The more you can iterate over certain hypotheses and thoughts,” says Google DeepMind chief technical officer Koray Kavukcuoglu, the more “it’s going to find the right thing.”

This isn’t true in all cases, though. “The model overthinks,” says Tulsee Doshi, who leads the product team at Gemini, referring specifically to Gemini 2.5 Flash, the model released today that includes a slider for developers to dial back how much it thinks. “For simple prompts, the model does think more than it needs to.”

When a model spends longer than necessary on a problem only to arrive at a mediocre answer, it makes the model expensive to run for developers and worsens AI’s environmental footprint.

Nathan Habib, an engineer at Hugging Face who has studied the proliferation of such reasoning models, says overthinking is abundant. In the rush to show off smarter AI, companies are reaching for reasoning models like hammers even where there’s no nail in sight, Habib says. Indeed, when OpenAI announced a new model in February, it said it would be the company’s last nonreasoning model. 

The performance gain is “undeniable” for certain tasks, Habib says, but not for many others where people normally use AI. Even when reasoning is used for the right problem, things can go awry. Habib showed me an example of a leading reasoning model that was asked to work through an organic chemistry problem. It started out okay, but halfway through its reasoning process the model’s responses started resembling a meltdown: It sputtered “Wait, but …” hundreds of times. It ended up taking far longer than a nonreasoning model would have to complete the task. Kate Olszewska, who works on evaluating Gemini models at DeepMind, says Google’s models can also get stuck in loops.

Google’s new “reasoning” dial is one attempt to solve that problem. For now, it’s built not for the consumer version of Gemini but for developers who are making apps. Developers can set a budget for how much computing power the model should spend on a certain problem, the idea being to turn down the dial if the task shouldn’t involve much reasoning at all. Outputs from the model are about six times more expensive to generate when reasoning is turned on.
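For developers calling the Gemini API, that budget is a per-request parameter. Here is a minimal sketch of how it might look with Google’s google-genai Python SDK; the model name, token figures, and prompts are illustrative, and exact field names can vary across SDK versions.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Simple prompt: dial thinking down to zero to avoid paying for reasoning tokens.
cheap = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Convert 72 degrees Fahrenheit to Celsius.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

# Hard prompt: grant a generous budget of reasoning tokens.
thorough = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Find the concurrency bug in the following code: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192),  # illustrative cap
    ),
)
```

Because reasoning tokens are billed at the higher rate, capping the budget on routine prompts is where most of the savings would come from.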

Another reason for this flexibility is that it’s not yet clear when more reasoning will be required to get a better answer.

“It’s really hard to draw a boundary on, like, what’s the perfect task right now for thinking?” Rae says. 

Obvious tasks include coding (developers might paste hundreds of lines of code into the model and then ask for help), or generating expert-level research reports. The dial would be turned way up for these, and developers might find the expense worth it. But more testing and feedback from developers will be needed to find out when medium or low settings are good enough.

Habib says the amount of investment in reasoning models is a sign that the old paradigm for how to make models better is changing. “Scaling laws are being replaced,” he says. 

Instead, companies are betting that the best responses will come from longer thinking times rather than bigger models. It’s been clear for several years that AI companies are spending more money on inferencing—when models are actually “pinged” to generate an answer for something—than on training, and this spending will accelerate as reasoning models take off. Inferencing is also responsible for a growing share of emissions.

(While on the subject of models that “reason” or “think”: an AI model cannot perform these acts in the way we normally use such words when talking about humans. I asked Rae why the company uses anthropomorphic language like this. “It’s allowed us to have a simple name,” he says, “and people have an intuitive sense of what it should mean.” Kavukcuoglu says that Google is not trying to mimic any particular human cognitive process in its models.)

Even if reasoning models continue to dominate, Google DeepMind isn’t the only game in town. When the results from DeepSeek began circulating in December and January, it triggered a nearly $1 trillion dip in the stock market because it promised that powerful reasoning models could be had for cheap. The model is referred to as “open weight”—in other words, its internal settings, called weights, are made publicly available, allowing developers to run it on their own rather than paying to access proprietary models from Google or OpenAI. (The term “open source” is reserved for models that disclose the data they were trained on.) 
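To make “open weight” concrete: because the weights are downloadable, a developer can pull a checkpoint from Hugging Face and run it locally instead of paying per API call. Below is a minimal sketch using the transformers library with one of the distilled R1 variants DeepSeek published, assuming a machine with enough GPU memory for a 7B-parameter model.

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A distilled DeepSeek R1 checkpoint small enough to fit on a single GPU.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Chat-style prompt; R1-family models emit their chain of thought before the answer.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```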

So why use proprietary models from Google when open ones like DeepSeek are performing so well? Kavukcuoglu says that coding, math, and finance are cases where “there’s high expectation from the model to be very accurate, to be very precise, and to be able to understand really complex situations,” and he expects models that deliver on that, open or not, to win out. In DeepMind’s view, this reasoning will be the foundation of future AI models that act on your behalf and solve problems for you.

“Reasoning is the key capability that builds up intelligence,” he says. “The moment the model starts thinking, the agency of the model has started.”
