MIT Technology Review » Artificial Intelligence · April 18, 03:23
A Google Gemini model now has a “dial” to adjust how much it reasons
Google DeepMind's recently updated Gemini AI model introduces a control over how much the system "thinks," aimed at reducing costs for developers. It also reveals a problem with reasoning models: they are prone to overthinking, wasting compute and energy in the process. The article examines the trend of improving AI performance through reasoning and the challenges this approach brings, including high running costs and environmental impact. It also covers the competition posed by open-weight models, and how reasoning models are chosen and tuned for different applications.

🧠 Google DeepMind has added a control to its Gemini AI model that adjusts how much the system "thinks," in order to reduce costs for developers. The new feature is an acknowledgment that reasoning models can overthink, and that overthinking consumes large amounts of compute.

💰 Reasoning models solve problems by thinking for longer, which raises running costs: a single task can cost upwards of $200 to complete. Reasoning models perform better on complex tasks, such as analyzing code or gathering information from large sets of documents, but they can also waste resources.

⚙️ Overthinking doesn't just raise costs; it can also reduce efficiency. When a model spends too long on a problem only to produce a mediocre answer, it drives up developers' running costs and enlarges AI's environmental footprint.

💡 Open-weight models such as DeepSeek R1 are challenging proprietary models from Google and OpenAI. DeepSeek R1's internal settings, known as weights, are publicly available, which lets developers run the model on their own hardware and lower their costs.

Google DeepMind’s latest update to a top Gemini AI model includes a dial to control how much the system “thinks” through a response. The new feature is ostensibly designed to save money for developers, but it also concedes a problem: Reasoning models, the tech world’s new obsession, are prone to overthinking, burning money and energy in the process.

Since 2019, there have been a couple of tried-and-true ways to make an AI model more powerful. One was to make it bigger by using more training data, and the other was to give it better feedback on what constitutes a good answer. But toward the end of last year, Google DeepMind and other AI companies turned to a third method: reasoning.

“We’ve been really pushing on ‘thinking,’” says Jack Rae, a principal research scientist at DeepMind. Such models, which are built to work through problems logically and spend more time arriving at an answer, rose to prominence earlier this year with the launch of the DeepSeek R1 model. They’re attractive to AI companies because they can make an existing model better by training it to approach a problem pragmatically. That way, the companies can avoid having to build a new model from scratch. 

When the AI model dedicates more time (and energy) to a query, it costs more to run. Leaderboards of reasoning models show that one task can cost upwards of $200 to complete. The promise is that this extra time and money help reasoning models do better at handling challenging tasks, like analyzing code or gathering information from lots of documents. 

“The more you can iterate over certain hypotheses and thoughts,” says Google DeepMind chief technical officer Koray Kavukcuoglu, the more “it’s going to find the right thing.”

This isn’t true in all cases, though. “The model overthinks,” says Tulsee Doshi, who leads the product team at Gemini, referring specifically to Gemini 2.5 Flash, the model released today that includes a slider for developers to dial back how much it thinks. “For simple prompts, the model does think more than it needs to.”

When a model spends longer than necessary on a problem only to arrive at a mediocre answer, it makes the model expensive to run for developers and worsens AI’s environmental footprint.

Nathan Habib, an engineer at Hugging Face who has studied the proliferation of such reasoning models, says overthinking is abundant. In the rush to show off smarter AI, companies are reaching for reasoning models like hammers even where there’s no nail in sight, Habib says. Indeed, when OpenAI announced a new model in February, it said it would be the company’s last nonreasoning model. 

The performance gain is “undeniable” for certain tasks, Habib says, but not for many others where people normally use AI. Even when reasoning is used for the right problem, things can go awry. Habib showed me an example of a leading reasoning model that was asked to work through an organic chemistry problem. It started out okay, but halfway through its reasoning process the model’s responses started resembling a meltdown: It sputtered “Wait, but …” hundreds of times. It ended up taking far longer than a nonreasoning model would have to complete the task. Kate Olszewska, who works on evaluating Gemini models at DeepMind, says Google’s models can also get stuck in loops.

Google’s new “reasoning” dial is one attempt to solve that problem. For now, it’s built not for the consumer version of Gemini but for developers who are making apps. Developers can set a budget for how much computing power the model should spend on a certain problem, the idea being to turn down the dial if the task shouldn’t involve much reasoning at all. Outputs from the model are about six times more expensive to generate when reasoning is turned on.
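For developers calling the Gemini API, that budget is a per-request parameter. Here is a minimal sketch of how it might look with Google’s google-genai Python SDK; the model name, token figures, and prompts are illustrative, and exact field names can vary across SDK versions.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Simple prompt: dial thinking down to zero to avoid paying for reasoning tokens.
cheap = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Convert 72 degrees Fahrenheit to Celsius.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)

# Hard prompt: grant a generous budget of reasoning tokens.
thorough = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Find the concurrency bug in the following code: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192),  # illustrative cap
    ),
)
```

Because reasoning tokens are billed at the higher rate, capping the budget on routine prompts is where most of the savings would come from.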

Another reason for this flexibility is that it’s not yet clear when more reasoning will be required to get a better answer.

“It’s really hard to draw a boundary on, like, what’s the perfect task right now for thinking?” Rae says. 

Obvious tasks include coding (developers might paste hundreds of lines of code into the model and then ask for help), or generating expert-level research reports. The dial would be turned way up for these, and developers might find the expense worth it. But more testing and feedback from developers will be needed to find out when medium or low settings are good enough.

Habib says the amount of investment in reasoning models is a sign that the old paradigm for how to make models better is changing. “Scaling laws are being replaced,” he says. 

Instead, companies are betting that the best responses will come from longer thinking times rather than bigger models. It’s been clear for several years that AI companies are spending more money on inferencing—when models are actually “pinged” to generate an answer for something—than on training, and this spending will accelerate as reasoning models take off. Inferencing is also responsible for a growing share of emissions.

(While on the subject of models that “reason” or “think”: an AI model cannot perform these acts in the way we normally use such words when talking about humans. I asked Rae why the company uses anthropomorphic language like this. “It’s allowed us to have a simple name,” he says, “and people have an intuitive sense of what it should mean.” Kavukcuoglu says that Google is not trying to mimic any particular human cognitive process in its models.)

Even if reasoning models continue to dominate, Google DeepMind isn’t the only game in town. When the results from DeepSeek began circulating in December and January, it triggered a nearly $1 trillion dip in the stock market because it promised that powerful reasoning models could be had for cheap. The model is referred to as “open weight”—in other words, its internal settings, called weights, are made publicly available, allowing developers to run it on their own rather than paying to access proprietary models from Google or OpenAI. (The term “open source” is reserved for models that disclose the data they were trained on.) 
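To make “open weight” concrete: because the weights are downloadable, a developer can pull a checkpoint from Hugging Face and run it locally instead of paying per API call. Below is a minimal sketch using the transformers library with one of the distilled R1 variants DeepSeek published, assuming a machine with enough GPU memory for a 7B-parameter model.

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A distilled DeepSeek R1 checkpoint small enough to fit on a single GPU.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Chat-style prompt; R1-family models emit their chain of thought before the answer.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```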

So why use proprietary models from Google when open ones like DeepSeek are performing so well? Kavukcuoglu says that coding, math, and finance are cases where “there’s high expectation from the model to be very accurate, to be very precise, and to be able to understand really complex situations,” and he expects models that deliver on that, open or not, to win out. In DeepMind’s view, this reasoning will be the foundation of future AI models that act on your behalf and solve problems for you.

“Reasoning is the key capability that builds up intelligence,” he says. “The moment the model starts thinking, the agency of the model has started.”
