Anthropic’s new ‘hybrid reasoning’ AI model is its smartest yet

The Verge - Artificial Intelligences, February 25


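Anthropic has released Claude 3.7 Sonnet, its first “hybrid reasoning model,” which excels at complex problems such as math and coding and outperforms the company’s previous models. Anthropic has also launched a “limited research preview” of an “agentic” coding tool called Claude Code, which can actively search and read code, edit files, write and run tests, commit code to GitHub, and use command line tools. Claude 3.7 Sonnet is available in the Claude app and to developers through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.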

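🚀 Anthropic released Claude 3.7 Sonnet, a “hybrid reasoning model” that can solve more complex problems and outperforms previous models in areas like math and coding. Dianne Penn, Anthropic’s product research lead, says the company wants to simplify the experience of using a model, and that reasoning is a feature of the AI rather than a completely separate thing.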

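🤖 Anthropic also released a “limited research preview” of Claude Code, an “agentic” coding tool that can actively search and read code, edit files, write and run tests, commit code to GitHub, and use command line tools. Anthropic employees have used the new model to build front-end website designs and interactive games, and even to spend up to 45 minutes on coding work.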

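🎮 Anthropic tests its AI models by having them play an old-school Pokémon video game. Claude 3.5 Sonnet couldn’t get out of Pallet Town at the start of the game, while version 3.7 was able to defeat multiple gym leaders. This indicates a marked improvement in Claude 3.7 Sonnet’s ability to carry out complex tasks.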

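💰 Claude 3.7 Sonnet costs the same to run as its predecessor, 3.5 Sonnet: $3 per million input tokens and $15 per million output tokens. Anthropic also lets developers help steer how the model “thinks” via its scratchpad and even dictate exactly how long it takes to respond.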

Anthropic is releasing Claude 3.7 Sonnet, its first “hybrid reasoning model” that can solve more complex problems and outperforms previous models in areas like math and coding.

In addition to a new model, Anthropic is also releasing a “limited research preview” of its “agentic” coding tool called Claude Code. While Anthropic already powers AI coding tools like Cursor, it’s pitching Claude Code as “an active collaborator that can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools.”

Claude 3.7 Sonnet is available starting Monday in the Claude app and for developers through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI. The model costs the same to run as its predecessor, 3.5 Sonnet, at $3 per million input tokens and $15 per million output tokens.
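To put those prices in concrete terms, here is a minimal sketch of the arithmetic for a single request. The two rates come from the article; the function name and the sample token counts are hypothetical, chosen only for illustration, and the sketch ignores any discounts or extra charges a provider might apply.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request.

    Hypothetical helper: it only applies the two per-token rates above.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 20,000 tokens in, 4,000 tokens out.
    print(f"${estimate_cost(20_000, 4_000):.2f}")  # prints $0.12
```

Because output tokens cost five times as much as input tokens, a long response can dominate the bill even when the prompt is far larger.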

While OpenAI and others offer separate so-called reasoning models, Anthropic product research lead Dianne Penn tells The Verge that the company wanted to simplify the experience of using a model. “We fundamentally believe that reasoning is a feature of the AI rather than a completely separate thing,” she says, noting that Claude shouldn’t take long to answer the question “What time is it?” versus responding to a more complex prompt like, “plan a two-week trip to Italy while considering the weather in late March.”

Penn says that Claude 3.7 Sonnet performs noticeably better on “agentic coding,” finance, and legal tasks. While Claude still lacks real-time web search like other models, version 3.7’s knowledge cut-off date of October 2024 is more up to date. Anthropic is also allowing developers to help steer how the model “thinks” via its scratchpad and even dictate exactly how long it takes to respond. “Sometimes the developer just needs to say it shouldn’t take more than 200 milliseconds to answer this question,” says Anthropic’s VP of product, Michael Gerstenhaber. “And that’s a product decision.”

Inside Anthropic, employees have used the new model to build front-end website designs and interactive games, and even to spend up to 45 minutes on coding work by “building test sets and editing test cases back and forth iteratively,” according to Penn.

She says that the company also tests its models on their ability to advance through an old-school Pokémon video game by mapping the model’s API to a controller scheme. Claude 3.5 Sonnet couldn’t get out of Pallet Town at the beginning of the game while version 3.7 was able to defeat multiple gym leaders.

As Elon Musk showed with Grok-3 last week, the AI model race is moving incredibly fast. For now, Anthropic appears to be in the lead again thanks to Claude 3.7 Sonnet’s performance gains. Its release also suggests that, rather than offer standalone reasoning models, the industry is moving toward a future where one model can do everything.
