TechCrunch News, March 26
Google unveils a next-gen AI reasoning model

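Google has launched Gemini 2.5, a family of AI reasoning models led by Gemini 2.5 Pro Experimental, which Google calls its most intelligent model yet. The model has reasoning capabilities, performs well in many areas, and will be used across a range of applications, but reasoning models cost more to run and Google has not announced API pricing.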

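💻 Google says Gemini 2.5 Pro is its most intelligent AI model yet, with built-in reasoning capabilities

🎯 Gemini 2.5 Pro scores well on many benchmarks, but its results are mixed on some

📖 The model can take in large amounts of text, and a larger context window is coming soon

💰 Reasoning models are more expensive, and Google has not announced API pricing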

On Tuesday, Google unveiled Gemini 2.5, a new family of AI reasoning models that pause to “think” before answering a question.

To kick off the new family of models, Google is launching Gemini 2.5 Pro Experimental, a multimodal, reasoning AI model that the company claims is its most intelligent model yet. This model will be available on Tuesday in the company’s developer platform, Google AI Studio, as well as in the Gemini app for subscribers to the company’s $20-a-month AI plan, Gemini Advanced.

Moving forward, Google says all of its new AI models will have reasoning capabilities baked in.

Since OpenAI launched o1, the first AI reasoning model, in September 2024, the tech industry has raced to match or exceed its capabilities with models of its own. Today, Anthropic, DeepSeek, Google, and xAI all have AI reasoning models, which use extra computing power and time to fact-check and reason through problems before delivering an answer.

Reasoning techniques have helped AI models achieve new heights in math and coding tasks. Many in the tech world believe reasoning models will be a key component of AI agents, autonomous systems that can perform tasks largely without human intervention. However, these models are also more expensive.

Google claims that Gemini 2.5 Pro outperforms its previous frontier AI models, as well as some leading competitor models, on several benchmarks. Specifically, Google says it designed Gemini 2.5 to excel at creating visually compelling web apps and agentic coding applications.

On an evaluation measuring code editing, called Aider Polyglot, Google says Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek.

However, on another test measuring software dev abilities, SWE-bench Verified, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI’s o3-mini and DeepSeek’s R1, but underperforming Anthropic’s Claude 3.7 Sonnet, which scored 70.3%.

On Humanity’s Last Exam — a multimodal test including thousands of crowdsourced questions around math, humanities, and the natural sciences — Google says Gemini 2.5 Pro scores 18.8%, outperforming leading AI models from OpenAI, Anthropic, and DeepSeek.

At launch, Google says Gemini 2.5 Pro is shipping with a 1 million token context window, which means the AI model can take in roughly 750,000 words in a single prompt. That’s longer than the entire Lord of the Rings book series. Google also says a 2 million token context window is coming soon.
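The 750,000-word figure follows from a common rule of thumb of roughly 0.75 English words per token (1,000,000 × 0.75). The short Python sketch below applies that ratio to both the current and the announced context windows. The `WORDS_PER_TOKEN` constant and the `estimate_words` helper are illustrative assumptions, not anything Google has published; real ratios vary by tokenizer, language, and content.

```python
# Assumption: ~0.75 English words per token, the ratio implied by
# "1 million tokens ≈ 750,000 words". Not an official Gemini figure.
WORDS_PER_TOKEN = 0.75


def estimate_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Roughly estimate how many English words fit in a context window."""
    return round(context_tokens * words_per_token)


if __name__ == "__main__":
    # Current window at launch vs. the larger window Google says is coming soon.
    for label, tokens in [("at launch", 1_000_000), ("coming soon", 2_000_000)]:
        print(f"{label}: {tokens:,} tokens ≈ {estimate_words(tokens):,} words")
```

Under the same assumption, the 2 million token window would hold roughly 1.5 million words in a single prompt.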

Google has experimented with AI reasoning models before, releasing a “thinking” version of Gemini in December, but Gemini 2.5 represents the company’s most serious competitor yet to OpenAI’s o-series models.

Google didn’t share API pricing for Gemini 2.5 Pro.
