TechCrunch News 02月01日
OpenAI launches o3-mini, its latest ‘reasoning’ model
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

OpenAI发布了新的AI推理模型o3-mini,旨在提高AI的可访问性和效率。该模型专为STEM领域的问题而优化,在编程、数学和科学方面表现出色。与之前的o1系列相比,o3-mini在速度和成本方面更具优势,同时在某些测试中表现出更高的准确性。用户可以通过ChatGPT访问o3-mini,付费用户享有更高的查询速率和高级功能。此外,开发者可以通过API使用o3-mini,并根据需求调整推理强度。OpenAI强调,o3-mini不仅在性能上有所提升,而且在安全性方面也得到了加强,使其成为一个更可靠的AI工具。

🚀o3-mini是OpenAI推出的新型AI推理模型,专注于提升AI在STEM领域的应用,特别是在编程、数学和科学方面。它通过更彻底的事实核查来减少错误,提高了在复杂问题上的可靠性。

💰o3-mini在性能上与o1系列相当,但运行速度更快,成本更低。其定价策略更具竞争力,尤其在API使用方面,相较于o1-mini降低了63%。此外,它在部分测试中超越了DeepSeek的R1推理模型。

⏱️o3-mini通过ChatGPT提供服务,免费用户可以通过“Reason”按钮或重新生成答案来使用,付费用户则享有更高的查询速率和高级功能,如选择不同推理强度。OpenAI还为开发者提供API访问,并允许调整推理强度以适应不同的应用场景。

🛡️OpenAI强调o3-mini的安全性,通过红队测试和“深思熟虑的对齐”方法,确保模型在响应查询时考虑到OpenAI的安全政策。在安全性和越狱评估中,o3-mini甚至超越了GPT-4o。

OpenAI on Friday launched a new AI “reasoning” model, o3-mini, the newest in the company’s o family of reasoning models.

OpenAI first previewed the model in December alongside a more capable system called o3, but the launch comes at a pivotal moment for the company, whose ambitions — and challenges — are seemingly growing by the day.

OpenAI is battling the perception that it’s ceding ground in the AI race to Chinese companies like DeepSeek, which OpenAI alleges might have stolen its IP. Nonetheless, the ChatGPT maker has managed to win over scores of developers, and it’s been trying to shore up its relationship with Washington as it simultaneously pursues an ambitious data center project, It’s reportedly also laying the groundwork for one of the largest financing rounds by a tech company in history.

Which brings us to o3-mini. OpenAI is pitching its new model as both “powerful” and “affordable.”

“Today’s launch marks […] an important step toward broadening accessibility to advanced AI in service of our mission,” an OpenAI spokesperson told TechCrunch.

Unlike most large language models, reasoning models like o3-mini thoroughly fact-check themselves before giving out results. This helps them avoid some of the pitfalls that normally trip up models. These reasoning models do take a little longer to arrive at solutions, but the trade-off is that they tend to be more reliable — though not perfect — in domains like physics.

O3-mini is fine-tuned for STEM problems, specifically for programming, math, and science. OpenAI claims the model is largely on par with the o1 family, o1 and o1-mini in terms of capabilities, but runs faster and costs less.

The company claimed that external testers preferred o3-mini’s answers over those from o1-mini more than half the time. O3-mini apparently also made 39% fewer “major mistakes” on “tough real-world questions” in A/B tests versus o1-mini, and produced “clearer” responses while delivering answers about 24% faster.

O3-mini will be available to all users via ChatGPT starting Friday, but users who pay for the company’s ChatGPT Plus and Team plans will get a higher rate limit of 150 queries per day, while ChatGPT Pro subscribers will get unlimited access. OpenAI said o3-mini will come to ChatGPT Enterprise and ChatGPT Edu customers in a week (no word on ChatGPT Gov).

Users with premium ChatGPT plans can select o3-mini using the drop-down menu. Free users can click or tap the new “Reason” button in the chat bar, or have ChatGPT “re-generate” an answer.

Beginning Friday, o3-mini will also be available via OpenAI’s API to select developers, but it initially will not have support for analyzing images. Devs can select the level of “reasoning effort” (low, medium, or high) to get o3-mini to “think harder” based on their use case and latency needs.

O3-mini is priced at $1.10 per million cached input tokens and $4.40 per million output tokens, where a million tokens equates to roughly 750,000 words. That’s 63% cheaper than o1-mini, and competitive with DeepSeek’s R1 reasoning model pricing. DeepSeek charges $0.14 per million cached input tokens and $2.19 per million output tokens for R1 access through its API.

In ChatGPT, o3-mini is set to medium reasoning effort, which OpenAI says provides “a balanced trade-off between speed and accuracy.” Paid users will have the option of selecting “o3-mini-high” in the model picker, which will deliver what OpenAI calls “higher-intelligence” in exchange for slower responses.

Regardless of which version of o3-mini ChatGPT users choose, the model will work with search to find up-to-date answers with links to relevant web sources. OpenAI cautions that the functionality is a “prototype” as it works to integrate search across its reasoning models.

“While o1 remains our broader general-knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed,” OpenAI wrote in a blog post on Friday. “The release of o3-mini marks another step in OpenAI’s mission to push the boundaries of cost-effective intelligence.”

O3-mini is not OpenAI’s most powerful model to date, nor does it leapfrog DeepSeek’s R1 reasoning model in every benchmark.

O3-mini beats R1 on AIME 2024, a test that measures how well models understand and respond to complex instructions — but only with high reasoning effort. It also beats R1 on the programming-focused test SWE-bench Verified (by .1 point), but again, only with high reasoning effort. On low reasoning effort, o3-mini lags R1 on GPQA Diamond, which tests models with PhD-level physics, biology and chemistry questions.

To be fair, o3-mini answers many queries at competitively low cost and latency. In the post, OpenAI compares its performance to the o1 family:

“With low reasoning effort, o3-mini achieves comparable performance with o1-mini, while with medium effort, o3-mini achieves comparable performance with o1,” OpenAI writes. “O3-mini with medium reasoning effort matches o1’s performance in math, coding and science while delivering faster responses. Meanwhile, with high reasoning effort, o3-mini outperforms both o1-mini and o1.”

It’s worth noting that o3-mini’s performance advantage over o1 is slim in some areas. On AIME 2024, o3-mini beats o1 by just 0.3 percentage points when set to high reasoning effort. And on GPQA Diamond, o3-mini doesn’t surpass o1’s score even on high reasoning effort.

OpenAI asserts that o3-mini is as “safe” or safer than the o1 family, however, thanks to red-teaming efforts and its “deliberative alignment” methodology, which makes models “think” about OpenAI’s safety policy while they’re responding to queries. According to the company, o3-mini “significantly surpasses” one of OpenAI’s flagship models, GPT-4o, on “challenging safety and jailbreak evaluations.”

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

OpenAI o3-mini AI推理模型 STEM ChatGPT
相关文章