OpenAI launches o3-mini, its latest ‘reasoning’ model

OpenAI on Friday launched a new AI “reasoning” model, o3-mini, the newest in the company’s o family of reasoning models.

OpenAI first previewed the model in December alongside a more capable system called o3, but the launch comes at a pivotal moment for the company, whose ambitions — and challenges — are seemingly growing by the day.

OpenAI is battling the perception that it’s ceding ground in the AI race to Chinese companies like DeepSeek, which OpenAI alleges might have stolen its IP. Nonetheless, the ChatGPT maker has managed to win over scores of developers, and it’s been trying to shore up its relationship with Washington as it simultaneously pursues an ambitious data center project, It’s reportedly also laying the groundwork for one of the largest financing rounds by a tech company in history.

Which brings us to o3-mini. OpenAI is pitching its new model as both “powerful” and “affordable.”

“Today’s launch marks […] an important step toward broadening accessibility to advanced AI in service of our mission,” an OpenAI spokesperson told TechCrunch.

Unlike most large language models, reasoning models like o3-mini thoroughly fact-check themselves before giving out results. This helps them avoid some of the pitfalls that normally trip up models. These reasoning models do take a little longer to arrive at solutions, but the trade-off is that they tend to be more reliable — though not perfect — in domains like physics.

O3-mini is fine-tuned for STEM problems, specifically for programming, math, and science. OpenAI claims the model is largely on par with the o1 family, o1 and o1-mini in terms of capabilities, but runs faster and costs less.

The company claimed that external testers preferred o3-mini’s answers over those from o1-mini more than half the time. O3-mini apparently also made 39% fewer “major mistakes” on “tough real-world questions” in A/B tests versus o1-mini, and produced “clearer” responses while delivering answers about 24% faster.

O3-mini will be available to all users via ChatGPT starting Friday, but users who pay for the company’s ChatGPT Plus and Team plans will get a higher rate limit of 150 queries per day, while ChatGPT Pro subscribers will get unlimited access. OpenAI said o3-mini will come to ChatGPT Enterprise and ChatGPT Edu customers in a week (no word on ChatGPT Gov).

Users with premium ChatGPT plans can select o3-mini using the drop-down menu. Free users can click or tap the new “Reason” button in the chat bar, or have ChatGPT “re-generate” an answer.

Beginning Friday, o3-mini will also be available via OpenAI’s API to select developers, but it initially will not have support for analyzing images. Devs can select the level of “reasoning effort” (low, medium, or high) to get o3-mini to “think harder” based on their use case and latency needs.

O3-mini is priced at $1.10 per million cached input tokens and $4.40 per million output tokens, where a million tokens equates to roughly 750,000 words. That’s 63% cheaper than o1-mini, and competitive with DeepSeek’s R1 reasoning model pricing. DeepSeek charges $0.14 per million cached input tokens and $2.19 per million output tokens for R1 access through its API.

In ChatGPT, o3-mini is set to medium reasoning effort, which OpenAI says provides “a balanced trade-off between speed and accuracy.” Paid users will have the option of selecting “o3-mini-high” in the model picker, which will deliver what OpenAI calls “higher-intelligence” in exchange for slower responses.

Regardless of which version of o3-mini ChatGPT users choose, the model will work with search to find up-to-date answers with links to relevant web sources. OpenAI cautions that the functionality is a “prototype” as it works to integrate search across its reasoning models.

“While o1 remains our broader general-knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed,” OpenAI wrote in a blog post on Friday. “The release of o3-mini marks another step in OpenAI’s mission to push the boundaries of cost-effective intelligence.”

O3-mini is not OpenAI’s most powerful model to date, nor does it leapfrog DeepSeek’s R1 reasoning model in every benchmark.

O3-mini beats R1 on AIME 2024, a test that measures how well models understand and respond to complex instructions — but only with high reasoning effort. It also beats R1 on the programming-focused test SWE-bench Verified (by .1 point), but again, only with high reasoning effort. On low reasoning effort, o3-mini lags R1 on GPQA Diamond, which tests models with PhD-level physics, biology and chemistry questions.

To be fair, o3-mini answers many queries at competitively low cost and latency. In the post, OpenAI compares its performance to the o1 family:

“With low reasoning effort, o3-mini achieves comparable performance with o1-mini, while with medium effort, o3-mini achieves comparable performance with o1,” OpenAI writes. “O3-mini with medium reasoning effort matches o1’s performance in math, coding and science while delivering faster responses. Meanwhile, with high reasoning effort, o3-mini outperforms both o1-mini and o1.”

It’s worth noting that o3-mini’s performance advantage over o1 is slim in some areas. On AIME 2024, o3-mini beats o1 by just 0.3 percentage points when set to high reasoning effort. And on GPQA Diamond, o3-mini doesn’t surpass o1’s score even on high reasoning effort.

OpenAI asserts that o3-mini is as “safe” or safer than the o1 family, however, thanks to red-teaming efforts and its “deliberative alignment” methodology, which makes models “think” about OpenAI’s safety policy while they’re responding to queries. According to the company, o3-mini “significantly surpasses” one of OpenAI’s flagship models, GPT-4o, on “challenging safety and jailbreak evaluations.”

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签