The Verge - Artificial Intelligence · September 13, 2024
OpenAI releases o1, its first model with ‘reasoning’ abilities

OpenAI has released a new model called o1, the first in a planned series of “reasoning” models. o1 is trained to answer more complex questions faster than a human can, and it launches alongside o1-mini, a smaller, cheaper version. For OpenAI, o1 represents a significant step toward its goal of human-like artificial intelligence. More practically, it does a better job than previous models at writing code and solving multistep problems, but it is also more expensive and slower to use than GPT-4o. OpenAI is calling the o1 release a “preview” to emphasize how nascent it still is.

💻 **o1: OpenAI's new reasoning model** OpenAI has released a new model called o1, the first in a planned series of “reasoning” models trained to answer more complex questions faster than a human can. o1 represents a significant step toward OpenAI's goal of human-like artificial intelligence. More practically, it does a better job than previous models at writing code and solving multistep problems, but it is also more expensive and slower to use than GPT-4o. OpenAI is calling the o1 release a “preview” to emphasize how nascent it still is. It launches alongside o1-mini, a smaller, cheaper version. ChatGPT Plus and Team users get access to o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI says it plans to bring o1-mini to all free ChatGPT users but has not set a release date. Developer access to o1 is expensive: in the API, o1-preview costs $15 per million input tokens (the chunks of text the model parses) and $60 per million output tokens. For comparison, GPT-4o costs $5 per million input tokens and $15 per million output tokens.

📘 **How o1 was trained: reinforcement learning and chain of thought** The training behind o1 is fundamentally different from that of previous GPT models. OpenAI's research lead, Jerry Tworek, says that while the company is keeping the exact details vague, o1 “has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.” OpenAI previously trained GPT models to mimic patterns in their training data. With o1, it trained the model to solve problems on its own using a technique called reinforcement learning, which teaches the system through rewards and penalties. The model then uses a “chain of thought” to process queries, much like a human working through a problem step by step. Thanks to this new training methodology, OpenAI says the model should be more accurate. “We have noticed that this model hallucinates less,” Tworek says. But the problem still persists: “We can't say we solved hallucinations.”

🔎 **Where o1 excels: complex problems and strong reasoning** According to OpenAI, the main thing that sets o1 apart from GPT-4o is its ability to tackle complex problems, such as coding and math, much better than its predecessors while also explaining its reasoning. “The model is definitely better at solving the AP math test than I am, and I was a math minor in college,” OpenAI's chief research officer, Bob McGrew, says. He says OpenAI also tested o1 against a qualifying exam for the International Mathematics Olympiad: GPT-4o correctly solved only 13 percent of the problems, while o1 scored 83 percent. In online programming contests known as Codeforces competitions, the new model reached the 89th percentile of participants, and OpenAI claims the next update of the model will perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry and biology.”

Image: The Verge

OpenAI is releasing a new model called o1, the first in a planned series of “reasoning” models that have been trained to answer more complex questions, faster than a human can. It’s being released alongside o1-mini, a smaller, cheaper version. And yes, if you’re steeped in AI rumors: this is, in fact, the extremely hyped Strawberry model.

For OpenAI, o1 represents a step toward its broader goal of human-like artificial intelligence. More practically, it does a better job at writing code and solving multistep problems than previous models. But it’s also more expensive and slower to use than GPT-4o. OpenAI is calling this release of o1 a “preview” to emphasize how nascent it is.

ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn’t set a release date yet. Developer access to o1 is really expensive: In the API, o1-preview is $15 per 1 million input tokens, or chunks of text parsed by the model, and $60 per 1 million output tokens. For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
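To put that price gap in concrete terms, here is a minimal sketch of a per-request cost estimate using the per-token rates quoted above (the model names and prices come from the article; actual billing may differ):

```python
# Rough per-request cost comparison using the per-token prices quoted in the article.
# Prices are USD per 1 million tokens; actual pricing may change over time.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a single API request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 2,000-token prompt that produces 1,000 tokens of output.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
# o1-preview: $0.0900
# gpt-4o: $0.0250
```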

The training behind o1 is fundamentally different from its predecessors, OpenAI’s research lead, Jerry Tworek, tells me, though the company is being vague about the exact details. He says o1 “has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.”

Image: OpenAI

OpenAI taught previous GPT models to mimic patterns from its training data. With o1, it trained the model to solve problems on its own using a technique known as reinforcement learning, which teaches the system through rewards and penalties. It then uses a “chain of thought” to process queries, similarly to how humans process problems by going through them step-by-step.
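As a rough mental model of what that combination looks like, here is a toy, purely illustrative sketch of outcome-based reinforcement learning over chains of thought. OpenAI has not disclosed its actual algorithm, and every name below (ToyModel, sample_chain_of_thought, and so on) is hypothetical:

```python
# Toy illustration of RL over chains of thought -- NOT OpenAI's undisclosed method.
# The model writes out step-by-step reasoning plus a final answer; only the
# correctness of the final answer is rewarded, and that reward would drive updates.
import random

class ToyModel:
    """Stand-in for an LLM policy; a real system would be a trained network."""
    def generate(self, prompt: str) -> str:
        # A real model would produce genuine reasoning steps here.
        return "Step 1: restate the problem.\nStep 2: work it out.\n42"

def sample_chain_of_thought(model: ToyModel, question: str) -> tuple[str, str]:
    """Ask for step-by-step reasoning; treat the last line as the final answer."""
    trace = model.generate(f"Think step by step, then answer:\n{question}")
    return trace, trace.splitlines()[-1]

def reward(answer: str, expected: str) -> float:
    """Outcome-based reward: +1 for a correct final answer, -1 otherwise."""
    return 1.0 if answer.strip() == expected.strip() else -1.0

def training_step(model: ToyModel, dataset: list[tuple[str, str]]) -> float:
    """Sample a problem, score the model's reasoning, and return the reward.
    A real trainer would use this signal to update the model's weights."""
    question, expected = random.choice(dataset)
    trace, answer = sample_chain_of_thought(model, question)
    return reward(answer, expected)

print(training_step(ToyModel(), [("What is 6 x 7?", "42")]))  # 1.0
```

The point is only the shape of the loop: reasoning is sampled, the outcome is scored, and that score is what trains the model.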

As a result of this new training methodology, OpenAI says the model should be more accurate. “We have noticed that this model hallucinates less,” Tworek says. But the problem still persists. “We can’t say we solved hallucinations.”

The main thing that sets this new model apart from GPT-4o is its ability to tackle complex problems, such as coding and math, much better than its predecessors while also explaining its reasoning, according to OpenAI.

“The model is definitely better at solving the AP math test than I am, and I was a math minor in college,” OpenAI’s chief research officer, Bob McGrew, tells me. He says OpenAI also tested o1 against a qualifying exam for the International Mathematics Olympiad, and while GPT-4o correctly solved only 13 percent of problems, o1 scored 83 percent.

In online programming contests known as Codeforces competitions, this new model reached the 89th percentile of participants, and OpenAI claims the next update of this model will perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry and biology.”

At the same time, o1 is not as capable as GPT-4o in a lot of areas. It doesn’t do as well on factual knowledge about the world. It also doesn’t have the ability to browse the web or process files and images. Still, the company believes it represents a brand-new class of capabilities. It was named o1 to indicate “resetting the counter back to 1.”

“I’m gonna be honest: I think we’re terrible at naming, traditionally,” McGrew says. “So I hope this is the first step of newer, more sane names that better convey what we’re doing to the rest of the world.”

I wasn’t able to demo o1 myself, but McGrew and Tworek showed it to me over a video call this week. They asked it to solve this puzzle:

“A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of prince and princess? Provide all solutions to that question.”
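For reference, the riddle can be checked by brute force; this short sketch is my own working, not o1's transcript:

```python
# Brute-force check of the riddle above: try small integer ages and keep the
# pairs that satisfy the statement. All intermediate values are exact halves,
# so the float comparison at the end is safe.
solutions = []
for princess in range(1, 100):
    for prince in range(1, 100):
        # "...when the princess's age was half the sum of their present ages"
        years_ago = princess - (princess + prince) / 2
        prince_then = prince - years_ago
        # "...when the princess is twice as old as the prince was [back then]"
        years_until = 2 * prince_then - princess
        prince_at_that_time = prince + years_until
        # "A princess is as old as the prince will be [at that time]"
        if princess == prince_at_that_time:
            solutions.append((princess, prince))

print(solutions)  # every pair in a 4:3 ratio: (4, 3), (8, 6), (12, 9), ...
```

Every valid pair puts the princess and prince in a 4:3 age ratio, for example 8 and 6.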

The model buffered for 30 seconds and then delivered a correct answer. OpenAI has designed the interface to show the reasoning steps as the model thinks. What’s striking to me isn’t that it showed its work — GPT-4o can do that if prompted — but how deliberately o1 appeared to mimic human-like thought. Phrases like “I’m curious about,” “I’m thinking through,” and “Ok, let me see” created a step-by-step illusion of thinking.

But this model isn’t thinking, and it’s certainly not human. So, why design it to seem like it is?

Image: OpenAI
Phrases like “I’m curious about,” “I’m thinking through,” and “Ok, let me see” create a step-by-step illusion of thinking.

OpenAI doesn’t believe in equating AI model thinking with human thinking, according to Tworek. But the interface is meant to show how the model spends more time processing and diving deeper into solving problems, he says. “There are ways in which it feels more human than prior models.”

“I think you’ll see there are lots of ways where it feels kind of alien, but there are also ways where it feels surprisingly human,” says McGrew. The model is given a limited amount of time to process queries, so it might say something like, “Oh, I’m running out of time, let me get to an answer quickly.” Early on, during its chain of thought, it may also seem like it’s brainstorming and say something like, “I could do this or that, what should I do?”

Building toward agents

Large language models, as they exist today, aren’t exactly that smart. They’re essentially just predicting sequences of words to get you an answer, based on patterns learned from vast amounts of data. Take ChatGPT, which tends to mistakenly claim that the word “strawberry” has only two Rs because it doesn’t break down the word correctly. For what it’s worth, the new o1 model did get that query correct.
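The miscounting comes from tokenization: the model sees subword chunks, not individual letters. A minimal illustration, assuming the third-party tiktoken library (which exposes the byte-pair encodings OpenAI's models use); the exact split depends on the encoding chosen:

```python
# Why a model can miscount letters: it receives subword tokens, not characters.
# Assumes `pip install tiktoken`; cl100k_base is one of the encodings tiktoken ships.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
pieces = [enc.decode([t]) for t in tokens]
print(pieces)  # the word arrives as a handful of chunks, not as ten letters

# A plain character count, by contrast, is trivial:
print("strawberry".count("r"))  # 3
```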

As OpenAI reportedly looks to raise more funding at an eye-popping $150 billion valuation, its momentum depends on more research breakthroughs. The company is bringing reasoning capabilities to LLMs because it sees a future with autonomous systems, or agents, that are capable of making decisions and taking actions on your behalf.

For AI researchers, cracking reasoning is an important next step toward human-level intelligence. The thinking is that, if a model is capable of more than pattern recognition, it could unlock breakthroughs in areas like medicine and engineering. For now, though, o1’s reasoning abilities are relatively slow, not agent-like, and expensive for developers to use.

“We have been spending many months working on reasoning because we think this is actually the critical breakthrough,” McGrew says. “Fundamentally, this is a new modality for models in order to be able to solve the really hard problems that it takes in order to progress towards human-like levels of intelligence.”
