LessWrong · February 9
Gary Marcus now saying AI can't do things it can already do

The article discusses Gary Marcus's criticisms of AI models and whether those criticisms still hold. Marcus has repeatedly pointed out flaws in GPT models, only for subsequent model iterations to resolve them. The article notes that Marcus's latest criticism is based on the older GPT-4o, while more advanced reasoning models have already overcome many of the problems he found. Hands-on testing shows that o1 performs well on the questions Marcus raised, although some errors may still remain in the data. The article also notes that Marcus predicted AI scaling would hit a wall, but for now, scaling continues.

🗓️ Gary Marcus has repeatedly criticized GPT models as flawed, but AI is developing rapidly, and subsequent model iterations have usually solved the earlier problems.

🤖 Marcus's latest criticism is based on GPT-4o, while more advanced reasoning models (such as o1) perform better on the same questions and avoid the mistakes GPT-4o makes.

📊 In hands-on tests, o1 correctly listed each US state's population, area, and median household income, and counted the vowels in each state's name, demonstrating improved data handling and reasoning.

⚠️ The article also notes that although AI has made significant progress, errors may still remain in the data, AI agents are not yet mature, and future development still faces challenges.

Published on February 9, 2025 12:24 PM GMT

In January 2020, Gary Marcus wrote GPT-2 and the Nature of Intelligence, demonstrating a bunch of easy problems that GPT-2 couldn’t get right.

He concluded these were “a clear sign that it is time to consider investing in different approaches.”

Two years later, GPT-3 could get most of these right.

Marcus wrote a new list of 15 problems GPT-3 couldn’t solve, concluding “more data makes for a better, more fluent approximation to language; it does not make for trustworthy intelligence.”

A year later, GPT-4 could get most of these right.

Now he’s gone one step further, and criticised limitations that have already been overcome.

Last week Marcus put a series of questions into ChatGPT, found mistakes, and concluded AGI is an example of “the madness of crowds”.

However, Marcus used the free version, which only includes GPT-4o. That was released in May 2024, an eternity behind the frontier in AI.

More importantly, it’s not a reasoning model, which is where most of the recent progress has been.

For the huge cost of $20 a month, I have access to o1 (not the most advanced model OpenAI offers, let alone the best that exists).

I asked o1 the same questions Marcus did, and it didn’t make any of the mistakes he spotted.

First he asked it:

Make a table of every state in the US, including population, area and median household income, sorted in order of median household income.

GPT-4o misses out a bunch of states. o1 lists all 50 (full transcript).
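
Anyone with API access can rerun the comparison themselves. Here’s a minimal sketch using OpenAI’s Python SDK; the model identifiers and the markdown-table completeness check are my assumptions, not part of the post, and o1 availability depends on your account tier:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Make a table of every state in the US, including population, area "
    "and median household income, sorted in order of median household income."
)

for model in ("gpt-4o", "o1"):  # assumed model IDs; check what your account offers
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    text = reply.choices[0].message.content
    # Crude completeness check: a markdown table has one "|" row per state,
    # plus a header row and a separator row.
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    print(f"{model}: {len(rows) - 2} data rows")
```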

Then he asked for a population density column to be added. This also seemed to work fine.
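
Density is just population divided by area, so the derived column is easy to check independently. A minimal sketch, assuming (hypothetically) the model’s table has been saved as CSV with these column names; the two rows below are placeholders, not real census figures:

```python
import io

import pandas as pd

# Hypothetical saved output; placeholder rows, not real census data.
csv = io.StringIO(
    "state,population,area_sq_mi,median_income\n"
    "Placeholder A,1000000,10000,70000\n"
    "Placeholder B,250000,5000,65000\n"
)
df = pd.read_csv(csv)

# density = population / area; compare against the model's density column.
df["density_per_sq_mi"] = df["population"] / df["area_sq_mi"]
print(df.sort_values("median_income", ascending=False))
```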

He then made a list of Canadian provinces and asked for a column listing how many vowels were in each name.

I was running out of patience, so I asked the same question about the US states. That also worked.
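
The vowel counts are mechanical to verify. A short sketch; I’m assuming the question counted only a/e/i/o/u, treating “y” as a consonant:

```python
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]

def vowel_count(name: str) -> int:
    """Count a/e/i/o/u, case-insensitively; 'y' is deliberately excluded."""
    return sum(ch in "aeiou" for ch in name.lower())

for state in STATES:
    print(f"{state}: {vowel_count(state)}")
```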

To be clear, there are probably still some mistakes in the data (just as I’d expect from most human assistants). The point is that the errors Marcus identified aren’t showing up.

He goes on to correctly point out that agents aren’t yet working well. (If they were, things would already be nuts.)

He also lists some other questions that o1 can already handle.

Reasoning models are much better at these kinds of tasks because they can double-check their work.

However, they’re still fundamentally based on LLMs – just with a bunch of extra reinforcement learning.

Marcus’ Twitter bio is “Warned everyone in 2022 that scaling would run out.” I agree scaling will run out at some point, but it clearly hasn’t yet.


