Communications of the ACM - Artificial Intelligence, February 14
‘Not on the Best Path’

Cognitive scientist Gary Marcus is skeptical of generative artificial intelligence (AI), arguing that current AI is problematic both technically and morally. Marcus points out that neural networks are essentially function approximators rather than systems that truly learn the underlying function. He highlights their limitations on cases outside the training data and their lack of abstract understanding. He advocates combining neural networks with classical AI to build better models, and argues that the industry's over-investment in LLMs is crowding out other AI approaches. Marcus also proposes a "comprehension challenge," in which an AI watches a movie and understands its deeper meaning, as a benchmark for measuring progress.

🤔 Neural networks as function approximators: Gary Marcus argues that large language models (LLMs) are essentially function approximators; they are very good at mimicking how humans use language, but they do not truly understand what the language means.

📉 Weak extrapolation: As early as 1998, Marcus pointed out the limits of neural networks on cases outside their training data. Even multilayer networks still break down on data that differs enough from the training set and fail to generalize effectively.

🤖 Missing abstract meaning: Neural networks work mainly at the extensional level and lack an understanding of abstract meaning. As a result, AI systems make strikingly foolish mistakes on problems that require understanding what is meant, such as river-crossing problems.

🤝 Combining neural networks with classical AI: Marcus suggests integrating neural networks with classical AI, drawing on Daniel Kahneman's "System One" and "System Two," to build better AI models. Such a hybrid approach could surpass current AI along many dimensions.

🎬 The "comprehension challenge" as an AI benchmark: Marcus proposes a "comprehension challenge" that requires an AI system to watch a movie and understand its deeper meaning, for example why the characters act as they do, why a line is funny, or what is ironic about a scene. Only when AI can do this reliably, he argues, will it be genuinely impressive.

In an age of breathless predictions and sky-high valuations, cognitive scientist Gary Marcus has emerged as one of the best-known skeptics of generative artificial intelligence (AI). In fact, he recently wrote a book about his concerns, Taming Silicon Valley, in which he made the case that “we are not on the best path right now, either technically or morally.” Marcus—who has spent his career examining both natural and artificial intelligence—explained his reasoning in a recent conversation with Leah Hoffmann.

You’ve written about neural networks in everything from your 1992 monograph on language acquisition to, most recently, your book Taming Silicon Valley. Your thoughts about how AI companies and policies fall short have been well covered in your U.S. Senate testimony and other outlets (including your own Substack). Let’s talk here about your technical criticisms.

Technically speaking, neural networks, as they are usually used, are function approximators, and Large Language Models (LLMs) are basically approximating the function of how humans use language. And they’re extremely good at that. But approximating a function is not the same thing as learning a function.

In 1998, I pointed out several examples of what people now call the problem of distribution shift. For instance, I trained the one-hidden-layer neural networks that were popular at the time on the identity function, f(x) = x, using even numbers represented as binary digits, and I showed that these systems could generalize to some new even numbers. But if I tested them on odd numbers, they would systematically fail. So I made, roughly, a distinction between interpolation and extrapolation, and I concluded that these tools are good at interpolating functions, but they’re not very good at extrapolating functions.
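
For readers who want to see that gap concretely, here is a minimal sketch of an experiment in this spirit. It is not Marcus’ original 1998 setup: the 8-bit encoding, the network size, and the use of scikit-learn’s MLPRegressor are illustrative choices of mine.

```python
# Sketch of an identity-learning experiment in the spirit of Marcus (1998):
# train a one-hidden-layer network on f(x) = x over binary-encoded even
# numbers, then test on held-out evens (interpolation) and odds (extrapolation).
import numpy as np
from sklearn.neural_network import MLPRegressor

BITS = 8

def to_bits(n: int) -> np.ndarray:
    """Encode n as a vector of BITS binary digits, most significant bit first."""
    return np.array([(n >> i) & 1 for i in reversed(range(BITS))], dtype=float)

evens = np.array([to_bits(n) for n in range(0, 2**BITS, 2)])
odds = np.array([to_bits(n) for n in range(1, 2**BITS, 2)])

rng = np.random.default_rng(0)
order = rng.permutation(len(evens))
train, held_out = evens[order[:100]], evens[order[100:]]

# One hidden layer; the regression target is simply the input itself.
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(train, train)

def exact_match(x: np.ndarray) -> float:
    """Fraction of inputs the network maps back to themselves after rounding."""
    return (np.rint(net.predict(x)) == x).all(axis=1).mean()

# Held-out evens tend to be reproduced; odd numbers systematically are not,
# because the low-order bit is 0 in every training example and is therefore
# never predicted as 1.
print("held-out evens:", exact_match(held_out))
print("odd numbers:   ", exact_match(odds))
```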

And in your view, the multilayer neural networks we have now still do not address that issue.

In fact, there was a paper published in October by six Apple researchers basically showing the same thing. If something is in the training set or close to something in the training set, these systems work pretty well. But if it’s far enough away from the training set, they break down.

In philosophy, they make a distinction between intension and extension. The intension of something is basically the abstract meaning, like “even number.” The extension is a list of all the even numbers. And neural networks basically work at the extensional level, but they don’t work at the intensional level. They are not getting the abstract meaning of anything.
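
As a toy illustration of that distinction (my example, not one from the interview), compare a representation of “even number” as a stored list of instances with one that encodes the rule itself:

```python
# Extensional representation: a finite list of evens that happen to have been observed.
OBSERVED_EVENS = {0, 2, 4, 6, 8, 10}

def is_even_extensional(n: int) -> bool:
    # Membership in the stored list; says nothing about numbers never seen.
    return n in OBSERVED_EVENS

def is_even_intensional(n: int) -> bool:
    # The abstract rule itself; applies to any integer, seen or unseen.
    return n % 2 == 0

print(is_even_extensional(12), is_even_intensional(12))  # False True
```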

You’ve called attention to one way this distinction manifests in river-crossing problems, where generative AI systems propose solutions that resemble the right answer, but with absurdly illogical twists or random elements that were not present in the original question.

These models don’t really have a representation of what a man is, what a woman is, or what a boat is; as a result, they often make really boneheaded mistakes. And there are other consequences, like the fact that you can’t give them an instruction and expect them to reliably follow it. You can’t say, “Don’t lie,” or “don’t hallucinate,” or “don’t use copyrighted materials.” These systems are trained on copyrighted materials, and they aren’t able to judge what is copyrighted and what isn’t. You can’t do basic fact-checking. You also can’t get them to follow principles like, “Don’t discriminate on the basis of race or age or sex,” because if LLMs are trained on real-world data, they tend to perpetuate past stereotypes rather than follow abstract principles.

So you wind up with all of these technical problems, many of which spill over into the moral and ethical domain.

You’ve argued that to fix the moral and technical problems with AI, we need a new approach, not just more training data.

Generative AI only works for certain things. It works for pattern recognition, but it doesn’t work for the type of formal reasoning you need in chess. It doesn’t work for everyday reasoning about the world, and it doesn’t even reliably generate accurate summaries.

If you think about it abstractly, there’s a huge number of possible AI models, and we’re stuck in one corner. So one of my proposals is that we should consider integrating neural networks with classical AI. I make an analogy in my book to Daniel Kahneman’s System One and System Two. System One is fast, reflexive, and automatic, kind of like LLMs, while System Two is more deliberative reasoning, like classical AI. Our human mind combines both and gets results that are not perfect, but that are much better, in many dimensions, than current AI, so I think exploring that would be a really good idea. It won’t be sufficient for developing systems that can observe something and build a structured set of representations about how that thing works, but it might get us part of the way there.
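
One common way such a hybrid is wired together is a propose-and-verify loop, in which a fast, fallible component generates candidates and a symbolic component checks them against explicit rules before anything is accepted. The sketch below is my own minimal illustration of that pattern, not a design Marcus describes; the “neural” proposer here is just a stub returning fixed guesses.

```python
from typing import Callable, Iterable, Optional

def hybrid_solve(
    propose: Callable[[], Iterable[str]],  # "System One": fast, fallible candidate generator
    verify: Callable[[str], bool],         # "System Two": explicit, symbolic rule checker
) -> Optional[str]:
    """Return the first proposal that survives symbolic verification, if any."""
    for candidate in propose():
        if verify(candidate):
            return candidate
    return None

# Toy stand-ins: the proposer is a fixed list of guesses, and the verifier
# checks each equation symbolically instead of trusting the guess.
def check_equation(eq: str) -> bool:
    left, right = eq.split("=")
    return eval(left) == eval(right)  # acceptable for this toy arithmetic example

print(hybrid_solve(lambda: ["2 + 2 = 5", "2 + 2 = 4"], check_equation))  # 2 + 2 = 4
```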

At the time of this interview, several people in the field seem to agree that we’re hitting a period of diminishing returns with respect to LLMs.

That is a phrase that I coined in a 2022 essay called “Deep Learning Is Hitting a Wall,” which was about why scaling wouldn’t get us to AGI (Artificial General Intelligence). And when I coined it, everybody dismissed me and said, “No, we’re not reaching diminishing returns. We have these scaling laws. We’ll just get more data.” But what people have observed in the last couple of months is that adding more data does not actually solve the core underlying problems on the technical side. The big companies that are doing big training runs are not getting the results they expected.

Do you think that will be enough to change the atmosphere and shift the industry’s focus?

I hope that the atmosphere will change. In fact, I know it will change, I just don’t know when. A lot of this is crowd psychology. DeepMind does hybrid AI. AlphaFold is a neurosymbolic system, and it just won the Nobel Prize. So there are some efforts, but for the time being, venture capitalists only want to invest in LLMs. There’s no oxygen left for anything else.

That said, different things could happen, maybe even by the time we go to print. The market might crash. If you can’t eliminate hallucinations, it limits your commercial potential. I think people are starting to see that, and if enough of them do, then it’s just a psychology thing. Maybe someone will come up with a new and better idea. At some point, they will. It could come tomorrow or it might take a decade or more.

People have proposed a number of different benchmarks for evaluating progress in AI. What do you make of them?

Here’s a benchmark I proposed in 2014 that I think is still beyond current AI. I call it the comprehension challenge. The idea is that an AI system should be able to watch a movie, build a cognitive model of what is going on, and answer questions. Why did the characters do this? Why is that line funny? What’s the irony in this scene?

Right now, LLMs might get it sort of right some of the time, but nowhere near as reliably as the average person. If a character says at the end of the movie, “I see dead people,” everybody in the cinema has this “Oh, my god” moment. Everybody in the cinema has followed the world of the movie and suddenly realized that a principle they thought was true does not apply. When we have AI that can do that with new movies that are not in the training data, I’ll be genuinely impressed.
