François Chollet on the limitations of LLMs in reasoning

François Chollet examines the limitations of LLMs in reasoning, arguing that they fall short in key respects: their ability to solve novel problems is limited, and they require enormous amounts of training data.

🧠 LLMs tackle known tasks by memorizing and retrieving program templates; this is most of what they do, but it is a suboptimal way to represent discrete symbolic programs. On digit addition, for example, they perform poorly, needing extensive training yet achieving only limited accuracy.

💡 The truly valuable kind of reasoning is the ability to synthesize new programs to solve tasks never seen before. LLMs cannot do this on their own, but they can be incorporated into a program search process capable of such reasoning.

📚 Because LLMs are limited to retrieving memorized programs, they are static program stores and need vast amounts of training data to be of any use; human intelligence works differently.

Published on July 30, 2024 8:04 PM GMT

François Chollet, the creator of the Keras deep learning library, recently shared his thoughts on the limitations of LLMs in reasoning. I find his argument quite convincing and am interested to hear if anyone has a different take.

The question of whether LLMs can reason is, in many ways, the wrong question. The more interesting question is whether they are limited to memorization / interpolative retrieval, or whether they can adapt to novelty beyond what they know. (They can't, at least until you start doing active inference, or using them in a search loop, etc.) 

There are two distinct things you can call "reasoning", and no benchmark aside from ARC-AGI makes any attempt to distinguish between the two.

First, there is memorizing & retrieving program templates to tackle known tasks, such as "solve ax+b=c" -- you probably memorized the "algorithm" for finding x when you were in school. LLMs can do this! In fact, this is most of what they do. However, they are notoriously bad at it, because their memorized programs are vector functions fitted to training data, which generalize via interpolation. This is a very suboptimal approach for representing any kind of discrete symbolic program. This is why LLMs on their own still struggle with digit addition, for instance -- they need to be trained on millions of examples of digit addition, yet they only achieve ~70% accuracy on new numbers.
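
To make the distinction concrete, here is a minimal sketch (my illustration, not from Chollet's post) of the "program template" for solving ax+b=c written out as an explicit discrete program: a couple of exact symbolic steps that need no training data and are correct for every input, which is precisely the kind of procedure an interpolation-based vector function can only approximate.

```python
# The memorized "template" for a*x + b = c, as an exact symbolic program.
def solve_linear(a: float, b: float, c: float) -> float:
    """Return x such that a*x + b == c (assumes a != 0)."""
    return (c - b) / a

# Correct for any inputs, with no training examples required.
assert solve_linear(3, 5, 2) == -1.0  # 3x + 5 = 2  ->  x = -1
assert solve_linear(2, 3, 6) == 1.5   # 2x + 3 = 6  ->  x = 1.5
```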

This way of doing "reasoning" is not fundamentally different from purely memorizing the answers to a set of questions (e.g. 3x+5=2, 2x+3=6, etc.) -- it's just a higher order version of the same. It's still memorization and retrieval -- applied to templates rather than pointwise answers.

The other way you can define reasoning is as the ability to synthesize new programs (from existing parts) in order to solve tasks you've never seen before. Like, solving ax+b=c without having ever learned to do it, while only knowing about addition, subtraction, multiplication and division. That's how you can adapt to novelty. LLMs cannot do this, at least not on their own. They can however be incorporated into a program search process capable of this kind of reasoning. 
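
As a rough illustration of what "synthesizing a new program from existing parts" can look like in the simplest possible setting, the sketch below (my own toy example, not Chollet's code) enumerates small expression trees built only from the variables a, b, c and the four arithmetic primitives, and keeps the first one that reproduces a handful of input-output examples of the task. The program it finds, an expression equivalent to (c - b) / a, is assembled by search rather than retrieved from memory; an LLM could plausibly serve as a guide inside such a search loop, proposing candidate programs to test.

```python
# Toy program synthesis by brute-force enumeration over arithmetic expressions.
import itertools
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def exprs(depth):
    """Yield (source, callable) pairs for every expression of depth <= `depth`
    built from the variables a, b, c and the four arithmetic primitives."""
    if depth == 0:
        for v in ("a", "b", "c"):
            yield v, (lambda env, v=v: env[v])
        return
    subs = list(exprs(depth - 1))
    yield from subs
    for (ls, lf), (rs, rf) in itertools.product(subs, repeat=2):
        for sym, fn in OPS.items():
            yield (f"({ls} {sym} {rs})",
                   lambda env, lf=lf, rf=rf, fn=fn: fn(lf(env), rf(env)))

def synthesize(examples, max_depth=2):
    """Return the source of the first expression consistent with every example."""
    for src, prog in exprs(max_depth):
        try:
            if all(abs(prog(env) - x) < 1e-9 for env, x in examples):
                return src
        except ZeroDivisionError:
            continue
    return None

# Input-output examples of the never-memorized task "find x with a*x + b == c".
examples = [
    ({"a": 3, "b": 5, "c": 2}, -1.0),
    ({"a": 2, "b": 3, "c": 6}, 1.5),
    ({"a": 7, "b": 2, "c": 5}, 3 / 7),
]
print(synthesize(examples))  # expected: an expression equivalent to ((c - b) / a)
```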

This second definition is by far the more valuable form of reasoning. This is the difference between the smart kids in the back of the class that aren't paying attention but ace tests by improvisation, and the studious kids that spend their time doing homework and get medium-good grades, but are actually complete idiots that can't deviate one bit from what they've memorized. Which one would you hire? 

LLMs cannot do this because they are very much limited to retrieval of memorized programs. They're static program stores. However, they can display some amount of adaptability, because not only are the stored programs capable of generalization via interpolation, but the program store itself is also interpolative: you can interpolate between programs, or otherwise "move around" in continuous program space. But this only yields local generalization, not any real ability to make sense of new situations. 
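
To illustrate how "generalization via interpolation" stays local, here is a rough sketch (my own illustration; the model size, activation, and ranges are arbitrary choices, with tanh picked deliberately so the fitted function saturates away from its training data). A small MLP trained on sums of numbers drawn from [0, 10] does fine on held-out pairs from the same range, but typically degrades badly on pairs far outside it, whereas the discrete program x + y is exact everywhere.

```python
# Interpolation vs. extrapolation for a learned approximation of addition.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(2000, 2))
y_train = X_train.sum(axis=1)

model = MLPRegressor(hidden_layer_sizes=(32,), activation="tanh",
                     solver="lbfgs", max_iter=5000, random_state=0)
model.fit(X_train, y_train)

X_in = rng.uniform(0, 10, size=(200, 2))      # same range as training: interpolation
X_out = rng.uniform(100, 200, size=(200, 2))  # far outside the training data
err_in = np.abs(model.predict(X_in) - X_in.sum(axis=1)).mean()
err_out = np.abs(model.predict(X_out) - X_out.sum(axis=1)).mean()
print(f"mean abs error, in-range:     {err_in:.2f}")
print(f"mean abs error, out-of-range: {err_out:.2f}")  # typically much larger
```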

This is why LLMs need to be trained on enormous amounts of data: the only way to make them somewhat useful is to expose them to a dense sampling of absolutely everything there is to know and everything there is to do. Humans don't work like this -- even the really dumb ones are still vastly more intelligent than LLMs, despite having far less knowledge.



