MarkTechPost@AI, July 9, 2024
This AI Research from Ohio State University and CMU Discusses Implicit Reasoning in Transformers And Achieving Generalization Through Grokking

Researchers from Ohio State University and Carnegie Mellon University investigated whether Transformer deep learning models can learn to reason implicitly over parametric knowledge. The study focuses on two types of reasoning: comparison and composition. It finds that while Transformers can learn implicit reasoning, they only do so reliably through a process called "grokking": training continued far past the point of overfitting, during which the model learns the underlying patterns rather than merely memorizing the training data.

🤔 **The mechanism of grokking:** The team traced how a component of the model called the "generalizing circuit" emerges and develops over the course of training. How effectively this circuit generalizes, rather than merely memorizes, the data is essential to the model's ability to perform implicit reasoning.

🧠 **Systematicity and circuit configuration:** The team found a close relationship between the configuration of the generalizing circuit and the model's capacity for systematic generalization. The model's reasoning ability largely depends on how atomic knowledge and rules are arranged, and how accessible they are, within it.

💡 **Improving Transformer reasoning:** The study shows that implicit reasoning in Transformers depends heavily on how the training process is set up and how the training data is organized. The findings also suggest that the Transformer architecture could be improved by adding mechanisms that promote cross-layer knowledge sharing, which may strengthen the model's reasoning ability.

🤖 **The potential of parametric memory:** The study also shows that parametric memory, a model's ability to store and apply knowledge within its parameters, performs well on complex reasoning tasks. State-of-the-art models such as GPT-4-Turbo and Gemini-1.5-Pro, which rely on non-parametric memory, performed poorly on a particularly difficult reasoning task with a large search space, no matter how their retrieval processes were augmented or prompted. A fully grokked Transformer using parametric memory, by contrast, reached near-perfect accuracy, suggesting that parametric memory holds great promise for enabling complex reasoning in language models.

💻 **Conclusions:** The research demonstrates that Transformers can learn implicit reasoning, but only through the grokking process and with careful attention to model architecture and training strategy. It also highlights the potential of parametric memory to support complex reasoning.

Large Language Models (LLMs) with parametric memory of rules and knowledge have shown limitations in implicit reasoning. Research has shown that these models, even more complex ones like GPT-4, have trouble applying and integrating internalized facts reliably. For instance, even when they are aware of the entities in question, they frequently make inaccurate comparisons of their properties. Implicit reasoning deficits have important consequences, such as making it harder to induce structured and condensed representations of rules and facts. This makes it difficult to propagate changes and results in redundant knowledge storage, ultimately impairing the model’s capacity to systematically generalize knowledge.

In recent research, researchers from Ohio State University and Carnegie Mellon University have studied whether deep learning models such as transformers can learn to reason implicitly over parametric information. The research focuses on two main categories of reasoning: comparison, which assesses the similarities or differences between items, and composition, which combines several pieces of information.
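The two task types can be made concrete with a toy knowledge base. This is a minimal sketch, not the paper's actual dataset; all entity names, relations, and attribute values below are invented for illustration.

```python
# Toy knowledge base of atomic facts: (entity, relation) -> entity.
# Names and relations are illustrative, not from the paper.
atomic = {
    ("alice", "mother"): "carol",
    ("carol", "mother"): "erin",
    ("bob", "mother"): "dana",
    ("dana", "mother"): "fay",
}
# Atomic attributes used for comparison queries (hypothetical values).
ages = {"alice": 30, "bob": 28, "carol": 55, "dana": 52}

def compose(entity, r1, r2):
    """Composition: chain two atomic facts (apply r1, then r2 to its result)."""
    mid = atomic[(entity, r1)]
    return atomic[(mid, r2)]

def compare(e1, e2, attr=ages):
    """Comparison: decide which of two entities has the larger attribute."""
    return e1 if attr[e1] > attr[e2] else e2

# Inferred facts a model must derive rather than memorize:
print(compose("alice", "mother", "mother"))  # -> "erin"
print(compare("alice", "bob"))               # -> "alice"
```

The point of the setup is that `compose` and `compare` answers never appear as stored facts; a model with only the atomic facts in its parameters has to combine them implicitly at inference time.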

The team has found that while transformers are able to learn implicit reasoning, it is only through a process called grokking that they are able to do so robustly. Grokking refers to training continued far past the point of overfitting, during which the model comes to learn the underlying patterns rather than merely memorizing the training data.
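The grokking regime can be sketched as a training budget that deliberately ignores the conventional early-stopping point. The loop below uses a stub "model" whose held-out accuracy only improves long after training accuracy saturates; the function names, patience, and budget factor are all hypothetical knobs, not the paper's settings.

```python
def train_with_grokking_budget(step_fn, eval_fn,
                               overfit_patience=10, extra_factor=50,
                               max_steps=100_000):
    """Keep training far beyond the step where training accuracy saturates.

    step_fn() -> training accuracy after one optimization step.
    eval_fn() -> held-out (generalization) accuracy.
    extra_factor: how much longer than the overfitting point to train
    (illustrative; the paper trains for orders of magnitude longer).
    """
    overfit_step = None
    streak = 0
    last_step = 0
    for step in range(1, max_steps + 1):
        last_step = step
        train_acc = step_fn()
        streak = streak + 1 if train_acc >= 1.0 else 0
        if overfit_step is None and streak >= overfit_patience:
            overfit_step = step  # conventional early stopping would halt here
        if overfit_step is not None and step >= extra_factor * overfit_step:
            break  # grokking budget exhausted
    return overfit_step, last_step, eval_fn()

# Stub model: training accuracy saturates quickly, held-out accuracy
# improves only long afterwards (mimicking the grokking transition).
state = {"steps": 0}
def fake_step():
    state["steps"] += 1
    return 1.0 if state["steps"] >= 5 else 0.5
def fake_eval():
    return 0.95 if state["steps"] >= 500 else 0.1

overfit_step, final_step, heldout_acc = train_with_grokking_budget(fake_step, fake_eval)
```

With early stopping the run would end at `overfit_step` with near-chance held-out accuracy; continuing `extra_factor` times longer is what lets the stub (and, per the paper, a real transformer) reach high generalization accuracy.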

The two types of reasoning generalize very differently. Transformers struggle to generalize on composition tasks when confronted with out-of-distribution examples (data that deviate substantially from the training distribution), but they generalize well on comparison tasks.
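One way such an in-distribution vs. out-of-distribution split can be constructed for two-hop composition is to hold some head entities out of the inferred facts entirely. This is a rough sketch under invented assumptions (entity names, a single random "bridge" relation, and the split rule are all illustrative):

```python
import random

random.seed(0)
entities = [f"e{i}" for i in range(20)]  # hypothetical entity ids
# One atomic relation mapping each entity to another (a random bridge).
bridge = {e: random.choice(entities) for e in entities}

# Two-hop inferred facts: e -> bridge(bridge(e)).
inferred = [(e, bridge[bridge[e]]) for e in entities]

# ID: head entity appears among the training-time inferred facts;
# OOD: head entity is held out of all inferred facts during training.
held_out = set(entities[:5])
id_split = [(h, t) for h, t in inferred if h not in held_out]
ood_split = [(h, t) for h, t in inferred if h in held_out]
```

All atomic `bridge` facts are still seen in training; only the OOD heads' *composed* facts are withheld, so answering them requires genuinely chaining the two hops rather than recalling a memorized pair.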

The team carried out an in-depth evaluation of the internal workings of the models during training to ascertain why this occurred. The research has produced a number of important findings, which are as follows.

The mechanism of grokking: The team traced how the generalizing circuit, the component of the model that applies learned rules to novel inputs, emerges and develops over time. The effectiveness of this circuit at generalizing, as opposed to merely memorizing, is essential to the model's ability to perform implicit reasoning.

Systematicity and circuit configuration: The team discovered a close relationship between the generalizing circuit's configuration and the model's capacity for systematic generalization. The reasoning power of the model is largely determined by how atomic knowledge and rules are arranged and made accessible within it.

According to the research, implicit reasoning in transformers is largely dependent on how the training process is set up and how the training data is organized. The findings have also suggested that the transformer architecture can be improved by including methods that promote cross-layer knowledge sharing, which could strengthen the reasoning capabilities of the model.

The study has also demonstrated that parametric memory, which is the model’s capacity to store and apply knowledge within its parameters, works well for intricate reasoning tasks. State-of-the-art models such as GPT-4-Turbo and Gemini-1.5-Pro, which rely on non-parametric memory, did not perform well for a particularly difficult reasoning task with a large search space, no matter how their retrieval processes were augmented or prompted. 

On the other hand, a completely grokked transformer that used parametric memory was able to reach almost flawless accuracy. This demonstrates how parametric memory has a great deal of promise in enabling sophisticated reasoning in language models.


