MarkTechPost@AI 2024年07月02日
Researchers at Princeton University Propose Edge Pruning: An Effective and Scalable Method for Automated Circuit Finding


Researchers at Princeton University propose Edge Pruning, a method for discovering circuits in language models. It addresses many shortcomings of existing approaches and performs strongly across a range of tasks, though some challenges remain.

🧐 Language models are growing ever more complex, and their inner workings are hard to interpret. Researchers tackle this through mechanistic interpretability, and Edge Pruning is a new attempt in that direction: it frames circuit discovery as an optimization problem, adapting gradient-based pruning techniques to circuit discovery rather than model compression, and prunes the edges between components instead of the components themselves.

🌟 Edge Pruning replaces the standard Transformer residual stream with a disentangled version that keeps a list of all previous activations, and introduces edge masks that decide which components each node reads from. Using discrete optimization techniques such as L0 regularization, it optimizes the edge masks to produce sparse circuits, substituting counterfactual activations from corrupted examples for missing edges so that minimal circuits are found while model functionality is preserved.

💪 Edge Pruning outperforms existing methods such as ACDC and EAP, especially on complex tasks. Across four standard circuit-finding benchmarks, it finds circuits in GPT-2 Small that are more faithful to the full model and achieve better task performance. Its advantage is most pronounced on complex tasks such as multi-template indirect object identification, and it scales effectively to larger datasets.

Language models have become increasingly complex, making it challenging to interpret their inner workings. Researchers are attempting to solve this problem through mechanistic interpretability, which involves identifying and analyzing circuits – sparse computational subgraphs that capture specific aspects of a model’s behavior. 

Current methodologies for discovering these circuits face significant challenges. Automated methods like ACDC and EAP have practical limitations, relying on inefficient search algorithms or inaccurate approximations. ACDC’s greedy search approach is computationally expensive and doesn’t scale well to large datasets or billion-parameter models. EAP, while faster, sacrifices faithfulness to the full model by using gradient-based linear approximations. These challenges hinder the progress of mechanistic interpretability and limit the ability to understand the inner workings of complex language models.

Researchers from Princeton Language and Intelligence (PLI), Princeton University present Edge Pruning, a method that offers a novel approach to circuit discovery in language models, framing it as an optimization problem tackled via gradient-based pruning. The method adapts pruning techniques for circuit discovery rather than model compression, focusing on pruning the edges between components instead of the components themselves.

Edge Pruning replaces the traditional Transformer residual stream with a disentangled version, retaining a list of all previous activations. This innovation allows for the introduction of edge masks that determine which components to read from. The approach utilizes discrete optimization techniques, such as L0 regularization, to optimize these edge masks and produce sparse circuits. By replacing missing edges with counterfactual activations from corrupted examples, Edge Pruning maintains model functionality while discovering minimal circuits. This method aims to overcome the limitations of previous approaches by balancing efficiency, scalability, and faithfulness to the full model in identifying circuits within complex language models.
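The core mechanism described above can be sketched in a few lines of PyTorch. This is a toy illustration, not the authors' implementation: the class name `EdgeMaskedStream`, the use of a plain sigmoid in place of the paper's hard-concrete/L0 machinery, and the generic `Linear` components are all simplifications made here for brevity. What it shows is the disentangled residual stream (a list of all previous activations) and the per-edge interpolation between clean activations and counterfactual activations from a corrupted example.

```python
import torch


class EdgeMaskedStream(torch.nn.Module):
    """Toy sketch of Edge Pruning's disentangled residual stream.

    Instead of each component reading the sum of all previous outputs,
    component j reads a per-edge masked mixture of clean and corrupted
    upstream activations:

        input_j = sum_i  z[i, j] * clean_i + (1 - z[i, j]) * corrupt_i

    so a pruned edge (z -> 0) falls back to the counterfactual value
    rather than to zero, preserving model functionality.
    """

    def __init__(self, n_comp: int, d: int):
        super().__init__()
        # One learnable logit per candidate edge (source i -> component j).
        self.logits = torch.nn.Parameter(torch.zeros(n_comp, n_comp))
        # Stand-ins for attention heads / MLPs; real components differ.
        self.blocks = torch.nn.ModuleList(
            torch.nn.Linear(d, d) for _ in range(n_comp)
        )

    def edge_mask(self) -> torch.Tensor:
        # Relaxed mask in (0, 1). The paper uses hard-concrete sampling
        # with an L0 penalty; a plain sigmoid keeps this sketch short.
        return torch.sigmoid(self.logits)

    def l0_penalty(self) -> torch.Tensor:
        # Expected number of active edges, to be minimized alongside
        # the task loss to encourage a sparse circuit.
        return self.edge_mask().sum()

    def forward(self, x_clean: torch.Tensor, x_corrupt: torch.Tensor):
        z = self.edge_mask()
        # Reference activations: full (unmasked) run on the corrupted input.
        c_acts = [x_corrupt]
        for blk in self.blocks:
            c_acts.append(blk(torch.stack(c_acts).sum(0)))
        # Clean run, with each edge interpolating toward the corrupted value.
        acts = [x_clean]
        for j, blk in enumerate(self.blocks):
            inp = sum(
                z[i, j] * acts[i] + (1 - z[i, j]) * c_acts[i]
                for i in range(len(acts))
            )
            acts.append(blk(inp))
        return acts[-1]
```

In training, one would optimize the edge logits against a faithfulness loss (matching the full model's outputs) plus the L0-style penalty, then threshold the mask to read off a discrete sparse circuit.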

Edge Pruning demonstrates superior performance compared to existing methods like ACDC and EAP, particularly on complex tasks. In tests on four standard circuit-finding tasks, Edge Pruning consistently finds circuits in GPT-2 Small that are more faithful to the full model and exhibit better task performance. The method’s advantage is especially pronounced on complex tasks like multi-template Indirect Object Identification (IOI), where it discovers circuits with 2.65 times fewer edges while maintaining faithfulness to model outputs. Edge Pruning also scales effectively to larger datasets, outperforming other methods in speed and performance on a 100K-example version of IOI. It also perfectly recovers ground-truth circuits in two Transformers compiled by Tracr, further validating its effectiveness.

Edge Pruning introduces a unique approach to circuit discovery in language models by framing it as an optimization problem tackled through gradient-based pruning of edges between components. This method demonstrates superior performance and faithfulness compared to existing techniques, especially on complex tasks. It scales effectively to large datasets and models, as evidenced by its application to CodeLlama-13B. While Edge Pruning shows promise in advancing mechanistic interpretability, challenges remain, such as memory requirements and the need for further automation in interpreting discovered circuits. Despite these limitations, Edge Pruning represents a significant step forward in understanding and explaining large foundation models, contributing to their safe development and deployment.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

