cs.AI updates on arXiv.org, July 23, 12:03
On the transferability of Sparse Autoencoders for interpreting compressed models

This paper examines the differences between sparse autoencoders (SAEs) for original and compressed models. It finds that SAEs trained on the original model can interpret the compressed model, with only slightly worse performance than SAEs trained directly on the compressed model. Simply pruning the original SAE achieves comparable results, reducing the cost of training SAEs.

arXiv:2507.15977v1 Announce Type: cross Abstract: Modern LLMs face inference efficiency challenges due to their scale. To address this, many compression methods have been proposed, such as pruning and quantization. However, the effect of compression on a model's interpretability remains elusive. While several model interpretation approaches exist, such as circuit discovery, Sparse Autoencoders (SAEs) have proven particularly effective in decomposing a model's activation space into its feature basis. In this work, we explore the differences in SAEs for the original and compressed models. We find that SAEs trained on the original model can interpret the compressed model, albeit with slight performance degradation compared to an SAE trained on the compressed model. Furthermore, simply pruning the original SAE itself achieves performance comparable to training a new SAE on the pruned model. This finding enables us to mitigate the extensive training costs of SAEs.
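For readers unfamiliar with the setup, the sketch below illustrates the ingredients the abstract refers to: a sparse autoencoder trained on one model's activations, reused on a compressed model's activations, and pruned down to its most active features instead of being retrained. This is not the authors' code; all dimensions, hyperparameters, the synthetic activations, and the frequency-based pruning heuristic are illustrative assumptions.

```python
# Minimal SAE sketch (illustrative only, not the paper's implementation).
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Overcomplete SAE: activations -> sparse feature codes -> reconstruction."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        codes = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        recon = self.decoder(codes)           # reconstruction of the input activations
        return recon, codes


def sae_loss(recon, codes, x, l1_coeff: float = 1e-3):
    """Reconstruction error plus an L1 sparsity penalty on the feature codes."""
    return ((recon - x) ** 2).mean() + l1_coeff * codes.abs().mean()


def prune_sae(sae: SparseAutoencoder, codes: torch.Tensor, keep_frac: float = 0.5):
    """Keep only the most frequently active features (a crude form of SAE pruning)."""
    freq = (codes > 0).float().mean(dim=0)                    # activation frequency per feature
    keep = freq.topk(int(keep_frac * freq.numel())).indices   # indices of retained features
    pruned = SparseAutoencoder(sae.decoder.out_features, keep.numel())
    pruned.encoder.weight.data = sae.encoder.weight.data[keep]
    pruned.encoder.bias.data = sae.encoder.bias.data[keep]
    pruned.decoder.weight.data = sae.decoder.weight.data[:, keep]
    pruned.decoder.bias.data = sae.decoder.bias.data.clone()
    return pruned


if __name__ == "__main__":
    d_model, d_features, n_tokens = 64, 256, 1024
    # Stand-ins for residual-stream activations from an original and a compressed model.
    acts_original = torch.randn(n_tokens, d_model)
    acts_compressed = acts_original + 0.05 * torch.randn(n_tokens, d_model)

    sae = SparseAutoencoder(d_model, d_features)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(200):   # train the SAE on the original model's activations
        recon, codes = sae(acts_original)
        loss = sae_loss(recon, codes, acts_original)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        recon_c, _ = sae(acts_compressed)      # reuse the same SAE on the compressed model
        print("recon MSE on compressed activations:",
              ((recon_c - acts_compressed) ** 2).mean().item())
        _, codes_o = sae(acts_original)
        small = prune_sae(sae, codes_o)        # prune the SAE instead of retraining from scratch
        recon_p, _ = small(acts_compressed)
        print("pruned-SAE recon MSE:",
              ((recon_p - acts_compressed) ** 2).mean().item())
```

In the paper's terms, comparing the reconstruction (and downstream interpretability) of the reused and pruned SAEs against an SAE trained from scratch on the compressed model is what motivates the cost savings claimed in the abstract; the frequency-based pruning shown here is only one plausible heuristic.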


Related tags

Sparse Autoencoders, Model Compression, Interpretability, Performance Evaluation