Related articles: "Sparse Autoencoders"
When Truthful Representations Flip Under Deceptive Instructions?
cs.AI updates on arXiv.org
2025-07-31T04:47:51.000000Z
Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits
LessWrong
2025-07-22T20:37:39.000000Z
From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations
LessWrong
2025-07-19T19:33:08.000000Z
L0 is not a neutral hyperparameter
LessWrong
2025-07-19T13:57:32.000000Z
Teach Old SAEs New Domain Tricks with Boosting
cs.AI updates on arXiv.org
2025-07-18T04:14:10.000000Z
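The Formation of Knowledge Circuits in Large Models and the Potential of SAEs for Interpretability | Saturday Livestream · Large Model Interpretability Reading Group
Swarma Club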
2025-07-18T04:12:42.000000Z
Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
cs.AI updates on arXiv.org
2025-07-17T04:14:50.000000Z
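The Formation of Knowledge Circuits in Large Models and the Potential of SAEs for Interpretability | Thursday Livestream · Large Model Interpretability Reading Group
Swarma Club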
2025-07-16T16:31:22.000000Z
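The Formation of Knowledge Circuits in Large Models and the Potential of SAEs for Interpretability | Thursday Livestream · Large Model Interpretability Reading Group
Swarma Club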
2025-07-16T01:43:43.000000Z
Direct Preference Optimization Using Sparse Feature-Level Constraints
cs.AI updates on arXiv.org
2025-07-04T04:08:35.000000Z
Feature Integration Spaces: Joint Training Reveals Dual Encoding in Neural Network Representations
cs.AI updates on arXiv.org
2025-07-02T04:03:51.000000Z
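Ten Years of Painstaking Research with No Results, Tens of Millions in Funding Wasted, the AI Black Box Still Unsolved: Google Breaks Ranks
36Kr - Tech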
2025-05-19T03:47:28.000000Z
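Ten Years of Painstaking Research with No Results, Tens of Millions in Funding Wasted! The AI Black Box Still Unsolved: Google Breaks Ranks
BAAI Community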
2025-05-18T04:34:10.000000Z
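Ten Years of Painstaking Research with No Results, Tens of Millions in Funding Wasted! The AI Black Box Still Unsolved: Google Breaks Ranks
Xinzhiyuan (AI Era)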
2025-05-17T06:17:26.000000Z
Interpretable Fine Tuning Research Update and Working Prototype
LessWrong
2025-05-16T03:52:30.000000Z
Negative Results on Group SAEs
LessWrong
2025-05-06T21:57:27.000000Z
This AI Paper Introduces a Short KL+MSE Fine-Tuning Strategy: A Low-Cost Alternative to End-to-End Sparse Autoencoder Training for Interpretability
MarkTechPost@AI
2025-04-05T05:47:58.000000Z
Takeaways From Our Recent Work on SAE Probing
LessWrong
2025-03-03T19:51:58.000000Z
Enhancing Instruction Tuning in LLMs: A Diversity-Aware Data Selection Strategy Using Sparse Autoencoders
MarkTechPost@AI
2025-02-25T17:48:40.000000Z
Topological Data Analysis and Mechanistic Interpretability
LessWrong
2025-02-24T20:30:05.000000Z