Trending
Articles related to "Model Interpretability"
Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits
LessWrong 2025-07-22T20:37:39.000000Z
Simply reverse engineering gpt2-small (Layer 0, Part 1: Attention)
LessWrong 2025-07-22T15:04:02.000000Z
Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs
LessWrong 2025-06-22T18:17:34.000000Z
Can We Really Trust AI’s Chain-of-Thought Reasoning?
Unite.AI 2025-05-24T16:52:33.000000Z
Some OthelloGPT Circuits
LessWrong 2025-04-15T21:37:45.000000Z
Enumerating objects a model "knows" using entity-detection features.
LessWrong 2025-03-30T20:47:52.000000Z
Learning Multi-Level Features with Matryoshka SAEs
LessWrong 2024-12-19T16:01:41.000000Z
The ‘strong’ feature hypothesis could be wrong
LessWrong 2024-08-02T14:36:30.000000Z
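How Far Are We from Finding Scientific Truth Through AI? A Conversation with the Founder of Deep Principle and the Author of the New Neural Network Architecture KAN | DeepTalk Podcast Update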
MIT Technology Review - This Week's Top Stories 2024-07-14T16:01:53.000000Z
Interpreting Preference Models w/ Sparse Autoencoders
LessWrong 2024-07-02T02:05:14.000000Z