Articles related to "Model Interpretability"
Analysis and Exploration of Algorithmic Ethics Challenges in Generative AI
Expert Observations
2025-04-07T13:14:28.000000Z
Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and the Limitations of Verbal AI Transparency in Reasoning Models
MarkTechPost@AI
2025-04-06T05:30:28.000000Z
This AI Paper Introduces a Short KL+MSE Fine-Tuning Strategy: A Low-Cost Alternative to End-to-End Sparse Autoencoder Training for Interpretability
MarkTechPost@AI
2025-04-05T05:47:58.000000Z
AI Daily - April 2, 2025
Juejin AI
2025-04-01T15:42:47.000000Z
Anthropic CEO Dario Amodei warns of ‘race’ to understand AI as it becomes more powerful
TechCrunch News
2025-02-12T17:45:56.000000Z
Visualizing Interpretability
LessWrong
2025-02-03T22:36:46.000000Z
Training Data Attribution (TDA): Examining Its Adoption & Use Cases
LessWrong
2025-01-22T15:44:58.000000Z
AIhub monthly digest: November 2024 – dynamic faceted search, the kidney exchange problem, and AfriClimate AI
AIhub
2024-11-29T10:48:05.000000Z
AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory
LessWrong
2024-11-27T06:37:25.000000Z
Using Uncertainty to Interpret your Model
None
2024-11-26T06:35:35.000000Z
The CEO Who Built the Best Large Model Does Not Think Scaling Laws Have Hit a Wall
Founder Park
2024-11-22T16:01:02.000000Z
How to Explain the Prediction of a Machine Learning Model?
Lil'Log
2024-11-09T05:43:41.000000Z
SAEs are highly dataset dependent: a case study on the refusal direction
LessWrong
2024-11-07T05:32:18.000000Z
Refined Local Learning Coefficients (rLLCs): A Novel Machine Learning Approach to Understanding the Development of Attention Heads in Transformers
MarkTechPost@AI
2024-10-21T19:50:50.000000Z
Exploring Input Space Mode Connectivity: Insights into Adversarial Detection and Deep Neural Network Interpretability
MarkTechPost@AI
2024-09-22T21:05:32.000000Z
Let's Decrypt Dot by Dot: Decoding Hidden Computation in Transformer Language Models
LessWrong
2024-08-24T16:37:23.000000Z
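How Does Chain-of-Thought Elicit Arithmetic Reasoning in Large Models? Scientists Offer an Answer from the Perspective of Neuron Activation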
MIT Technology Review - This Week's Top Stories
2024-08-05T00:46:48.000000Z
Gemma 2-2B Released: A 2.6 Billion Parameter Model Offering Advanced Text Generation, On-Device Deployment, and Enhanced Safety Features
MarkTechPost@AI
2024-08-01T08:19:27.000000Z
Legal and Policy Implications of Model Interpretability with Solon Barocas - TWiML Talk #219
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
2024-05-12T04:32:33.000000Z
AI for High-Stakes Decision Making with Hima Lakkaraju - #387
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
2024-05-12T03:32:26.000000Z