Foundations of Interpretable Models

cs.AI updates on arXiv.org, 23 hours ago

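The paper argues that current definitions of interpretability are not actionable. It proposes a new definition that is general, simple, and subsumes existing informal notions, uses that definition to guide the design of interpretable models, and introduces the first open-source library that supports interpretable data structures.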

arXiv:2508.00545v1 Announce Type: cross

Abstract: We argue that existing definitions of interpretability are not actionable in that they fail to inform users about general, sound, and robust interpretable model design. This makes current interpretability research fundamentally ill-posed. To address this issue, we propose a definition of interpretability that is general, simple, and subsumes existing informal notions within the interpretable AI community. We show that our definition is actionable, as it directly reveals the foundational properties, underlying assumptions, principles, data structures, and architectural features necessary for designing interpretable models. Building on this, we propose a general blueprint for designing interpretable models and introduce the first open-sourced library with native support for interpretable data structures and processes.
