LessWrong, August 12, 2024
[LDSL#4] Root cause analysis versus effect size estimation

 

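The article examines the relationship between the conventional theory of causal inference and root cause analysis. It points out that some intuitions people have about causality do not fit effect size estimation, introduces the linear diffusion of sparse lognormals model, argues that an outcome is typically a mixture of many variables, discusses whether root cause analysis is a special case of effect size estimation and the flaws in that view, and closes with some heuristics for root cause analysis.

🎯 In the conventional theory of causal inference, causality is modelled as a relationship of functional determination, and the focus is on quantifying the size of one variable's effect on another. But some intuitions people have about causality are hard to explain with effect size estimation, since there is often complex polycausality.

💡 The linear diffusion of sparse lognormals model indicates that an outcome is typically a mixture of many different variables. Root cause analysis aims to describe how the outcome breaks down into these variables to better understand the situation, but it may yield only a few factors, because most variables tend to have negligible influence.

❌ Root cause analysis is not a special case of effect size estimation; that view has two big flaws. First, the particular ontology used when estimating the function f may not suit root causes. Second, identifying root causes by estimating f requires an extremely detailed model of the system, which may be very hard to obtain.

✨ The article mentions some heuristics for root cause analysis, such as looking at the most extreme things in the system and tracing causality backwards from X, but these require special "accounting data" that is especially comprehensive and quantitative.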

Published on August 11, 2024 4:12 PM GMT

Followup to: Information-orientation is in tension with magnitude-orientation. This post is also available on my Substack.

In the conventional theory of causal inference, such as Rubin’s potential outcomes model or Pearl’s DAG approach, causality is modelled as a relationship of functional determination, X := f(Y). The question of interest then becomes the properties of f, especially how f differs across different values of Y. I would call this “effect size estimation”, because the goal is to quantify the magnitude of the effect of one variable on another.

But as I mentioned in my post on conundrums, people seem to have some intuitions about causality that don’t fit well into effect size estimation, most notably in wanting “the” cause of some outcome when really there’s often thought to be complex polycausality.

Linear diffusion of sparse lognormals provides an answer: an outcome is typically a mixture of many different variables, X := Σi Yi, and one may desire an account which describes how the outcome breaks down into these variables to better understand what is going on. This is “root cause analysis”, and it yields one or a small number of factors because most of the variables tend to be negligible in magnitude. (If the root cause analysis yields a large number of factors, that is evidence that the RCA was framed poorly.)
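To make the "most of the variables tend to be negligible" claim concrete, here is a minimal simulation sketch. It assumes the Y_i are independent lognormal draws with a large log-scale spread; the parameters `n_terms`, `sigma`, and `seed` are illustrative choices, not values from the post.

```python
import random


def top_share(values, k):
    """Fraction of the total contributed by the k largest terms."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def simulate_terms(n_terms=1000, sigma=4.0, seed=0):
    """Draw n_terms independent lognormal contributions Y_i.

    The parameters are hypothetical and only meant to produce a heavy tail.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n_terms)]


terms = simulate_terms()
for k in (1, 3, 10):
    print(f"top {k:>2} of {len(terms)} terms: {top_share(terms, k):.1%} of X")
```

With a spread this wide, a handful of terms should account for most of the sum X, which is the situation in which asking for "the" root cause makes sense.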

Is root cause analysis a special case of effect size estimation?

If you know X, Y, and f, then it seems you can do root cause analysis automatically by setting each of the Y’s to zero, seeing how it influences X, and then reporting the Y’s in descending order of influence. Thus, root cause analysis ought to be a special case of effect size estimation, right?
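The procedure in the previous paragraph can be sketched as follows. This is only an illustration that assumes f is fully known and cheap to evaluate; the function `rank_by_ablation` and the example variables are hypothetical.

```python
def rank_by_ablation(f, inputs):
    """Rank inputs by how much X = f(inputs) changes when each is set to zero."""
    baseline = f(inputs)
    influence = {}
    for name in inputs:
        # Copy the inputs with this one variable zeroed out.
        ablated = dict(inputs, **{name: 0.0})
        influence[name] = baseline - f(ablated)
    # Largest absolute change first.
    return sorted(influence.items(), key=lambda kv: abs(kv[1]), reverse=True)


def total(inputs):
    """Hypothetical additive f: X is the sum of the Y_i."""
    return sum(inputs.values())


example = {"leak": 900.0, "rounding": 0.5, "fees": 12.0, "tips": 3.0}
ranking = rank_by_ablation(total, example)
for name, delta in ranking:
    print(f"{name}: {delta}")
```

The sketch only works because `total` and every entry of `example` are handed to it up front, which is exactly the "if you know X, Y, and f" premise.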

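There are two big flaws with this view:

The particular ontology used in estimating f might not be well suited to root causes.

Identifying root causes by estimating f requires an extremely detailed model of the system, which may be very hard to obtain.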

You can try to use statistical effect size estimation for root cause analysis. However, doing so creates an exponentially strong bias in favor of common things over important things, so it’s unlikely to work unless you can somehow absorb all the information in the system.

Heuristics for direct root cause analysis

I don’t think I have a complete theory of root cause analysis yet, but I know of some general heuristics for root cause analysis which don’t require comprehensive effect size estimation:

Look at the most extreme things in the system.

Trace causality backwards from X.

These both require a special sort of data, which I like to think of as “accounting data”. It differs from statistical data in that it needs to be especially comprehensive and quantitative. It would often be hard to perform this type of inference using a small random sample of the system, at least unless the root cause affects the system extraordinarily broadly.
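A rough way to see why a small random sample struggles: if the root cause touches only a fraction of the units in the system, a sample may contain no affected unit at all. The sketch below assumes units are sampled independently and that `prevalence` is the fraction of units the root cause affects; both the function and the numbers are illustrative assumptions.

```python
def prob_sample_contains(prevalence, sample_size):
    """Probability that at least one of sample_size independent draws
    hits a unit affected by the root cause."""
    return 1.0 - (1.0 - prevalence) ** sample_size


for prevalence in (0.001, 0.05, 0.5):
    p = prob_sample_contains(prevalence, sample_size=100)
    print(f"prevalence {prevalence}: P(sample of 100 sees it) = {p:.3f}")
```

Under these assumptions a sample of 100 units usually misses a cause affecting 0.1% of units, while a cause affecting half of them is almost certain to appear, which matches the caveat about root causes that affect the system extraordinarily broadly.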


