The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter

arXiv:2411.04696v4 Announce Type: replace-cross Abstract: Learning correlations from data forms the foundation of today's machine learning (ML) and artificial intelligence (AI) research. While contemporary methods enable the automatic discovery of complex patterns, they are prone to failure when unintended correlations are captured. This vulnerability has spurred a growing interest in interrogating spuriousness, which is often seen as a threat to model performance, fairness, and robustness. In this article, we trace departures from the conventional statistical definition of spuriousness -- which denotes a non-causal relationship arising from coincidence or confounding -- to examine how its meaning is negotiated in ML research. Rather than relying solely on formal definitions, researchers assess spuriousness through what we call pragmatic frames: judgments based on what a correlation does in practice -- how it affects model behavior, supports or impedes task performance, or aligns with broader normative goals. Drawing on a broad survey of ML literature, we identify four such frames: relevance ("Models should use correlations that are relevant to the task"), generalizability ("Models should use correlations that generalize to unseen data"), human-likeness ("Models should use correlations that a human would use to perform the same task"), and harmfulness ("Models should use correlations that are not socially or ethically harmful"). These representations reveal that correlation desirability is not a fixed statistical property but a situated judgment informed by technical, epistemic, and ethical considerations. By examining how a foundational ML conundrum is problematized in research literature, we contribute to broader conversations on the contingent practices through which technical concepts like spuriousness are defined and operationalized.

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签