Chance is in the Map, not the Territory

This article explores the nature of probability, challenging the traditional view that chances are objective features of the world. Drawing on de Finetti's work, it argues that probability is not an inherent property of the world but an expression of our subjective beliefs. Through the concept of exchangeability, we can reason probabilistically without assuming that true chances exist. Using examples such as drawing marbles from an urn, weather forecasting, and clinical trials, the article shows how exchangeable beliefs let us understand and apply probability, emphasizing that probability is a tool for how we understand and predict the world, not a property of the world itself.

🔄 Exchangeable beliefs are the key to understanding probability: when our beliefs are exchangeable, meaning the order in which events occur does not affect our expectations, we can reason as if there were an underlying chance, even without assuming that this chance really exists.

🌡️ Weather forecasting: a "70% chance of rain" is not a measurement of some objectively existing "rain chance"; it expresses a pattern of beliefs about historical data under similar weather conditions, a pattern that has proven reliable for prediction.

🧪 Clinical trials: when a doctor says a treatment has a "60% success rate," they are not measuring a fixed property of the drug but summarizing a learning process that begins with exchangeable beliefs about patient outcomes and ends with the probability concentrated near 0.6. This is a prediction about patient outcomes, not the discovery of the drug's true property.

🤖 Machine learning: the assumption that data are independent and identically distributed (i.i.d.) can be seen as a way of expressing exchangeable beliefs rather than a literal description of the world. Understood this way, we can use machine learning models effectively for prediction without assuming that an unchanging probability distribution exists.

Published on January 13, 2025 7:17 PM GMT

"There's a 70% chance of rain tomorrow," says the weather app on your phone. "There’s a 30% chance my flight will be delayed," posts a colleague on Slack. Scientific theories also include chances: “There’s a 50% chance of observing an electron with spin up,” or (less fundamental) “This is a fair die — the probability of it landing on 2 is one in six.”

We constantly talk about chances and probabilities, treating them as features of the world that we can discover and disagree about. And it seems you can be objectively wrong about the chances. The probability of a fair die landing on 2 REALLY is one in six, it seems, even if everybody in the world thought otherwise. But what exactly are these things called “chances”?

Readers on LessWrong are very familiar with the idea that many probabilities are best thought of as subjective degrees of belief. This idea comes from a few core people, including Bruno de Finetti. For de Finetti, probability was in the map, not the territory.

But perhaps this doesn’t capture how we talk about chance. For example, our degrees of belief need not equal the chances, if we are uncertain about the chances.  But then what are these chances themselves? If we are uncertain about the bias of a coin, or the true underlying distribution in some environment, then we can use our uncertainty over those chances to generate our subjective probabilities over what we’ll observe.[1] But then we have these other probabilities — chances, distributions, propensities, etc. — to which we are assigning probabilities. What are these things?

Here we’ll show how we can keep everything useful about chance-based reasoning while dropping some problematic metaphysical assumptions. The key insight comes from work by, once again, de Finetti. De Finetti’s approach has been fleshed out in detail by Brian Skyrms. We’ll take a broadly Skyrmsian perspective here, in particular as given in his book Pragmatics and Empiricism. The core upshot is that we don't need to believe in chances as real things "out there" in the world to use chance effectively. Instead, we can understand chance through patterns and symmetries in our beliefs.

Two Ways to Deal with Chance

When philosophers and scientists have tried to make sense of chance, they've typically taken one of two approaches. The first tries to tell us what chance IS – maybe it's just long-run frequency, or maybe it's some kind of physical property like mass or charge. Or maybe it is some kind of lossy compression of information. The second approach, which we'll explore here, asks a different question: what role does chance play in our reasoning, and can we fulfill that role without assuming chances exist?

Let's look (briefly) at why the first approach is problematic. Frequentists say chance is just long-run frequency:[2] The chance of heads is 1/2 because in the long run, about half the flips will be heads. But this has issues. What counts as "long run"? What if we never actually get to infinity? And how do we handle one-off events that can't be repeated?[3]

Others say chance is a physical property – a "propensity" of systems to produce certain outcomes. But this feels suspiciously like adding a mysterious force to our physics.[4] When we look closely at physical systems (leaving quantum mechanics aside for now), they often seem deterministic: if you could flip a coin exactly the same way twice, it would land the same way both times.

The Key Insight: Symmetries in Our Beliefs

To see how this second approach works in a more controlled setting, imagine an urn containing red and blue marbles. Before drawing any marbles, you have certain beliefs about what you'll observe. You might think the sequence "red, blue, red" is just as likely as "blue, red, red"—the order doesn't matter, but you can learn from the observed frequencies of red and blue draws.

This symmetry in your beliefs—that the order doesn't matter—is called exchangeability. As you observe more draws, updating your beliefs each time, you develop increasingly refined expectations about future draws. The key insight is that you're not discovering some "true chance" hidden in the urn. Instead, de Finetti showed that when your beliefs have this exchangeable structure, you'll naturally reason as if there were underlying chances you were learning about in a Bayesian way—even though we never needed to assume they exist.[5]

This is different from just saying the draws are independent. If they were truly independent, seeing a hundred red marbles in a row wouldn't tell you anything about the next draw. But this isn't how we actually reason! Seeing mostly red marbles leads us to expect more red draws in the future. Exchangeability captures this intuition: we can learn from data while maintaining certain symmetries in our beliefs.
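To make this concrete, here is a minimal Python sketch of one family of exchangeable beliefs about the urn (the uniform Beta(1, 1) mixing prior is our illustrative choice, not anything read off the urn itself). The probability assigned to a sequence depends only on the counts of red and blue, not their order, and yet the predictive probability of the next draw shifts with observed frequencies:

```python
from math import comb

def sequence_prob(draws):
    """Probability of a specific sequence (1 = red, 0 = blue) under a
    uniform Beta(1, 1) prior mixed over i.i.d. Bernoulli(theta) draws.
    For a uniform prior the mixture integral has the closed form
    1 / ((n + 1) * C(n, k))."""
    n, k = len(draws), sum(draws)
    return 1 / ((n + 1) * comb(n, k))

def next_red_prob(draws):
    """Predictive probability that the next draw is red. With a Beta(1, 1)
    prior this is Laplace's rule of succession: (k + 1) / (n + 2)."""
    n, k = len(draws), sum(draws)
    return (k + 1) / (n + 2)

# Exchangeability: order doesn't matter, only counts do.
print(sequence_prob([1, 0, 1]))  # red, blue, red  -> 1/12
print(sequence_prob([0, 1, 1]))  # blue, red, red  -> 1/12, same probability

# Learning: unlike beliefs in independent draws with a known bias,
# observed frequencies shift predictions about the next draw.
print(next_red_prob([]))         # 0.5 before any data
print(next_red_prob([1] * 100))  # ~0.99 after a hundred reds in a row
```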

The Magic of de Finetti

De Finetti showed something remarkable: if your beliefs about a sequence of events are exchangeable, then mathematically, you must act exactly as if you believed there was some unknown chance governing those events. In other words, exchangeable beliefs can always be represented as if you had beliefs about chances – even though we never assumed chances existed!

For Technical Readers: De Finetti's theorem shows that any exchangeable probability distribution over infinite sequences can be represented as a mixture of i.i.d. distributions. Furthermore, as one observes events in the sequence and updates one’s probability over events via Bayes’ rule, this corresponds exactly to updating one’s distribution over chance distributions via Bayes’ rule, and then using that distribution over chances to generate the probability of the next event. This means you can treat these events as if there's an unknown parameter (the "chance")—even though we never assumed such a parameter exists.
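In symbols, for a binary (success/failure) sequence, writing k for the number of successes among the first n outcomes and μ for the mixing measure that plays the role of a prior over the "chance" θ, the two claims above look like this:

```latex
% De Finetti representation: exchangeable sequence probabilities are
% mixtures of i.i.d. Bernoulli(theta) likelihoods under a measure mu.
\[
P(X_1 = x_1, \dots, X_n = x_n) = \int_0^1 \theta^{k} (1 - \theta)^{n-k} \, d\mu(\theta)
\]

% Predictive probability of the next event: the posterior mean of theta,
% where the posterior over theta is obtained from mu via Bayes' rule.
\[
P(X_{n+1} = 1 \mid x_1, \dots, x_n)
  = \frac{\int_0^1 \theta^{k+1} (1 - \theta)^{n-k} \, d\mu(\theta)}
         {\int_0^1 \theta^{k} (1 - \theta)^{n-k} \, d\mu(\theta)}
\]
```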

Let's see how this works in practice. When a doctor says a treatment has a "60% chance of success", traditionally we might think they're describing some real, physical property of the treatment. But in the de Finetti view, they're expressing exchangeable beliefs about patient outcomes—beliefs that happen to be mathematically equivalent to uncertainty about some "true" chance. The difference? We don't need to posit any mysterious chance properties. In this situation, since the doctor says it is 60%, she has probably observed enough outcomes (or reports of outcomes) that her posterior in the chance representation is tightly concentrated near 0.6.

De Finetti in Practice

This perspective transforms how we think about evidence and prediction across many domains:

1. Weather Forecasting

When your weather app says "70% chance of rain," it's not measuring some metaphysical "rain chance" property. It's expressing a pattern of beliefs about similar weather conditions that have proven reliable for prediction. Just like in the urn or medical examples, each new bit of data refines the forecast, and the weather model used by the app updates its probability estimates accordingly. This is true even though we sometimes talk about weather as being chaotic, or unpredictable. That is a statement about us, about our map, not the territory.[6]

2. Clinical Trials

This same pattern of learning applies in medical trials—though the stakes are far higher than drawing marbles. When a doctor says a treatment has a "60% chance of success," they're not measuring some fixed property of the drug. Instead, they're summarizing a learning process that starts with exchangeable beliefs about patient outcomes, whose representation as a mixture over chances ends up concentrating around 0.6.

Think of how researchers approach a new treatment. Before any trials, they treat each future patient's potential outcome as exchangeable—so "success, failure, success" is considered no more or less likely than "failure, success, success." As they observe real outcomes, each success or failure refines their model of the treatment's effectiveness, pushing their estimated success rate up or down accordingly. Just like with the urn, they're not discovering a true success rate hidden in the drug; they're building and refining a predictive model.
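As a small worked illustration of that concentration (the uniform Beta(1, 1) prior and the trial sizes below are assumptions invented for the example, not numbers from any real trial): under the chance representation, a run of outcomes with roughly 60% successes drives the posterior over the success chance toward a tight peak near 0.6.

```python
# Illustrative sketch only: the prior and trial sizes are made up.
def posterior_summary(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of the Beta posterior over the "success
    chance" in the de Finetti representation, starting from a
    Beta(prior_a, prior_b) prior."""
    a, b = prior_a + successes, prior_b + failures
    mean = a / (a + b)
    sd = ((a * b) / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd

# With ~60% observed successes, the posterior concentrates around 0.6
# as outcomes accumulate; this is the doctor's "60% chance of success".
for n in (10, 100, 1000):
    k = round(0.6 * n)
    mean, sd = posterior_summary(successes=k, failures=n - k)
    print(f"n={n:4d}  posterior mean={mean:.3f}  sd={sd:.3f}")
```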

Crucially, this is different from treating outcomes as independent. If patient outcomes were, for the researchers, truly independent, then seeing the treatment work in a hundred patients wouldn't affect their expectations for the hundred-and-first. But that's not how clinical knowledge works—consistent success makes doctors more confident in recommending the treatment. In other words, they're updating their map of the world, not uncovering a territory fact about the drug.

This exchangeable approach to patient outcomes captures how we actually learn from clinical data while maintaining certain symmetries in our beliefs—giving us all the practical benefits of "chances" without positing them as objective properties in the world.[7]

3. Machine Learning

When we train models on data, we often assume that the data points are “i.i.d.” (independent and identically distributed). From a de Finetti perspective, this i.i.d. assumption can be seen as an expression of exchangeable beliefs rather than a literal statement about the world. If you start with an exchangeable prior—meaning you assign the same probability to any permutation of your data—then de Finetti’s Representation Theorem says you can treat those observations as if they were generated i.i.d. conditional on some unknown parameter. In other words, you don’t need reality to be i.i.d.; you simply need to structure your beliefs in a way that allows an “as if” i.i.d. interpretation.

This means that when an ML practitioner says, “Assume the data is i.i.d.,” they’re effectively saying, “I have symmetrical (exchangeable) beliefs about the data-generating process.” As new data arrives, you update your posterior on an unknown parameter—much like the urn or medical examples—without ever needing to claim there’s a literal, unchanging probability distribution out there in the territory. Instead, you’ve adopted a coherent, Bayesian viewpoint that models the data as i.i.d. from your perspective, which is enough to proceed with standard inference and learning techniques from statistics and machine learning.
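A short simulation makes the "as if" reading concrete (the Beta(2, 2) prior below is an arbitrary illustrative choice). Generating data by first sampling a latent θ and then drawing i.i.d. given θ yields a sequence that is exchangeable but not unconditionally independent, which is exactly why early observations inform later predictions:

```python
import random

random.seed(0)

def sample_pair():
    """De Finetti-style generative story: draw a latent theta once from a
    Beta(2, 2) prior, then two observations i.i.d. Bernoulli(theta)."""
    theta = random.betavariate(2, 2)
    return tuple(1 if random.random() < theta else 0 for _ in range(2))

# Marginally the draws are exchangeable but NOT independent: conditioning
# on the first draw being 1 raises the probability that the second is 1.
pairs = [sample_pair() for _ in range(200_000)]
p2 = sum(x2 for _, x2 in pairs) / len(pairs)
p2_given_1 = sum(x2 for x1, x2 in pairs if x1 == 1) / sum(x1 for x1, _ in pairs)

print(f"P(X2=1)        ~ {p2:.3f}")          # ~0.5
print(f"P(X2=1 | X1=1) ~ {p2_given_1:.3f}")  # ~0.6 under a Beta(2, 2) prior
```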

Furthermore, the de Finetti perspective might help shed light on what is going on inside transformers. Some initial attempts have been made to do this rigorously, though we haven't worked carefully through them, so we can't ourselves yet fully endorse them. In general, the de Finetti approach seems to vindicate the intuition that a system trained to predict observable variables/events might use a latent-variable approach to do so, which of course we see empirically in many ways. It might also suggest failure modes of AI systems: just as humans have reified chances in certain ways, so too might AI systems reify certain latents. This is speculative, and we don't want the scope of this post to bloat too much, but we think it deserves some thought.

We also suspect that there are connections to Wentworth and Lorell's Natural Latents and to how they hope to apply that framework to AI, but looking at the connections in a serious way should be a separate post.

Why This Matters

This approach aligns perfectly with the rationalist emphasis on "the map is not the territory." Like latitude and longitude, chances are helpful coordinates on our mental map, not fundamental properties of reality. When we say there's a 70% chance of rain, we're not making claims about mysterious properties in the world. Instead, we're expressing beliefs that have certain symmetries, beliefs that let us reason effectively about patterns we observe.

This perspective transforms how we think about statistical inference. When a scientist estimates a parameter or tests a hypothesis, they often talk about finding the "true probability" or "real chance." But now we can see this differently: they're working with beliefs that have certain symmetries, using the mathematical machinery of chance without needing to believe in chances as real things.

Common Objections and Clarifications

"But surely," you might think, "when we flip a fair coin, there really IS a 50% chance of heads!" The pragmatic response is subtle: we're not saying chances don't exist (though the three of us do tend to lean that way). Instead, we're saying we don't need them to exist to vindicate our reasoning. It works just as well if we have exchangeable beliefs about coin flips. The "50% chance" emerges from the symmetries in our beliefs, not from some metaphysical property of the coin.

Some might ask about quantum mechanics, which famously involves probabilities at a fundamental level. Even here, the debate about whether wave function collapse probabilities are "real" or just a device in our predictive models is ongoing. The pragmatic perspective can be extended into interpretations of quantum mechanics, but that's a bigger topic for another post.[8]

Quick Recap

Three key takeaways:

1. We can talk about chance in purely pragmatic terms.
2. Exchangeability and de Finetti's theorem show we lose nothing in predictive power.
3. This viewpoint integrates well with Bayesian rationality and the "map vs. territory" framework.

 

  1. ^
  2. ^
  3. ^

    Also, the limiting relative frequency doesn't change if we append any finite number of flips to the front of the sequence, which can mess up inferences we try to make in the short, medium, or even very long run. In general there are other issues like this, but we'll keep it brief here.

  4. ^

    Of course, chances do play a role in inference, so they do constrain expectations. This makes them not the worst kind of mysterious answer. The upshot of the de Finetti theorem is that it sifts the useful part of chance from the mysterious part. This allows us to use chance talk without reifying chance.

  5. ^

    There are generalizations of exchangeability, such as partial exchangeability and Markov exchangeability. For exposition, and since it is a core case, we focus here on the basic exchangeability property.

  6. ^

    Of course, there are sophisticated ways to try to bridge this gap, by showing that for a certain class of agents, certain dynamics will render an environment only predictable up to a certain degree.

  7. ^

    There is also a deep way in which the de Finetti perspective can help us make sense of randomized control trials.

  8. ^

    Although it is worth noting that many theories of quantum mechanics—in particular, Everettian and Bohmian quantum mechanics—are perfectly deterministic. Here is a summary of why Everett wanted a probability-free theory—the core idea is that most versions of QM that make reference to chances do so via measurement-induced collapses, which leads into the measurement problem. We think the genuinely chancey theory that is most likely to pan out is something like GRW, which doesn't have measurement as a fundamental term in the theory. Jeff Barrett's The Conceptual Foundations of Quantum Mechanics has greatly informed our views on QM, and is a great in-depth introduction.


