Published on August 2, 2025 1:57 PM GMT
The lottery question
Alice comes to Bob and asks: "What is the probability that I've won the lottery?" Bob's first intuition (his actual prior probability) would be to answer "1/#lottery_tickets." But then Bob thinks "Wait, why would she even ask me that? Did she actually win the lottery?" This would change his answer to this question, moving the probability higher than his prior.
General problem
In general, if we have a query-answering oracle, which gives the probability of the event in the query, it would not give its "actual" probability of the event P(E), but rather the probability of that event happening, conditioned on the fact that this query is asked, or P(E|E is queried).
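To make the gap between P(E) and P(E|E is queried) concrete, here is a minimal sketch of the update using Bayes' rule. The function name and the numbers are hypothetical: the two "probability of asking" values are assumptions chosen only for illustration, not anything Bob could actually know.

```python
from fractions import Fraction

def answer_given_query(prior, p_ask_if_true, p_ask_if_false):
    """Return P(E | E is queried) via Bayes' rule.

    p_ask_if_true  = P(query is asked | E)      (assumed)
    p_ask_if_false = P(query is asked | not E)  (assumed)
    """
    joint_true = prior * p_ask_if_true
    joint_false = (1 - prior) * p_ask_if_false
    return joint_true / (joint_true + joint_false)

# Hypothetical lottery with one million tickets.
prior = Fraction(1, 1_000_000)
# Assumption: winners ask Bob half the time, non-winners one time in a thousand.
posterior = answer_given_query(prior, Fraction(1, 2), Fraction(1, 1000))

print(float(prior), float(posterior))
```

Under these assumed numbers the answer is still small, but several hundred times larger than the prior. If asking were equally likely whether or not Alice had won, the function would return the prior unchanged.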
This makes it hard to obtain the oracle's probability distribution: we cannot recover it simply by querying the probability of each hypothesis in a mutually exclusive, collectively exhaustive set.
I assume those probabilities wouldn't even add up to 1. For example, in the case of Bob, if every participant in the lottery asked him about their chance of winning, and Bob gave each of them a probability higher than the prior, then the sum of those probabilities would be higher than 1. That would happen even if Bob updates on all previous queries: as the probability of the "I'm actually being asked by every participant" hypothesis rises, his later answers get much closer to the prior than his earlier ones, but they still stay above it. The main reason is that probability theory doesn't guarantee that the probabilities of mutually exclusive, collectively exhaustive hypotheses add up to 1 when they are conditioned on different evidence. They add up to 1 only when the evidence is the same.
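The difference between conditioning on different evidence and on the same evidence can be sketched numerically. This is a simplified illustration, not the full scenario: it assumes an oracle that conditions each answer only on its own query (no memory of earlier ones), and the ticket count and asking probabilities are made up.

```python
from fractions import Fraction

N = 1000  # hypothetical: one ticket per participant
prior = Fraction(1, N)
# Assumed probabilities that a participant asks Bob, given they won / lost.
p_ask_if_won = Fraction(1, 2)
p_ask_if_lost = Fraction(1, 100)

def posterior(prior, like_true, like_false):
    """Bayes' rule for a single hypothesis against its negation."""
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

# Different evidence per hypothesis: answer i is conditioned on "participant i asked".
separate = [posterior(prior, p_ask_if_won, p_ask_if_lost) for _ in range(N)]
total_separate = sum(separate)

# Same evidence for every hypothesis: condition all of them on "participant 0 asked".
weights = [prior * (p_ask_if_won if i == 0 else p_ask_if_lost) for i in range(N)]
total_weight = sum(weights)
shared = [w / total_weight for w in weights]
total_shared = sum(shared)

print(float(total_separate), float(total_shared))
```

With these assumed numbers, the separately conditioned answers sum to far more than 1, while the answers conditioned on one shared piece of evidence sum to exactly 1.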
Why not just...?
Renormalize
Renormalization wouldn't help, because the probabilities are higher for the first queries only because they are the first ones. Maybe we could average out probabilities for all possible permutations of the query-sequence, but that sounds too computationally intensive.
Erase the memory before each query
That would help for symmetrical cases like the lottery example, but wouldn't help much when the update on the query is not the same for all hypotheses.
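A small sketch of why symmetry matters here. It assumes the memory-erased answers are also rescaled to sum to 1, and it models the update on each query as a single likelihood ratio P(asked | H) / P(asked | not H). Both the function name and the ratios are hypothetical.

```python
from fractions import Fraction

def memoryless_answers(priors, ratios):
    """Answer each query independently, then rescale the answers to sum to 1.

    ratios[i] is the assumed likelihood ratio P(asked about H_i | H_i) / P(asked about H_i | not H_i).
    """
    answers = [p * r / (p * r + (1 - p)) for p, r in zip(priors, ratios)]
    total = sum(answers)
    return [a / total for a in answers]

# Three equally likely hypotheses, as in a three-ticket lottery.
priors = [Fraction(1, 3)] * 3

symmetric = memoryless_answers(priors, [3, 3, 3])    # same update for every hypothesis
asymmetric = memoryless_answers(priors, [1, 2, 10])  # update depends on the hypothesis

print(symmetric)
print(asymmetric)
```

In the symmetric case the rescaled answers equal the prior. In the asymmetric case they do not, even though the oracle remembered nothing between queries.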
Add "hey I'm just probing you, please don't update on that query" in the query
That might decrease the update a bit. But insofar as the inquirer would also add this disclaimer in cases where they want the answer for some hypothesis-specific reason, the disclaimer doesn't rule those cases out, and the oracle would still update somewhat.
Conclusion
This problem is related to the idea that "Bayesians should update on the fact that they observed the evidence, not only the evidence itself". But to me it seems like the opposite problem, because here we want the oracle to not update on the fact that we asked it something.
There is also the question of why we don't have access to the probability distribution of the oracle directly. That might be the case if the oracle doesn't have it explicitly, and calculates the probability only when queried to do so.
I don't know the solution to this problem, so please suggest your answers in the comments (or maybe somebody already talked about it, then please share the source). I also don't know if the problem of updating on the query is only a problem for obtaining the probability distribution; it might cause other problems I haven't thought about.