LessWrong · May 7, 07:07
$500 + $500 Bounty Problem: An (Approximately) Deterministic Maximal Redund Always Exists

This post explores the concept of redundancy among random variables, in particular approximate redundancy. It defines the conditions under which a variable Γ is redundant over X1 and X2, introduces approximate redunds, and describes them mathematically via diagrams and formulas. The core problem is to find a maximal redund Ω which contains all the information about X contained in any other approximate redund, and which is an approximately deterministic function of X. The post offers a bounty for a proof that such a maximal redund Ω always exists, and gives an intuitive explanation of the exact case. A solution would matter for a new version of the Telephone Theorem and for computing natural latents.

💡**Redund definition**: A variable Γ is redundant over two random variables X1 and X2 when each of X1 and X2 contains all of X's information about Γ, i.e. P[Γ|X1] = P[Γ|X2] = P[Γ|X].

🧮**Mathematical form of approximate redundancy**: Approximate redunds are defined via KL divergence: ε_red ≥ D_KL(P[X,Γ] || P[X1] P[X2|X1] P[Γ|X2]) and ε_red ≥ D_KL(P[X,Γ] || P[X2] P[X1|X2] P[Γ|X1]).

🎯**Maximal redund Ω**: The goal is a maximal redund Ω which contains all the information about X contained in any other approximate redund, and which is an approximately deterministic function of X, satisfying ε_det ≥ H(Ω|X).

💰**The bounty**: $500 is offered for a proof that such an Ω exists for any two random variables X1 and X2, with reasonable error bounds (the other ε's scaling as small constant multiples of ε_red).

⚙️**Intuition from the exact case**: In the exact case, one can build a bipartite graph with the values of X1 and X2 as vertices, connecting vertices whenever P[X1=x1, X2=x2] > 0. The maximal redund f(X) is then the connected component in which X falls.

Published on May 6, 2025 11:05 PM GMT

A lot of our work involves "redunds". A random variable Γ is a(n exact) redund over two random variables X = (X1, X2) exactly when both of the following diagrams hold:

X1 → X2 → Γ

X2 → X1 → Γ

Conceptually, these two diagrams say that X1 gives exactly the same information about Γ as all of X, and X2 gives exactly the same information about Γ as all of X; whatever information X contains about Γ is redundantly represented in X1 and X2. Unpacking the diagrammatic notation and simplifying a little, the diagrams say P[Γ|X1=x1] = P[Γ|X2=x2] = P[Γ|X=x] for all x = (x1, x2) such that P[X=x] > 0.
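As a quick numerical sanity check (using a hypothetical toy distribution, not an example from the original post): when Γ is a single fair bit copied into both X1 and X2, all three conditionals agree everywhere on the support.

```python
import numpy as np

# Hypothetical toy joint P[x1, x2, g]: X1 and X2 both equal a shared fair
# bit, and Gamma is that same bit, so Gamma is an exact redund.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 0.5
P[1, 1, 1] = 0.5

P12 = P.sum(axis=2)   # P[X1, X2]
P1g = P.sum(axis=1)   # P[X1, Gamma]
P2g = P.sum(axis=0)   # P[X2, Gamma]
P1 = P1g.sum(axis=1)  # P[X1]
P2 = P2g.sum(axis=1)  # P[X2]

# Verify P[Gamma|X1=x1] = P[Gamma|X2=x2] = P[Gamma|X=(x1,x2)] on the support.
for x1 in range(2):
    for x2 in range(2):
        if P12[x1, x2] > 0:
            assert np.allclose(P1g[x1] / P1[x1], P2g[x2] / P2[x2])
            assert np.allclose(P1g[x1] / P1[x1], P[x1, x2] / P12[x1, x2])
```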

The exact redundancy conditions are too restrictive to be of much practical relevance, so we are more interested in approximate redunds. Approximate redunds are defined by approximate versions of the same two diagrams:

Unpacking the diagrammatic notation, these two diagrams say

ε_red ≥ D_KL(P[X, Γ] || P[X1] P[X2|X1] P[Γ|X2])

ε_red ≥ D_KL(P[X, Γ] || P[X2] P[X1|X2] P[Γ|X1])
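For small discrete distributions, the two divergences D_KL(P[X,Γ] || P[X1]P[X2|X1]P[Γ|X2]) and D_KL(P[X,Γ] || P[X2]P[X1|X2]P[Γ|X1]) are directly computable. Here's a sketch (numpy; assumes full-support marginals over X1 and X2, and the example distribution is hypothetical):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) in nats, for arrays of matching shape."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def redund_errors(P):
    """Given the joint P[x1, x2, g] as a 3D array (with full-support
    marginals over X1 and X2), return the two KL divergences whose
    maximum bounds the redundancy error eps_red."""
    P12 = P.sum(axis=2)                        # P[X1, X2]
    P1, P2 = P12.sum(axis=1), P12.sum(axis=0)  # P[X1], P[X2]
    P2g = P.sum(axis=0)                        # P[X2, Gamma]
    P1g = P.sum(axis=1)                        # P[X1, Gamma]
    # P[X1] P[X2|X1] P[Gamma|X2] = P[X1,X2] * P[Gamma|X2]
    Q1 = P12[:, :, None] * (P2g / P2[:, None])[None, :, :]
    # P[X2] P[X1|X2] P[Gamma|X1] = P[X1,X2] * P[Gamma|X1]
    Q2 = P12[:, :, None] * (P1g / P1[:, None])[:, None, :]
    return kl(P, Q1), kl(P, Q2)

# Example: X1, X2 independent fair coins, Gamma = X1. Gamma is NOT a redund:
# the first diagram fails by log(2) nats, while the second holds exactly.
P = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        P[x1, x2, x1] = 0.25
```

The maximum of the two returned divergences is the smallest ε_red for which a given Γ counts as an approximate redund.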

This bounty problem is about the existence of a(n approximate) maximal redund Ω: a redund which contains (approximately) all the information about X contained in any other (approximate) redund. Diagrammatically, a maximal redund Ω satisfies, for every (approximate) redund Γ:

X → Ω → Γ (approximately)

i.e. Ω approximately mediates between X and any redund Γ.

Finally, we'd like our maximal redund Ω to be an approximately deterministic function of X, i.e. ε_det ≥ H(Ω|X), which can equivalently be expressed as a diagram.
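The determinism condition is just a conditional entropy, which is straightforward to evaluate for discrete distributions. A minimal sketch (numpy; the function name is mine, not from the post):

```python
import numpy as np

def cond_entropy(P_xo):
    """H(Omega | X) in nats, for a joint distribution P[x, omega] given
    as a 2D array (rows indexed by x, columns by omega)."""
    Px = P_xo.sum(axis=1, keepdims=True)  # marginal P[X]
    mask = P_xo > 0                       # positive-probability terms only
    ratio = P_xo[mask] / np.broadcast_to(Px, P_xo.shape)[mask]  # P[omega | x]
    return float(-np.sum(P_xo[mask] * np.log(ratio)))

# Omega a deterministic function of X: zero conditional entropy.
assert cond_entropy(np.array([[0.5, 0.0], [0.0, 0.5]])) == 0.0
# Omega independent of X and uniform on two values: H(Omega|X) = log 2.
assert abs(cond_entropy(np.full((2, 2), 0.25)) - np.log(2)) < 1e-12
```

An Ω satisfies the determinism requirement exactly when `cond_entropy` of its joint with X is at most ε_det.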

What We Want For The Bounty

This bounty pays out $500 for establishing that, for any two random variables X1 and X2, there always exists an Ω satisfying the requirements above (Ω approximately mediates between X and every approximate redund Γ, and Ω is an approximately deterministic function of X), with reasonable error bounds.

What counts as reasonable error bounds? We're expecting all the other ε's to scale with ε_red in the diagrams above, and ideally to be small constant multiples of ε_red. In other words: the higher the approximation error allowed on redunds Γ, the more approximation error we allow for all the conditions on Ω. Reasonableness of error bounds is partially a judgement call on our part, but small constant (e.g. 2 or 3) multiples of ε_red would definitely suffice.

Bounds should be global, i.e. apply even when ε_red is large. We're not interested in e.g. first- or second-order approximations for small ε_red unless they provably apply globally.

If the error bounds are sufficiently reasonable, a solution to this problem will also solve our other open bounty problem, paying out another $500, for a total of $1000.

If people post major intermediate steps which are load-bearing for an eventual solution, we might allocate the bounty based on our judgement of different people's relative contributions; the hope is to incentivize sharing intermediate results.

We might also partially pay out for counterexamples. That's a judgement call depending on how thoroughly the counterexample kills hope of any theorem vaguely along the lines intended.

In terms of rigor and allowable assumptions, roughly the level of rigor and assumptions in the posts linked above is fine.

Some Intuition From The Exact Case

A proof in the exact case is straightforward; we'll walk through it here to build some intuition.

The exact redundancy conditions say

P[Γ|X1=x1] = P[Γ|X2=x2] = P[Γ|X=(x1, x2)]

wherever P[X1=x1, X2=x2] > 0. So, we can form a bipartite graph, where each vertex represents a value x1 of X1 or a value x2 of X2, and two vertices are connected iff P[X1=x1, X2=x2] > 0. Then the definition of redundancy says that, for ANY redund Γ, P[Γ|X1=x1] and P[Γ|X2=x2] are the same for ANY values x1 and x2 (respectively) in the same connected component.

So, let f(X) be defined as the connected component in which X falls. For any redund Γ, P[Γ|X] must be a function of f(X) (since two X-values in the same connected component have the same P[Γ|X]). Thus

X → f(X) → Γ

f(X) is therefore a(n exactly) deterministic function of X, and (exactly) mediates between X and any redund Γ. Thus, Ω = f(X) satisfies our requirements in the exact case (i.e. when all the ε's are zero).
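This exact-case construction is easy to sketch directly: build the bipartite support graph and label each positive-probability outcome with its connected component. A hypothetical union-find implementation (not code from the post):

```python
import numpy as np
from itertools import product

def maximal_redund_exact(P):
    """Given the joint P[x1, x2] as a 2D array, map each positive-probability
    outcome (x1, x2) to the connected component of the bipartite support
    graph containing it. The component label is the exact maximal redund f(X)."""
    n1, n2 = P.shape
    # Union-find over vertices: x1-values are 0..n1-1, x2-values are n1..n1+n2-1.
    parent = list(range(n1 + n2))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Connect x1 and x2 whenever P[X1=x1, X2=x2] > 0.
    for x1, x2 in product(range(n1), range(n2)):
        if P[x1, x2] > 0:
            parent[find(x1)] = find(n1 + x2)

    return {(x1, x2): find(x1)
            for x1, x2 in product(range(n1), range(n2)) if P[x1, x2] > 0}

# Block-diagonal support: two connected components, so f(X) takes two values.
P = np.array([[0.2, 0.2, 0.0],
              [0.2, 0.2, 0.0],
              [0.0, 0.0, 0.2]])
f = maximal_redund_exact(P)
```

On this support, f(X) distinguishes exactly the {0,1}-block from the {2}-block, matching the intuition that any exact redund can only carry which-component information.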

Alas, this proof does not easily generalize to the approximate case.

Why We Want This

This is a conjecture which we've had on the back burner for a while, and it would be relevant in many places. It would solve our other open bounty problem, it would likely be our starting point for a new version of the Telephone Theorem which more explicitly connects to the natural latents framework, and it would directly simplify the problem of computing natural latents numerically. Probably there are other use-cases which we're not thinking about right at the moment.



Discuss
