What's the Right Way to think about Information Theoretic quantities in Neural Networks?

The article discusses the challenges that come up when applying information theory to neural networks. Because neural networks are deterministic, and sometimes even reversible, Shannon information measures degenerate, so familiar quantities like mutual information are hard to apply directly. The article argues that the existing fixes are either too ad hoc or rest on questionable assumptions and lack a clear operational interpretation. It also discusses alternatives such as stochastic weights, V-information, and K-complexity, each of which has problems of its own. The author would like a way to keep using plain Shannon information measures to analyze information flow in neural networks.

🤔 Determinism degenerates Shannon measures: the deterministic map a neural network implements makes quantities like mutual information blow up to infinity, which clashes with the intuitive notion of shared information.

🧮 Quantization and binning are flawed: quantization can turn a deterministic function into a stochastic one, but the result depends on the binning scheme, conflates information-theoretic and geometric notions, and is unstable.

⚖️ Effective information via weight perturbation: one line of work defines "robustly" shared information by perturbing the weights, but it depends on the particular perturbation and optimization objective, which feels somewhat ad hoc.

🧐 V-information and K-complexity: V-information extends Shannon measures by restricting the function class, but choosing an appropriate class remains an open question; K-complexity runs into uncomputability and fits poorly with the statistical character of neural networks.

Published on January 19, 2025 8:04 AM GMT

Tl;dr: neural networks are deterministic and sometimes even reversible, which causes Shannon information measures to degenerate. But information theory still seems useful. How can we square this (if it's possible at all)? The attempts in the literature so far are unsatisfying.


Here is a conceptual question: what is the Right Way to think about information theoretic quantities in neural network contexts?

Example: I've recently been thinking about information bottleneck methods: given some data distribution p(X, Y), they try to find features Z, specified by an encoder p(Z|X), that have nice properties like minimality (small I(X;Z)) and sufficiency (big I(Z;Y)).
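For concreteness, the textbook form of this trade-off (standard information bottleneck notation, which may differ from the notation of the specific papers alluded to here) is the Lagrangian

    min over p(z|x) of   I(X;Z) - beta * I(Z;Y),

where beta >= 0 sets how much sufficiency you are willing to trade for minimality.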

But as pointed out in the literature several times, the fact that neural networks implement a deterministic map makes these information theoretic quantities degenerate: if X is continuous and Z = f(X) is deterministic, I(X;Z) is typically infinite, and in the discrete case it just collapses to H(Z), so "minimality" stops tracking anything interesting about the representation.
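As a quick illustration of both the degeneracy and the binning problem mentioned in the summary, here is a toy numpy sketch (my own construction, not taken from any of the referenced papers): it pushes Gaussian inputs through a fixed deterministic tanh "layer" and estimates I(X;Z) by histogram binning. The estimate keeps growing as the bins get finer instead of converging to anything meaningful.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)          # continuous input X
z = np.tanh(2.0 * x + 0.5)      # deterministic toy "network", Z = f(X)

def binned_mutual_information(x, z, bins):
    # Plug-in MI estimate (in bits) from a 2D histogram with `bins` bins per axis.
    joint, _, _ = np.histogram2d(x, z, bins=bins)
    pxz = joint / joint.sum()
    px = pxz.sum(axis=1, keepdims=True)
    pz = pxz.sum(axis=0, keepdims=True)
    mask = pxz > 0
    return float(np.sum(pxz[mask] * np.log2(pxz[mask] / (px @ pz)[mask])))

for bins in [4, 16, 64, 256, 1024]:
    print(bins, binned_mutual_information(x, z, bins))

# The printed estimate grows roughly like log2(bins): with a deterministic map,
# the binned "I(X;Z)" mostly measures the resolution of the discretization.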

There are attempts at solving these problems in the literature, but the solutions so far are unsatisfying: they're either very ad hoc, rely on questionable assumptions, lack a clear operational interpretation, introduce new problems, or seem theoretically intractable.

Treat the weights as stochastic:

This paper (also relevant) defines several notions of information measure relative to an arbitrary choice of distribution over the weights (not a Bayesian posterior).

I like their idea of using Shannon information measures to try to capture a notion of "robustly" shared information, but the attempts above seem pretty ad hoc and reliant on shaky assumptions. I suspect SLT would be helpful here (just reading the paper, you see things like casually inverting the Fisher information matrix).
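To make the generic "stochastic weights" move concrete, here is the same toy setup with Gaussian noise injected into the weight (again my own illustration with a made-up noise scale, not the specific construction of the papers above): Z becomes a genuinely stochastic function of X, so the binned estimate of I(X;Z) roughly stabilizes instead of diverging, but the value it stabilizes at is controlled by the arbitrary choice of noise scale.

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)

w, sigma = 2.0, 0.5                        # mean weight and weight-noise scale (arbitrary choices)
w_noisy = w + sigma * rng.normal(size=n)   # fresh weight sample per input: "stochastic weights"
z = np.tanh(w_noisy * x + 0.5)

def binned_mutual_information(x, z, bins):
    # Same plug-in MI estimator (in bits) as in the earlier sketch.
    joint, _, _ = np.histogram2d(x, z, bins=bins)
    pxz = joint / joint.sum()
    px = pxz.sum(axis=1, keepdims=True)
    pz = pxz.sum(axis=0, keepdims=True)
    mask = pxz > 0
    return float(np.sum(pxz[mask] * np.log2(pxz[mask] / (px @ pz)[mask])))

for bins in [16, 64, 256]:
    print(bins, binned_mutual_information(x, z, bins))

# Unlike the deterministic case, the estimate roughly levels off as the bins get finer
# (modulo finite-sample bias): weight noise makes I(X;Z) finite, but its value now
# depends on sigma, i.e. on how we chose to perturb the weights.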

Use something other than Shannon information measures:

There's V-information, which is a natural extension of Shannon information measures when you restrict the class of functions you're allowed to use (due to, e.g., computational constraints). But now the difficult question is the choice of a natural function class. Maybe linear probes are a natural choice, but this still feels ad hoc.
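For what it's worth, here is a minimal sketch of what V-information with linear probes looks like operationally (my own illustration on synthetic data; every variable name is made up): it is the drop in held-out cross-entropy when a logistic-regression probe is allowed to look at the features, relative to the best feature-blind (constant) predictor.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def v_information_linear(features, labels, seed=0):
    # Empirical V-information I_V(features -> labels) in nats, with V = linear (logistic) probes:
    # H_V(Y) - H_V(Y|X), i.e. held-out cross-entropy of the best constant predictor
    # minus held-out cross-entropy of a fitted linear probe.
    x_tr, x_te, y_tr, y_te = train_test_split(features, labels, test_size=0.5, random_state=seed)
    classes = np.unique(labels)

    # H_V(Y): the best predictor that ignores the features, i.e. the training-label frequencies.
    freqs = np.array([(y_tr == c).mean() for c in classes])
    h_y = log_loss(y_te, np.tile(freqs, (len(y_te), 1)), labels=classes)

    # H_V(Y|X): held-out cross-entropy of a linear probe on the features.
    probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
    h_y_given_x = log_loss(y_te, probe.predict_proba(x_te), labels=classes)

    return h_y - h_y_given_x

# Toy usage: in practice `activations` would be intermediate-layer activations of a network.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)
activations = rng.normal(size=(2000, 16)) + 0.8 * labels[:, None]   # synthetic, label-correlated
print(v_information_linear(activations, labels))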

There's K-complexity, but there's the usual uncomputability, plus the apparent intractability of mixing algorithmic information theory with neural networks, which have more of a statistical character than an algorithmic one. Admittedly this is mostly vibes, but I am wary of jumping to the conclusion that AIT is necessary for information-theoretically analyzing neural networks just on the strength of a "there's determinism, and AIT is the natural playing field for deterministic information-processing systems"-type argument.


Ideally, I could keep using the vanilla Shannon information measures somehow, because they're nice, simple, and computable, and seem potentially tractable both empirically and theoretically.

And so far, I haven't been able to find a satisfying answer to the problem. I am curious if anyone has takes on this issue.


