Published on January 30, 2025 2:10 PM GMT
Short post today, which is part II.1 of my series on tempering and SLT (see part one here). In this post I’ll explain in a bit more detail the “in practice” connection that experiments should see between the learning coefficient spectrum, tempering, and empirical measurements of the learning coefficient. In future installments of this part I’ll explain a bit of the theory behind this and how it relates to some notions inherent in the generalized “field theory” approach to modeling neural nets.
Practical measurements of the memorization-generalization spectrum
I’m trying to do less of the thing where I hide experimentally-relevant points behind a wall of theory, so let me try to explain the “upshots” of this part ahead of time, and talk about theory later (in future installments of part II of this series).
- Tempering is implemented in practice by sampling algorithms, usually variants of “SGLD” (stochastic gradient Langevin dynamics) in an ML context. As with ordinary SGD, there are various optimization protocols that make it more efficient. There is a whole science of how to check whether “sampling worked”, and sampling quality/best practice is an active area of research where the SLT crowd is making exciting progress. In my experience, sampling algorithms (at a minimum) work well for toy models, and agree with “expected results” when such expectations are known.
- Tempering works by gradually trading off performance for entropy (as will be explained below), in a way that is mathematically analogous to adding heat to a physical system. In practice, this means that tempering inductively “noises out” the least efficient circuits in a neural net, and it stops noising circuits when the increase in loss (compared to the initial fully-trained model) starts getting significantly higher than the “temperature” parameter.
- Tempering is a stochastic process. Often we’re interested in the “generic behavior” of a randomly selected tempered program (corresponding to running an experiment on a specific system at some fixed temperature). In other cases, we may be interested in expectation values over tempered programs, obtained in practice by averaging over the programs encountered in one or more “sampling traces”.
- The result of tempering can be read off of the “circuit efficiency” spectrum, and conversely the spectrum of efficiencies (in the language of the “bucket of circuits” post these are the slopes, not the complete 2-dimensional data) can be read off of tempering measurements. The process of converting a “bucket of circuits” to a tempering prediction is as follows (with various modifications needed in various contexts; a toy code sketch follows the sub-steps below):
- Consider a specific temperature t.
- Figure out the “log odds change” inherent in the loss. Note that this step is a little tricky and context-dependent; “generically” and in the high-data limit, it is given by $n(L - L_0)/t$, where $L_0$ is the loss of the fully-trained model. Note that getting this function exactly right isn’t that important for experimentalists, as it is reasonable to instead manually tune the temperature until it puts you in a regime of interest.
- Inductively noise out the lowest-efficiency circuits until the total increase in loss from the noised-out circuits matches the allowed budget (of order the temperature t).
- The prediction for the tempered model is now the result of noising out these “inefficient circuits”. In particular, an interpretability experiment run on the tempered model should be expected to fail if it extracts information about the noised-out circuits, and to succeed if it extracts information about surviving circuits.
- The learning coefficient can now be recovered by sampling the loss of tempered models[1].
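To make the cartoon recipe concrete, here is a minimal sketch of the conversion in Python. Everything in it — the two numbers summarizing each circuit, the efficiency ordering, and the “stop when the loss budget of order t is spent” rule — is an illustrative rendering of the cartoon above, not a claim about the actual sampling dynamics.

```python
# Illustrative sketch of the "bucket of circuits" -> tempering prediction recipe.
# Each circuit is summarized by two cartoon numbers: the loss it saves (delta_loss)
# and its complexity / entropy cost (lam). The circuits, the numbers, and the
# stopping rule are all assumptions made up for this illustration.

from dataclasses import dataclass

@dataclass
class Circuit:
    name: str
    delta_loss: float  # how much the loss increases if this circuit is noised out
    lam: float         # entropy / learning-coefficient cost of keeping the circuit

def temper_prediction(circuits, L0, t):
    """Predict which circuits survive tempering at temperature t (cartoon version).

    Noise out circuits in order of increasing efficiency (delta_loss / lam)
    until the accumulated loss increase would exceed the budget of order t.
    """
    by_efficiency = sorted(circuits, key=lambda c: c.delta_loss / c.lam)
    noised, loss_increase = [], 0.0
    for c in by_efficiency:
        if loss_increase + c.delta_loss > t:  # loss budget spent: stop noising
            break
        noised.append(c)
        loss_increase += c.delta_loss
    surviving = by_efficiency[len(noised):]
    predicted_loss = L0 + loss_increase
    predicted_lambda = sum(c.lam for c in surviving)  # complexity of what survives
    return noised, surviving, predicted_loss, predicted_lambda

# Example with three hypothetical circuits of very different efficiencies.
bucket = [
    Circuit("memorized-example", delta_loss=0.01, lam=5.0),  # saves little loss, costly
    Circuit("bigram-heuristic",  delta_loss=0.10, lam=2.0),
    Circuit("general-algorithm", delta_loss=1.00, lam=3.0),  # saves a lot of loss
]
noised, surviving, L_t, lam_t = temper_prediction(bucket, L0=0.5, t=0.05)
print([c.name for c in noised])     # only the memorized example gets noised out at this t
print([c.name for c in surviving])
print(L_t, lam_t)
```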
This recipe can be reversed to extract the circuit efficiency spectrum from empirical measurements of the tempering process.
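And here is a minimal sketch of the empirical side: sampling the loss of tempered models with (full-batch) Langevin dynamics on a 1-d toy potential whose learning coefficient is known analytically to be 1/4. The potential, step size, chain length, and the value of the n·β knob are all illustrative assumptions; real measurements run SGLD on minibatch losses of actual networks, with the sampling diagnostics and best practices alluded to above.

```python
# Minimal sketch: estimating a learning coefficient by sampling the loss of
# tempered models, on a 1-d toy potential whose learning coefficient is known
# to be 1/4. All numerical choices below are illustrative, not recommendations.

import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    return w**4           # toy "population loss"; minimum 0 at w = 0

def grad_loss(w):
    return 4.0 * w**3

def langevin_loss_trace(nb, steps=200_000, eps=1e-3, w0=0.0):
    """Full-batch Langevin dynamics targeting the tempered density exp(-nb * loss(w))."""
    w = w0
    trace = np.empty(steps)
    for i in range(steps):
        w += -0.5 * eps * nb * grad_loss(w) + np.sqrt(eps) * rng.standard_normal()
        trace[i] = loss(w)
    return trace

nb = 100.0                          # plays the role of n * beta, i.e. an inverse temperature
trace = langevin_loss_trace(nb)
burn = len(trace) // 10             # discard an initial burn-in segment
lam_hat = nb * trace[burn:].mean()  # n*beta * (E[loss] - loss at the optimum); optimal loss is 0 here
print(f"estimated learning coefficient ~ {lam_hat:.2f} (analytic value 0.25)")
```

Repeating this kind of measurement across a range of temperatures, and watching which capabilities drop out as the loss budget grows, is the measurement that runs the recipe above in reverse.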
- ^
Roughly: tempering means we ask that the loss not be much worse than its minimum, to a precision of about t, and the learning coefficient measures the variance. If improving the loss is very entropically expensive, then tempered NNs will be “very resistant” to bringing the loss much below the maximal allowed value, and this variance will be small. Note that for the conceptual cartoons I’m blurring out the difference between so-called “microcanonical” and “canonical” quantities, and real tempering has “soft” exponential cutoffs rather than exact “loss bounded by this value”-style effects.
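To spell that cartoon out in one back-of-envelope formula (my own gloss, using the hard-cutoff picture just flagged as an approximation): if the volume of weights within loss gap $\epsilon$ of the optimum scales as $\epsilon^\lambda$, then sampling uniformly from the allowed window $0 \le \epsilon \le t$ gives

```latex
p(\epsilon) = \frac{\lambda\,\epsilon^{\lambda-1}}{t^{\lambda}} \ \text{on}\ [0,t],
\qquad
\mathbb{E}[\epsilon] = \frac{\lambda}{\lambda+1}\,t,
\qquad
\operatorname{Var}(\epsilon) = \frac{\lambda\,t^{2}}{(\lambda+2)(\lambda+1)^{2}} \approx \frac{t^{2}}{\lambda^{2}} \ \text{for large}\ \lambda,
```

so a large learning coefficient (loss improvements are entropically expensive) pins the loss near the cutoff and makes its fluctuations small.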