cs.AI updates on arXiv.org · May 20, 12:54
AdaDim: Dimensionality Adaptation for SSL Representational Dynamics

This article examines a key problem in self-supervised learning (SSL): preventing dimensional collapse, the situation in which a high-dimensional representation space ends up spanning only a low-dimensional subspace. Existing SSL optimization strategies aim to guide the model toward representations (R) of higher dimensionality, and the literature holds that a good SSL representation space should have a high H(R) and a low I(R;Z), where Z is the lower-dimensional embedding produced by the projection head. By analyzing training dynamics, the paper shows how feature decorrelation and uniform sample spread each affect H(R) and I(R;Z). It finds that the best-performing SSL models have neither the highest H(R) nor the lowest I(R;Z), but instead reach an optimal intermediate point for both. Building on this observation, the authors propose AdaDim, a method that exploits these training dynamics by adaptively weighting losses based on feature decorrelation and uniform sample distribution.
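
The summary refers to the representation entropy H(R) without saying how it is measured. A common proxy in the SSL literature (in the spirit of effective-rank measures such as RankMe) is the entropy of the normalized eigenvalue spectrum of the representation covariance; below is a minimal sketch assuming PyTorch. The function name spectral_entropy is ours, and this is not necessarily the paper's estimator.

```python
import torch

def spectral_entropy(R: torch.Tensor) -> torch.Tensor:
    """Proxy for H(R): Shannon entropy of the normalized eigenvalue
    spectrum of the centered representation covariance.

    R: (n_samples, dim) batch of representations. Higher values mean
    variance is spread across more dimensions, i.e., less collapse.
    """
    R = R - R.mean(dim=0, keepdim=True)           # center each feature
    cov = (R.T @ R) / (R.shape[0] - 1)            # (dim, dim) covariance
    eigvals = torch.linalg.eigvalsh(cov).clamp(min=1e-12)
    p = eigvals / eigvals.sum()                   # normalized spectrum
    return -(p * p.log()).sum()                   # Shannon entropy

# Example: a batch confined to a rank-4 subspace scores lower than a
# well-spread batch of the same dimensionality.
spread = torch.randn(512, 128)
collapsed = torch.randn(512, 4) @ torch.randn(4, 128)
print(spectral_entropy(spread) > spectral_entropy(collapsed))  # True
```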

🔑A core challenge in self-supervised learning (SSL) is preventing dimensional collapse, in which a high-dimensional representation space shrinks to a low-dimensional subspace and degrades model performance.

📈Existing SSL methods optimize representation dimensionality through two strategies: dimension-contrastive approaches, which encourage feature decorrelation, and sample-contrastive approaches, which promote a uniform spread of sample representations (see the sketch after this list).

💡The study shows that early in training, increases in H(R) driven by feature decorrelation raise I(R;Z), whereas late in training, increases in H(R) driven by uniform sample spread cause I(R;Z) to plateau or decrease.

🎯The best SSL models pursue neither the highest H(R) nor the lowest I(R;Z); they strike a balance between the two, reaching an optimal intermediate point for downstream performance.

⚖️Based on these observed training dynamics, the authors propose AdaDim, which adaptively weights the feature-decorrelation and uniform-sample-spread losses to steer SSL training.
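
To make the two loss families in the list above concrete, here is a minimal sketch of representative objectives: an off-diagonal cross-correlation penalty in the style of Barlow Twins for the dimension-contrastive family, and the Wang & Isola uniformity term for the sample-contrastive family. These are standard stand-ins chosen for illustration, not AdaDim's exact losses; the function names and the 5e-3 weight are our assumptions.

```python
import torch
import torch.nn.functional as F

def decorrelation_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Dimension-contrastive loss (Barlow Twins style): push the
    cross-correlation matrix of two views toward the identity, which
    decorrelates features. z1, z2: (batch, dim) embeddings of two views."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)   # standardize per feature
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]                 # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + 5e-3 * off_diag              # 5e-3 is a typical weight

def uniformity_loss(z: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Sample-contrastive uniformity term (Wang & Isola): encourages
    L2-normalized embeddings to spread uniformly on the hypersphere."""
    z = F.normalize(z, dim=1)
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```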

arXiv:2505.12576v1 Announce Type: cross Abstract: A key factor in effective Self-Supervised Learning (SSL) is preventing dimensional collapse, in which a higher-dimensional representation space spans only a lower-dimensional subspace. SSL optimization strategies therefore involve guiding a model to produce representations ($R$) with a higher dimensionality. Dimensionality is optimized either through a dimension-contrastive approach that encourages feature decorrelation or through a sample-contrastive method that promotes a uniform spread of sample representations. Both families of SSL algorithms also utilize a projection head that maps $R$ into a lower-dimensional embedding space $Z$. Recent work has characterized the projection head as a filter that removes features irrelevant to the SSL objective, reducing the mutual information $I(R;Z)$. The current literature's view is therefore that a good SSL representation space should have a high $H(R)$ and a low $I(R;Z)$. However, this view lacks an understanding of the underlying training dynamics that influence both terms, as well as of how the values of $H(R)$ and $I(R;Z)$ reached by the end of training reflect the downstream performance of an SSL model. We address both gaps by demonstrating that increases in $H(R)$ due to feature decorrelation at the start of training lead to a higher $I(R;Z)$, while increases in $H(R)$ due to samples distributing uniformly in a high-dimensional space at the end of training cause $I(R;Z)$ to plateau or decrease. Furthermore, our analysis shows that the best performing SSL models have neither the highest $H(R)$ nor the lowest $I(R;Z)$, but arrive at an optimal intermediate point for both. We develop a method called AdaDim to exploit these observed training dynamics by adaptively weighting between losses based on feature decorrelation and uniform sample spread.
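
The abstract states that AdaDim adaptively weights between the two loss families but does not spell out the adaptation rule here. The sketch below, reusing decorrelation_loss and uniformity_loss from the previous snippet, shows only the general shape of such a combination; the linear ramp for alpha is a placeholder assumption, not the paper's actual criterion.

```python
# Illustrative only: AdaDim derives its weighting from observed training
# dynamics; the linear ramp below is a placeholder assumption.
def adadim_style_loss(z1, z2, step: int, total_steps: int):
    """Weighted combination of the two SSL loss families.

    Placeholder schedule: emphasize feature decorrelation early in
    training and uniform sample spread later, matching the dynamics
    described in the abstract. The paper adapts this weight from the
    training dynamics themselves, not from the step count.
    """
    alpha = 1.0 - step / total_steps
    return (alpha * decorrelation_loss(z1, z2)
            + (1.0 - alpha) * 0.5 * (uniformity_loss(z1) + uniformity_loss(z2)))
```

A faithful implementation would set alpha from measured quantities such as the evolution of H(R) and I(R;Z) rather than from the step count.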

Related tags

Self-supervised learning, Dimensional collapse, Feature decorrelation, Sample distribution, AdaDim