MarkTechPost@AI July 8, 2024
Researchers at the University College London Unravel the Universal Dynamics of Representation Learning in Deep Neural Networks

Researchers at University College London have proposed a new theory explaining how deep neural networks learn universal representations across different architectures. The theory focuses on the representation dynamics of the network's intermediate layers and shows how these dynamics change with network depth.

🔍 The researchers focus on the representation dynamics of a deep network's intermediate layers and derive an effective theory of how two similar data points interact during training. Because the theory does not place tight restrictions on the network's parameters, it explains the learning dynamics of deep networks with a range of activation functions and architectures.

📈 The theory also examines how the effective learning rate varies across hidden layers. In standard gradient descent, an update sums contributions over parameters, so the size of the change is proportional to the number of parameters. At deeper hidden layers, the encoder map contains more parameters and the decoder map fewer, so the encoder's effective learning rate increases with depth while the decoder's decreases.

🌐 The study stresses that although the theory makes progress in explaining universal behavior in deep learning, it still faces challenges when applied to larger datasets. Further research is needed to handle these complexities more effectively and to cope with more complex data.

Deep neural networks (DNNs) come in various sizes and structures. The specific architecture selected, along with the dataset and learning algorithm used, is known to influence the neural representations that are learned. A major challenge in the current theory of deep learning is scalability: although exact solutions to the learning dynamics exist for simpler networks, adjusting even a small part of the network architecture often requires significant changes to the analysis. Moreover, state-of-the-art models are so complex that practical analytical solutions are out of reach. Such systems, which include large machine learning models and even the brain, pose challenges for theoretical study.

The paper builds on several lines of related work. The first is exact solutions in simple architectures: considerable progress has been made in the theoretical analysis of deep linear neural networks, where the loss landscape is well understood and exact solutions have been obtained for specific initial conditions. The second is the neural tangent kernel, a notable exception in terms of universal solutions in that it provides exact solutions applicable to a wide range of models. The third is implicit biases of gradient descent, which investigates gradient descent as a source of the generalization performance of DNNs. The final line is local elasticity: a model exhibits this property if updating it on one feature vector only minimally affects dissimilar feature vectors.

Researchers from University College London have proposed a framework for modeling universal representation learning, whose aim is to explain common phenomena observed across learning systems. They develop an effective theory of how two similar data points interact with each other during training when the neural network is large and complex, so that it is not heavily limited by its parameters. The existence of universal behavior in representation-learning dynamics is then demonstrated by the fact that the derived theory explains the dynamics of various deep networks with different activation functions and architectures.
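To make this pairwise picture concrete, here is a minimal numerical sketch, not the authors' formalism: it trains a small tanh network with plain gradient descent and tracks how far apart the hidden representations of two similar inputs are as training proceeds. The toy dataset, layer sizes, weight scale, and learning rate are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a small tanh network trained with plain gradient descent.
# We track how the hidden representations of two *similar* inputs move
# relative to each other during training -- the kind of pairwise
# interaction the effective theory describes.
d_in, d_h, d_out, n = 10, 64, 1, 200
X = rng.normal(size=(n, d_in))
y = np.sin(X[:, :1])                        # arbitrary smooth toy target

scale = 0.1                                 # small initial weights
W1 = scale * rng.normal(size=(d_in, d_h)) / np.sqrt(d_in)
W2 = scale * rng.normal(size=(d_h, d_out)) / np.sqrt(d_h)

x_a = X[0]
x_b = X[0] + 0.05 * rng.normal(size=d_in)   # a nearby, similar input

lr = 0.05
for step in range(2001):
    H = np.tanh(X @ W1)                     # hidden representations
    pred = H @ W2
    err = pred - y
    # Gradients of the mean squared error for both weight matrices.
    gW2 = H.T @ err / n
    gW1 = X.T @ ((err @ W2.T) * (1 - H**2)) / n
    W1 -= lr * gW1
    W2 -= lr * gW2
    if step % 500 == 0:
        h_a, h_b = np.tanh(x_a @ W1), np.tanh(x_b @ W1)
        print(step, np.linalg.norm(h_a - h_b))
```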

The proposed theory looks at the representation dynamics at "some intermediate layer H." Since DNNs have many layers at which representations can be observed, this raises the question of how the dynamics depend on the depth of the chosen intermediate layer. To answer it, one must determine at which layers the effective theory is still valid. For the linear approximation to be accurate, the representations must start close to each other. If the initial weights are small, each layer has an average activational gain factor that is a constant G less than 1, so the initial representational distance scales with the depth n roughly as G^n. This quantity decreases with depth, meaning the theory should be more accurate in the later layers of the network.
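As a rough illustration of this scaling, the sketch below (assuming equal-width tanh layers, small random initial weights, and arbitrary sizes) measures the distance between the representations of two nearby inputs at each depth of a randomly initialized network; the ratio of successive distances approximates the per-layer gain G, so the distance at depth n decays roughly like G^n.

```python
import numpy as np

rng = np.random.default_rng(1)

# With small initial weights, each layer multiplies the separation between
# two similar inputs by roughly a constant gain factor G < 1, so the
# representational distance at depth n decays like G**n.
width, depth, scale = 256, 12, 0.5

x_a = rng.normal(size=width)
x_b = x_a + 0.01 * rng.normal(size=width)   # a nearby input

h_a, h_b = x_a, x_b
dists = []
for layer in range(depth):
    W = scale * rng.normal(size=(width, width)) / np.sqrt(width)
    h_a, h_b = np.tanh(h_a @ W), np.tanh(h_b @ W)
    dists.append(np.linalg.norm(h_a - h_b))

# Ratios of successive distances estimate the per-layer gain G.
gains = [dists[i + 1] / dists[i] for i in range(depth - 1)]
print("per-layer gain estimates:", np.round(gains, 3))
```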

The effective learning rates are expected to vary across hidden layers. In standard gradient descent, an update sums contributions over the parameters of a map, so the size of the resulting change is proportional to the number of parameters. At deeper hidden layers, the number of parameters in the encoder map increases while the number in the decoder map decreases, so the effective learning rate for the encoder increases with depth and that of the decoder decreases. This relationship holds in the deeper layers of the network, where the theory is accurate; in the earlier layers, however, the effective learning rate for the decoder appears to increase.
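This parameter-counting argument can be illustrated with a short sketch (equal-width layers and a unit base learning rate are simplifying assumptions, not details from the paper): for each choice of the intermediate layer H, count the parameters of the encoder map (input to H) and of the decoder map (H to output), and scale each map's effective learning rate in proportion to its parameter count.

```python
# For a stack of equal-width layers split at an intermediate layer H,
# the "encoder" (input -> H) gains parameters with depth while the
# "decoder" (H -> output) loses them.  Since a gradient-descent update
# adds up contributions from every parameter in a map, the effective
# learning rate of each map grows in proportion to its parameter count.
width, depth, base_lr = 128, 8, 1.0
params_per_layer = width * width

for split in range(1, depth):
    enc_params = split * params_per_layer             # layers before H
    dec_params = (depth - split) * params_per_layer   # layers after H
    print(f"H = layer {split}: "
          f"encoder eff. lr ~ {base_lr * enc_params:.0f}, "
          f"decoder eff. lr ~ {base_lr * dec_params:.0f}")
```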

In summary, researchers from University College London have introduced a new theory of how neural networks learn, focusing on learning patterns that are common across different architectures. It shows that these networks naturally learn structured representations, especially when they start from small weights. Rather than presenting the theory as a definitive universal model, the researchers highlight that gradient descent, the fundamental method used to train neural networks, may itself support these aspects of representation learning. The approach still faces challenges when applied to larger datasets, however, and further research is needed to address these complexities effectively and to deal with more complex data.


Check out the Paper. All credit for this research goes to the researchers of this project.


