MarkTechPost@AI, November 9, 2024
Researchers at Cambridge Provide Empirical Insights into Deep Learning through the Pedagogical Lens of Telescopic Model that Uses First-Order Approximations

Researchers at the University of Cambridge have proposed a simple telescopic model that provides empirical insights into neural network behavior by taking first-order approximations of the training process. The model sheds light on counterintuitive phenomena such as double descent and grokking, and explains why neural networks underperform XGBoost on tabular data. The study also examines the effects of gradient stabilization and weight averaging. This work offers a new perspective on the complex behavior of neural networks and opens new directions for future research.

🤔**Telescopic model:** By simulating the training process with first-order approximations, the model captures the complex behavior of neural networks during training and testing, such as double descent and grokking; it quantifies learning complexity and traces these phenomena to the divergence between training and test complexity.

📊**Performance on tabular data:** The study finds that neural networks underperform XGBoost on tabular data, especially data with irregularities and sparsity. This is because the tangent kernel of a neural network is unbounded, whereas XGBoost's kernel behaves more stably on test data.

📈**Gradient stabilization and weight averaging:** As training progresses, gradient updates become more consistent, yielding a smoother loss surface and supporting linear mode connectivity and weight averaging, two methods that have proven highly successful.

💡**Explaining counterintuitive behavior:** Through empirical investigation, the study uses the telescopic model to illuminate puzzling deep-learning phenomena such as double descent and grokking, offering a new lens for understanding neural networks.

🔬**Advancing neural network research:** The work provides fresh ideas for understanding the mysteries of neural networks and should inspire further empirical and theoretical research.

Neural networks remain a beguiling enigma. On one hand, they automate daunting tasks across fields such as computer vision, natural language understanding, and text generation; on the other, their underlying behaviors and decision-making processes remain elusive. Neural networks often exhibit counterintuitive behavior, such as non-monotonic generalization performance, which raises doubts about their reliability. On structured, tabular data, methods like XGBoost and Random Forests still outperform them. Furthermore, neural networks often behave like linear models, which is puzzling given their renowned ability to capture complex nonlinearities. These open questions have motivated researchers to decode how neural networks actually work.

Researchers at the University of Cambridge have presented a simple model that provides empirical insights into neural networks. The work takes a hybrid approach, applying theoretical tools to simple yet faithful models of neural networks for empirical investigation. Inspired by work on Neural Tangent Kernels, the authors consider a model built from first-order approximations to the functional updates made during training. The model "telescopes" these per-step approximations together, so that their sum replicates the behavior of a fully trained practical network. The setup serves as a pedagogical lens for showing how neural networks sometimes generalize in seemingly unpredictable ways, and the research also proposes metrics for predicting and understanding this behavior.
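
The core idea can be sketched in a few lines of numpy: at each SGD step, the change in the network's output on any input x is approximated to first order, f_{t+1}(x) ≈ f_t(x) + ⟨∇_θ f(x; θ_t), θ_{t+1} − θ_t⟩, and these increments are telescoped (summed) over training. The tiny one-hidden-layer net, data, and hyperparameters below are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 1-hidden-layer regression net with analytic gradients, so we can
# form the first-order (tangent-kernel) update without an autodiff library.
def init(d_in, d_h):
    return {"W1": rng.normal(0, 1 / np.sqrt(d_in), (d_h, d_in)),
            "W2": rng.normal(0, 1 / np.sqrt(d_h), (1, d_h))}

def forward(p, X):                       # X: (n, d_in) -> outputs (n,)
    return (p["W2"] @ np.tanh(p["W1"] @ X.T)).ravel()

def param_grads(p, x):                   # d f(x) / d params, flattened
    h = np.tanh(p["W1"] @ x)             # hidden activations, (d_h,)
    dW2 = h
    dW1 = (p["W2"].ravel() * (1 - h ** 2))[:, None] * x[None, :]
    return np.concatenate([dW1.ravel(), dW2])

X = rng.normal(size=(32, 4)); y = np.sin(X[:, 0])
x_test = rng.normal(size=4)
p, eta, T = init(4, 16), 0.05, 200

f_tel = forward(p, x_test[None])[0]      # telescoping starts from f_0(x)
for t in range(T):
    resid = forward(p, X) - y            # squared-loss residuals at theta_t
    g_test = param_grads(p, x_test)      # grad of f(x_test) at theta_t
    # Full-batch SGD step: delta_theta = -eta/n * sum_i resid_i * grad f(x_i)
    delta = -eta / len(X) * sum(r * param_grads(p, xi) for r, xi in zip(resid, X))
    # First-order functional update: f_{t+1}(x) ~= f_t(x) + <grad f(x), delta_theta>
    f_tel += g_test @ delta
    # Apply the same parameter step to the actual network
    p["W1"] += delta[:p["W1"].size].reshape(p["W1"].shape)
    p["W2"] += delta[p["W1"].size:].reshape(p["W2"].shape)

print("true f_T(x):", forward(p, x_test[None])[0], " telescoped:", f_tel)
```

Because every increment is linear in the parameter step, the telescoped prediction closely tracks the fully trained network while remaining analyzable step by step.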

The authors present three case studies. In the first, the telescopic model extends an existing complexity metric to neural networks, with the goal of understanding overfitting curves and generalization, especially when a model underperforms on new data. The findings link double descent and grokking to changes in model complexity over training and testing. In double descent, test performance first worsens with model size (ordinary overfitting) but then improves again as complexity keeps increasing, producing a non-monotonic curve. In grokking, a model that has already achieved perfect training performance may, after a long delay, suddenly improve significantly on the test data. The telescopic model quantifies learning complexity during training and attributes both effects to a divergence between training and test complexity.
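
To make double descent concrete, here is a standard random-features demonstration; this is a common stand-in, not the paper's method, which instead tracks its complexity metric inside the telescopic model. Test error typically peaks when the number of features approaches the number of training points, then falls again past the interpolation threshold:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative double-descent curve: min-norm least squares on random
# ReLU features. Test MSE rises toward width ~= n_train, then falls again.
n_train, n_test, d = 100, 500, 10
X = rng.normal(size=(n_train + n_test, d))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=len(X))
Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

for width in [10, 50, 90, 100, 110, 200, 1000]:
    W = rng.normal(size=(d, width))            # fixed random projection
    Ftr, Fte = np.maximum(Xtr @ W, 0), np.maximum(Xte @ W, 0)
    beta = np.linalg.pinv(Ftr) @ ytr           # minimum-norm solution
    print(width, np.mean((Fte @ beta - yte) ** 2))
```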

The second case study examines why neural networks underperform XGBoost on tabular data. Despite their remarkable versatility, neural networks struggle with tabular data, particularly when it contains irregularities and sparsity. Although both models exhibit similar optimization behavior, XGBoost comes out ahead because it handles feature irregularities and sparsity better. Viewing both the telescopic model and XGBoost through their kernels, the study finds that the tangent kernel of a neural network is unbounded, so unusual test points can exert arbitrarily large influence, whereas XGBoost's kernel behaves far more predictably on test data.
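
The contrast can be illustrated directly. A tree ensemble induces a kernel, the fraction of trees in which two points share a leaf, which is bounded in [0, 1] by construction; the tangent kernel is an inner product of gradients and can grow without bound as a test point moves away from the data. The sketch below uses sklearn's RandomForestRegressor as a convenient stand-in for XGBoost and the tangent kernel of a linear model for simplicity; both choices are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
Xtr = rng.normal(size=(200, 3))
ytr = Xtr[:, 0] + rng.normal(0, 0.1, 200)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)
leaves_tr = forest.apply(Xtr)               # (n_train, n_trees) leaf indices

def tree_kernel(x):
    # Fraction of trees where x lands in the same leaf as each training
    # point: bounded in [0, 1] no matter how extreme x is.
    lx = forest.apply(x[None])              # (1, n_trees)
    return (leaves_tr == lx).mean(axis=1)

def tangent_kernel(x):
    # Tangent kernel of a linear model f(x) = w @ x is k(x, x') = x @ x';
    # like a neural net's tangent kernel, it is not bounded in x.
    return Xtr @ x

for scale in [1, 10, 100]:
    x = scale * np.ones(3)                  # push x away from the data
    print(scale, tangent_kernel(x).max().round(1), tree_kernel(x).max().round(2))
```

As the test point drifts off-distribution, the tangent-kernel values blow up while the tree kernel saturates, mirroring the stability gap the study describes.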

The final case study concerns gradient stabilization and weight averaging. The model reveals that as training progresses, gradient updates become more aligned, leading to smoother loss surfaces. The authors show how this gradient stabilization during training underlies linear mode connectivity and weight averaging, two techniques that have proven highly successful.
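
Both effects are easy to observe in a toy run. The hypothetical logistic-regression example below tracks the cosine similarity between successive full-batch gradients, which approaches 1 as training stabilizes, and averages late-training checkpoints in the spirit of stochastic weight averaging; everything here is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 20)); w_true = rng.normal(size=20)
y = (X @ w_true + 0.5 * rng.normal(size=500) > 0).astype(float)

w, eta = np.zeros(20), 0.1
prev_g, snapshots = None, []
for t in range(300):
    p = 1 / (1 + np.exp(-(X @ w)))
    g = X.T @ (p - y) / len(X)              # mean logistic-loss gradient
    if prev_g is not None and t % 50 == 0:  # alignment of successive updates
        cos = g @ prev_g / (np.linalg.norm(g) * np.linalg.norm(prev_g))
        print(f"step {t}: cosine(prev grad, grad) = {cos:.3f}")
    prev_g, w = g, w - eta * g
    if t >= 200:
        snapshots.append(w.copy())          # collect late checkpoints

w_avg = np.mean(snapshots, axis=0)          # weight averaging
acc = lambda wv: np.mean(((X @ wv) > 0) == y)
print("last-iterate acc:", acc(w), " averaged acc:", acc(w_avg))
```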

The proposed telescopic model of neural network learning helps explain several perplexing phenomena in deep learning through empirical investigation, and it should spur further efforts, both empirical and theoretical, to unravel the mysteries of neural networks.


Check out the paper for full details. All credit for this research goes to the researchers of this project.
