MarkTechPost@AI · March 11
Understanding Generalization in Deep Learning: Beyond the Mysteries

Generalization phenomena in deep learning, such as benign overfitting and double descent, are neither unique to neural networks nor mysterious; they can be understood through frameworks such as PAC-Bayes. The article proposes "soft inductive biases" as a unifying principle: retain flexibility while preferring simpler solutions consistent with the data. This principle applies across model classes, suggesting that deep learning is not fundamentally different from other methods, only distinctive in specific respects. Embracing flexible hypothesis spaces requires prior biases to ensure good generalization, and the article stresses the value of alternative methods beyond conventional frameworks for understanding the generalization properties of modern machine learning.

💡 Generalization behaviors in deep learning, such as benign overfitting and double descent, are not unique to neural networks and can be understood through existing frameworks such as PAC-Bayes and countable hypothesis bounds.

🧠 "Soft inductive biases" are the key unifying principle for explaining these phenomena. Rather than restricting the hypothesis space, they express a preference for simpler solutions consistent with the data while preserving flexibility. The principle applies across model classes, suggesting that deep learning is not fundamentally different from other approaches.

📉 Double descent refers to generalization error that first decreases, then increases, and then decreases again as model parameters grow. It can be observed in both ResNet-18 and linear models, and can be tracked formally with PAC-Bayes bounds.

🌐 Benign overfitting describes a model's ability to fit noise perfectly while still performing well on structured data. This behavior contradicts established generalization frameworks such as VC dimension and Rademacher complexity, but it is not unique to neural networks.

The seemingly anomalous generalization behaviors of deep neural networks, such as benign overfitting, double descent, and successful overparametrization, are neither unique to neural networks nor inherently mysterious. These phenomena can be understood through established frameworks like PAC-Bayes and countable hypothesis bounds. A researcher from New York University presents “soft inductive biases” as a key unifying principle for explaining them: rather than restricting the hypothesis space, this approach embraces flexibility while maintaining a preference for simpler solutions consistent with the data. The principle applies across model classes, showing that deep learning isn’t fundamentally different from other approaches, even though it remains distinctive in specific respects.

Inductive biases traditionally function as restriction biases that constrain hypothesis space to improve generalization, allowing data to eliminate inappropriate solutions. Convolutional neural networks exemplify this approach by imposing hard constraints like locality and translation equivariance on MLPs through parameter removal and sharing. Soft inductive biases represent a broader principle where certain solutions are preferred without eliminating alternatives that fit the data equally well. Unlike restriction biases with their hard constraints, soft biases guide rather than limit the hypothesis space. These biases influence the training process through mechanisms like regularization and Bayesian priors over parameters.
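To make the contrast concrete, below is a minimal sketch, not taken from the paper, of the difference between a hard restriction bias and a soft inductive bias in an overparametrized linear fit. The ridge penalty stands in for regularization or a Gaussian prior over parameters; all names and constants are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: soft inductive bias via an L2 (ridge) penalty.
# The penalty prefers small-norm (simpler) solutions without removing any
# hypothesis from the space, unlike a hard restriction that deletes features.

rng = np.random.default_rng(0)
n, d = 50, 200                            # fewer examples than parameters
X = rng.normal(size=(n, d))
w_true = np.zeros(d); w_true[:5] = 1.0    # only a few directions matter
y = X @ w_true + 0.1 * rng.normal(size=n)

# Hard restriction bias: constrain the hypothesis space to the first 5 features.
w_hard = np.linalg.lstsq(X[:, :5], y, rcond=None)[0]

# Soft inductive bias: keep all 200 features but prefer small-norm solutions.
lam = 1.0
w_soft = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Both fit the data well; the soft bias retains flexibility in case the
# restriction is wrong, while still favoring simple explanations.
print("hard-constraint residual:", np.linalg.norm(X[:, :5] @ w_hard - y))
print("soft-bias residual:      ", np.linalg.norm(X @ w_soft - y))
print("soft solution norm:      ", np.linalg.norm(w_soft))
```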

Embracing flexible hypothesis spaces suits the complex structure of real-world data, but it requires a prior bias toward certain solutions to ensure good generalization. Although phenomena like overparametrization challenge conventional wisdom around overfitting and metrics like Rademacher complexity, they align with an intuitive understanding of generalization and can be characterized through long-established frameworks, including PAC-Bayes and countable hypothesis bounds. The concept of effective dimensionality provides additional intuition for these behaviors. The frameworks that have shaped conventional generalization wisdom often fail to explain these phenomena, highlighting the value of established alternative methods for understanding modern machine learning's generalization properties.
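As a rough illustration of effective dimensionality, the snippet below counts how many eigendirections of a curvature or covariance matrix are well determined relative to a scale alpha. The formula N_eff = Σ_i λ_i / (λ_i + α) is an assumption following common usage in this literature; the paper's exact definition may differ in details.

```python
import numpy as np

def effective_dimensionality(eigenvalues, alpha=1.0):
    # Assumed form: each eigendirection contributes lam / (lam + alpha),
    # so directions much larger than alpha count ~1 and tiny ones count ~0.
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(lam / (lam + alpha)))

# Example: a matrix with a few large eigenvalues and many tiny ones behaves
# like a low-dimensional model even if its nominal parameter count is huge.
eigs = np.concatenate([np.full(5, 100.0), np.full(995, 1e-3)])
print(effective_dimensionality(eigs, alpha=1.0))  # ~6, far below the nominal 1000
```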

Benign overfitting describes models’ ability to perfectly fit noise while still generalizing well on structured data, showing that the capacity to overfit doesn’t necessarily lead to poor generalization on meaningful problems. Convolutional neural networks can fit random image labels while maintaining strong performance on structured image recognition tasks. This behavior contradicts established generalization frameworks like VC dimension and Rademacher complexity, with the authors claiming that no existing formal measure could explain these models’ simplicity despite their enormous size. Benign overfitting has even been described as “one of the key mysteries uncovered by deep learning.” However, it isn’t unique to neural networks and can be reproduced across various model classes.
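A hedged sketch of how benign overfitting can be reproduced outside neural networks: an overparametrized random-feature regression, solved with a minimum-norm interpolator, fits its noisy training labels exactly yet still predicts the structured target reasonably well, and can also drive training error to zero on purely random labels. The feature map, data sizes, and targets below are illustrative choices, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_features(x, W):                  # fixed random ReLU features
    return np.maximum(x @ W, 0.0)

n_train, n_test, d, D = 100, 1000, 5, 4000  # D >> n_train: overparametrized
W = rng.normal(size=(d, D)) / np.sqrt(d)
x_tr, x_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
f = lambda x: np.sin(x[:, 0]) + 0.5 * x[:, 1]     # structured target
y_tr = f(x_tr) + 0.3 * rng.normal(size=n_train)   # noisy labels

Phi_tr, Phi_te = random_features(x_tr, W), random_features(x_te, W)
w = np.linalg.pinv(Phi_tr) @ y_tr                 # minimum-norm interpolator

print("train MSE:", np.mean((Phi_tr @ w - y_tr) ** 2))    # ~0: fits the noise
print("test  MSE:", np.mean((Phi_te @ w - f(x_te)) ** 2)) # still reasonable

# The same model can also drive training error to ~0 on purely random labels,
# i.e., the capacity to memorize noise coexists with useful generalization.
y_rand = rng.normal(size=n_train)
w_rand = np.linalg.pinv(Phi_tr) @ y_rand
print("random-label train MSE:", np.mean((Phi_tr @ w_rand - y_rand) ** 2))
```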

Double descent refers to generalization error that decreases, increases, and then decreases again as the number of model parameters grows. The initial pattern follows the “classical regime,” where models capture useful structure but eventually overfit. The second descent occurs in the “modern interpolating regime,” after training loss approaches zero. The paper demonstrates double descent for both a ResNet-18 and a linear model; for the ResNet, cross-entropy loss is measured on CIFAR-100 as the width of each layer increases. As layer width increases in the ResNet or parameters increase in the linear model, both follow the same pattern: effective dimensionality rises until the interpolation threshold is reached, then decreases as generalization improves. This phenomenon can be formally tracked using PAC-Bayes bounds.
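The following is a small, assumed setup (not the paper's experiment) that typically reproduces double descent in a linear model: as the number of random features sweeps past the interpolation threshold p ≈ n, the test error of the minimum-norm least-squares fit peaks and then falls again.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 80, 2000, 10
x_tr, x_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
beta = rng.normal(size=d)
f = lambda x: x @ beta                             # linear target
y_tr = f(x_tr) + 0.5 * rng.normal(size=n_train)    # noisy training labels

W_full = rng.normal(size=(d, 2000)) / np.sqrt(d)   # pool of random ReLU features
feats = lambda x, p: np.maximum(x @ W_full[:, :p], 0.0)

for p in [10, 40, 80, 120, 400, 2000]:             # sweep the parameter count
    Phi_tr, Phi_te = feats(x_tr, p), feats(x_te, p)
    w = np.linalg.pinv(Phi_tr) @ y_tr              # min-norm least squares
    test_mse = np.mean((Phi_te @ w - f(x_te)) ** 2)
    print(f"p = {p:5d}  test MSE = {test_mse:.3f}")  # expect a peak near p ~ n
```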

In conclusion, overparametrization, benign overfitting, and double descent are intriguing phenomena that deserve continued study. Contrary to widespread belief, however, these behaviors align with established generalization frameworks, can be reproduced in non-neural models, and can be understood intuitively. This understanding should bridge diverse research communities and prevent valuable perspectives and frameworks from being overlooked. Other phenomena, like grokking and scaling laws, are likewise not presented as evidence for rethinking generalization frameworks or as specific to neural networks: recent research confirms that they also arise in linear models, and PAC-Bayes and countable hypothesis bounds remain consistent with large language models.


Check out the Paper. All credit for this research goes to the researchers of this project.

