MarkTechPost@AI 05月21日 01:50
Enhancing Language Model Generalization: Bridging the Gap Between In-Context Learning and Fine-Tuning

This article examines how language models (LMs) generalize when confronted with novel information structures, comparing two approaches: in-context learning and fine-tuning. The study finds that in-context learning generalizes better on certain inference types, which motivated the researchers to develop methods that improve fine-tuning by incorporating in-context inferences into the training data. The results are validated across multiple datasets, and the authors discuss limitations such as the reliance on nonsense words and on specific language models. The article closes with a call for future work exploring learning and generalization differences across models to extend these findings.

💡 Language models acquire strong in-context learning abilities from pretraining, but fine-tuning faces challenges: it requires large amounts of data yet generalizes poorly. For example, fine-tuned models struggle to answer questions that reverse the associations seen in the training data.

🔬 Researchers have pursued several approaches to improve LM adaptability, including studies of in-context learning, investigations of how models use information not explicitly present in the prompt, and data augmentation techniques that use language models to boost performance on limited datasets.

📊 The researchers constructed several datasets that isolate knowledge from the pretraining data to create clean generalization tests. In data-matched settings, in-context learning generalized more flexibly than fine-tuning, although fine-tuning showed advantages on reversals embedded in larger knowledge structures.

⚙️ The researchers developed a method that strengthens fine-tuning generalization by adding in-context inferences to the fine-tuning data. Experiments ran on multiple datasets using the Gemini 1.5 Flash model, with training documents combined as context for evaluation. The key innovation is data augmentation using in-context generalization, via both local and global strategies.

✅ On the Reversal Curse dataset, in-context learning reached near-ceiling performance on reversals while conventional fine-tuning performed poorly, and fine-tuning on data augmented with in-context inferences matched the high performance of pure in-context learning. On simple syllogism tests, in-context learning and augmented fine-tuning likewise performed best.

Language models (LMs) show great capabilities as in-context learners when pretrained on vast internet text corpora, allowing them to generalize effectively from just a few task examples. However, fine-tuning these models for downstream tasks presents significant challenges. Fine-tuning requires hundreds to thousands of examples, yet the resulting generalization patterns are limited: models fine-tuned on statements like “B’s mother is A” struggle to answer the related question “Who is A’s son?”, even though LMs handle such reverse relations easily in context. This raises questions about how in-context learning and fine-tuning differ in their generalization patterns, and how those differences should inform adaptation strategies for downstream tasks.
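The reversal-curse setup described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual dataset: the names and the helper function are hypothetical.

```python
# Minimal sketch of the reversal curse: a model fine-tuned only on the
# forward statement tends to fail the reversed question, even though it
# can answer it when the statement is given in context.

def make_forward_and_reverse(child: str, parent: str):
    """Return a forward training statement plus the reversed QA pair
    that conventionally fine-tuned models tend to fail on."""
    forward = f"{child}'s mother is {parent}."
    reverse_question = f"Who is {parent}'s son?"
    return forward, (reverse_question, child)

forward, (question, answer) = make_forward_and_reverse("Ben", "Alice")
print(forward)                  # Ben's mother is Alice.
print(question, "->", answer)   # Who is Alice's son? -> Ben
```

Fine-tuning sees only the forward statements; evaluation asks the reversed questions, which is where the generalization gap appears.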

Research into improving LMs’ adaptability has followed several key approaches. In-context learning studies have examined learning and generalization patterns through empirical, mechanistic, and theoretical analyses. Out-of-context learning research explores how models utilize information not explicitly included in prompts. Data augmentation techniques use LLMs to enhance performance from limited datasets, with specific solutions targeting issues like the reversal curse through hardcoded augmentations, deductive closure training, and generating reasoning pathways. Moreover, synthetic data approaches have evolved from early hand-designed data to improve generalization in domains like linguistics or mathematics to more recent methods that generate data directly from language models.

Researchers from Google DeepMind and Stanford University have constructed several datasets that isolate knowledge from pretraining data to create clean generalization tests. Performance is evaluated across various generalization types by exposing pretrained models to controlled information subsets, both in-context and through fine-tuning. Their findings reveal that in-context learning shows more flexible generalization than fine-tuning in data-matched settings, though there are some exceptions where fine-tuning can generalize to reversals within larger knowledge structures. Building on these insights, researchers have developed a method that enhances fine-tuning generalization by including in-context inferences into the fine-tuning data.
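One way to isolate test knowledge from pretraining data, as the datasets above do with nonsense words, is to generate entity names the model cannot have seen. The sketch below is an assumption about how such a generator might look, not the paper's actual construction.

```python
import random

def nonsense_word(rng: random.Random, syllables: int = 3) -> str:
    """Build a nonsense token from random consonant-vowel syllables,
    so statements about it cannot overlap with pretraining knowledge."""
    consonants, vowels = "bdfglmnprstvz", "aeiou"
    return "".join(rng.choice(consonants) + rng.choice(vowels)
                   for _ in range(syllables))

rng = random.Random(0)
entities = [nonsense_word(rng) for _ in range(4)]
# Chain the fresh entities into simple relational statements.
facts = [f"{a} is the parent of {b}." for a, b in zip(entities, entities[1:])]
```

Because every entity is novel, any correct answer about reversals or syllogisms over these facts must come from the controlled information, not from memorized world knowledge.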

Researchers employ multiple datasets carefully designed to isolate specific generalization challenges or insert them within broader learning contexts. Evaluation relies on multiple-choice likelihood scoring without providing answer choices in context. The experiments involve fine-tuning Gemini 1.5 Flash using batch sizes of 8 or 16. For in-context evaluation, the researchers combine training documents as context for the instruction-tuned model, randomly subsampling by 8x for larger datasets to minimize interference issues. The key innovation is a dataset augmentation approach using in-context generalization to enhance fine-tuning dataset coverage. This includes local and global strategies, each employing distinct contexts and prompts.
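The local and global augmentation strategies can be sketched as follows. This is a simplified reading of the idea, with `generate_fn` standing in for a real LM call; the function names and the toy inference model are hypothetical.

```python
def augment_with_in_context_inferences(generate_fn, docs, strategy="local"):
    """Sketch of augmenting a fine-tuning set with the model's own
    in-context inferences. `generate_fn(context)` is a stand-in for an LM
    call that returns inferred statements (restatements, reversals, logical
    consequences) from the given context. The 'local' strategy prompts with
    one document at a time; 'global' supplies the full document set so
    inferences can link facts across documents."""
    augmented = list(docs)
    if strategy == "local":
        for doc in docs:
            augmented.extend(generate_fn([doc]))
    else:  # "global"
        augmented.extend(generate_fn(docs))
    return augmented

# Toy stand-in that "infers" a reversal for a statement it recognizes.
def toy_infer(context):
    out = []
    for doc in context:
        if doc == "B's mother is A.":
            out.append("A's son is B.")
    return out

docs = ["B's mother is A."]
print(augment_with_in_context_inferences(toy_infer, docs))
# ["B's mother is A.", "A's son is B."]
```

The augmented set then serves as ordinary fine-tuning data, so the flexible in-context inferences get baked into the weights.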

On the Reversal Curse dataset, in-context learning achieves near-ceiling performance on reversals, while conventional fine-tuning shows near-zero accuracy as models favor incorrect celebrity names seen during training. Fine-tuning with data augmented by in-context inferences matches the high performance of pure in-context learning. Testing on simple nonsense reversals reveals similar patterns, though with less pronounced benefits. For simple syllogisms, while the pretrained model performs at chance level (indicating no data contamination), fine-tuning does produce above-chance generalization for certain syllogism types where logical inferences align with simple linguistic patterns. However, in-context learning outperforms fine-tuning, with augmented fine-tuning showing the best overall results.

In conclusion, this paper explores the generalization differences between in-context learning and fine-tuning when LMs face novel information structures. The results show in-context learning’s superior generalization for certain inference types, prompting the researchers to develop methods that enhance fine-tuning performance by incorporating in-context inferences into the training data. Despite promising outcomes, the study has several limitations: first, its dependency on nonsense words and implausible operations; second, its focus on specific LMs, which limits the generality of the results. Future research should investigate learning and generalization differences across various models, especially newer reasoning models, to expand upon these findings.


Check out the Paper. All credit for this research goes to the researchers of this project.


