DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections

cs.AI updates on arXiv.org, 15 hours ago

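This paper proposes DynamixSFT, a dynamic, automated method for optimizing instruction-tuning dataset mixtures. Using a multi-armed bandit formulation and Prior-scaled Boltzmann Exploration, it improves model performance while preserving the diversity and coverage of the dataset collection.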

arXiv:2508.12116v1 Announce Type: cross

Abstract: As numerous instruction-tuning datasets continue to emerge during the post-training stage, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for instruction-tuning dataset mixture optimization. We formulate the problem as a multi-armed bandit setup and introduce a Prior-scaled Boltzmann Exploration that softly anchors the updated sampling distribution to the original dataset proportions, thereby preserving the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward, reflecting how much the dataset contributes to improving the model's performance at its current state. When applied to the Tulu-v2-mixture collection comprising 16 instruction-tuning datasets, DynamixSFT achieves up to a 2.2% performance improvement across 10 benchmarks. Furthermore, we provide a comprehensive analysis and visualizations to offer deeper insights into the adaptive dynamics of our method.
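The abstract does not give the exact formulas, so the following is only a minimal sketch of what a prior-scaled Boltzmann sampler over datasets could look like. It assumes sampling probabilities proportional to `prior_i * exp(value_i / temperature)` and an exponential-moving-average update of each dataset's value from its reward. The function names, the `temperature` and `step_size` parameters, the three-dataset collection, and the reward value are all hypothetical and are not taken from the paper.

```python
import math
import random


def prior_scaled_boltzmann(priors, values, temperature=1.0):
    """Return probabilities proportional to prior_i * exp(value_i / temperature).

    This form is an assumption made for illustration; the paper's exact
    definition may differ.
    """
    # Work in log space and subtract the max logit for numerical stability.
    logits = [math.log(p) + v / temperature for p, v in zip(priors, values)]
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def update_value(old_value, reward, step_size=0.1):
    """Move a dataset's value estimate toward the observed reward (assumed EMA rule)."""
    return old_value + step_size * (reward - old_value)


# Hypothetical collection of three datasets with original proportions 0.6 / 0.3 / 0.1.
priors = [0.6, 0.3, 0.1]
values = [0.0, 0.0, 0.0]

# With no reward evidence, the sampler falls back to the original proportions.
probs = prior_scaled_boltzmann(priors, values)

# Suppose the smallest dataset produced a positive look-ahead reward (made-up number).
values[2] = update_value(values[2], reward=1.0)
new_probs = prior_scaled_boltzmann(priors, values)

# Draw the dataset index to sample the next training batch from.
chosen = random.choices(range(len(priors)), weights=new_probs, k=1)[0]
```

Under this assumed form, multiplying by the prior is what keeps the distribution softly anchored: a dataset's share moves away from its original proportion only as far as its accumulated reward justifies, and a large `temperature` pulls every share back toward the original mixture.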

Related tags

Instruction tuning, dataset, dynamic optimization, multi-armed bandit, model performance improvement
