MarkTechPost@AI · February 17
Enhancing Reasoning Capabilities in Low-Resource Language Models through Efficient Model Merging

This article covers a study aimed at improving the reasoning capabilities of large language models (LLMs) for low-resource languages, Thai in particular. By combining data selection with model merging, the researchers incorporated DeepSeek R1's advanced reasoning abilities into the Typhoon2 70B model under a limited compute budget while preserving proficiency in the target language. Through supervised fine-tuning (SFT) and model merging, the resulting model, Typhoon2-R1-70B, performs strongly on both reasoning and language tasks, delivering substantial gains. The work offers an efficient and economical path for developing LLMs in low-resource languages.

🎯 The researchers propose a novel approach that combines data selection and model merging to enhance the reasoning capabilities of language-specific LLMs, with a focus on Thai models. The goal is to integrate DeepSeek R1-style reasoning into the model with limited resources while maintaining target-language proficiency.

🧮 The method uses Typhoon2 70B Instruct and DeepSeek R1 70B Distill as base models: Typhoon2 70B is trained with supervised fine-tuning (SFT) and then merged with DeepSeek R1 70B. The training setup uses LoRA and is optimized for computational efficiency, running on 4×H100 GPUs.

📊 Experiments show that DeepSeek R1 70B Distill excels at reasoning tasks such as AIME and MATH500 but is less effective on Thai-specific tasks such as MTBench-TH. Typhoon2 70B Instruct performs strongly on language-specific tasks but falls short on reasoning. The final model, Typhoon2-R1-70B, combines the strengths of both, achieving marked improvements on both language and reasoning tasks.

🌐 The study shows that SFT and model merging can transfer reasoning abilities effectively under limited resources. It also has limitations, such as restricting DARE-based merging to a single model family and not optimizing instruction tuning. Multilingual reasoning and model merging still face challenges, including the lack of culturally aware reasoning traces.

Large Language Models (LLMs) have shown exceptional capabilities in complex reasoning tasks through recent advancements in scaling and specialized training approaches. While models like OpenAI o1 and DeepSeek R1 have set new benchmarks on reasoning problems, a significant disparity exists in their performance across languages. The dominance of English and Chinese in the training data of foundation models like Llama and Qwen has created a substantial capability gap for low-resource languages, where models face challenges such as incorrect character usage and code-switching. These issues become especially pronounced during reasoning-focused fine-tuning and reinforcement learning.

Regional LLM initiatives have emerged to address low-resource language limitations through specialized pretraining and post-training approaches. Projects like Typhoon, Sailor, EuroLLM, Aya, Sea-lion, and SeaLLM have focused on adapting models for specific target languages. However, data-centric approaches to adapting reasoning capabilities are hampered by the lack of transparency around the data recipes behind reasoning models. Moreover, scaling them requires substantial computational resources, as evidenced by DeepSeek R1 70B's use of 800K examples for distillation and general SFT, far exceeding academic efforts like Sky-T1 and Bespoke-Stratos. Model merging has emerged as an alternative, showing promise in combining the weights of multiple specialized models to improve performance across tasks without additional training.
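As a rough illustration of that weight-space idea (not the specific recipe used by any of the projects above), a linear merge simply interpolates corresponding parameters of two fine-tuned models that share an architecture. In the sketch below, the model identifiers and the interpolation weight `alpha` are placeholders:

```python
# Minimal sketch of linear weight-space model merging, assuming both models
# share the same architecture and parameter names. Illustrative only.
import torch
from transformers import AutoModelForCausalLM

def linear_merge(model_a_id: str, model_b_id: str, alpha: float = 0.5):
    """Return model A with every parameter replaced by alpha * A + (1 - alpha) * B."""
    model_a = AutoModelForCausalLM.from_pretrained(model_a_id, torch_dtype=torch.bfloat16)
    model_b = AutoModelForCausalLM.from_pretrained(model_b_id, torch_dtype=torch.bfloat16)
    state_b = model_b.state_dict()
    merged = {name: alpha * param + (1.0 - alpha) * state_b[name]
              for name, param in model_a.state_dict().items()}
    model_a.load_state_dict(merged)
    return model_a

# Hypothetical usage (repository names are placeholders, not verified ids):
# merged = linear_merge("org/thai-instruct-70b", "org/reasoning-distill-70b", alpha=0.5)
```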

Researchers from SCB 10X R&D and SCBX Group, Bangkok, Thailand have proposed an approach to enhance reasoning capabilities in language-specific LLMs, focusing on Thai language models. The research combines data selection and model merging to incorporate advanced reasoning capabilities similar to DeepSeek R1 while maintaining target-language proficiency. It addresses the challenge of improving reasoning in low-resource language models using only publicly available datasets and a modest computational budget of $1,201, matching DeepSeek R1's reasoning capabilities without compromising performance on target-language tasks.

The implemented methodology uses Typhoon2 70B Instruct and DeepSeek R1 70B Distill as base models. The approach applies Supervised Fine-Tuning (SFT) to Typhoon2 70B and then merges it with DeepSeek R1 70B. The training configuration employs LoRA with rank 32 and α of 16, and uses sequence packing with a maximum length of 16,384 tokens, alongside Liger kernels, FlashAttention-2, and DeepSpeed ZeRO-3 to optimize computational efficiency. Training runs on 4×H100 GPUs for up to 15 hours using axolotl, with model merging performed via Mergekit. The evaluation covers two key aspects, reasoning capability and language-task performance, using benchmarks such as AIME 2024, MATH-500, and LiveCodeBench, with Thai translations for assessment.
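Below is a minimal sketch of an SFT run with the LoRA settings reported above, written against the Hugging Face peft and trl libraries rather than the authors' axolotl configuration. The model identifier, dataset path, batch size, and epoch count are assumptions for illustration; the DeepSpeed, Liger-kernel, and FlashAttention-2 settings reported in the paper are omitted here for brevity.

```python
# Sketch of SFT with the reported LoRA settings (rank 32, alpha 16) and
# 16,384-token packed sequences. peft + trl stand in for the authors' axolotl setup.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=32,                          # LoRA rank reported in the paper
    lora_alpha=16,                 # LoRA alpha reported in the paper
    target_modules="all-linear",   # assumed target modules
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="typhoon2-r1-sft",        # placeholder output path
    max_seq_length=16384,                # maximum packed sequence length
    packing=True,                        # sequence packing, as in the paper
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,      # assumed; not reported
    num_train_epochs=1,                  # assumed; not reported
    bf16=True,
)

trainer = SFTTrainer(
    model="org/typhoon2-70b-instruct",   # placeholder model id
    args=training_args,
    train_dataset=load_dataset("json", data_files="reasoning_sft.jsonl")["train"],  # placeholder data
    peft_config=peft_config,
)
trainer.train()
```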

Experimental results reveal that DeepSeek R1 70B Distill excels in reasoning tasks like AIME and MATH500 but shows reduced effectiveness in Thai-specific tasks such as MTBench-TH and language-accuracy evaluations. Typhoon2 70B Instruct shows strong performance on language-specific tasks but struggles with reasoning, achieving only 10% accuracy on AIME and trailing DeepSeek R1 by over 20% on MATH500. The final model, Typhoon2-R1-70B, combines DeepSeek R1's reasoning capabilities with Typhoon2's Thai language proficiency, achieving performance within 4% of Typhoon2 on language tasks while maintaining comparable reasoning abilities. This results in performance improvements of 41.6% over Typhoon2 and 12.8% over DeepSeek R1.

In conclusion, the researchers present an approach to enhance reasoning capabilities in language-specific models through the combination of specialized models. While the study shows that SFT and model merging can effectively transfer reasoning capabilities with limited resources, several limitations remain. The scope was confined to DARE-based merging of two models within a single model family, without optimizing instruction tuning despite available high-quality datasets such as Tulu3. Significant challenges persist in multilingual reasoning and model merging, including the lack of culturally aware reasoning traces. Despite these challenges, the research marks a step toward advancing LLM capabilities in underrepresented languages.
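For readers unfamiliar with DARE (Drop And REscale), the merging method referenced above, the following is a minimal tensor-level sketch of its core operation: sparsify each donor model's delta from the shared base, rescale the surviving entries, and add the deltas back onto the base weights. The drop rate and merge weights below are illustrative defaults, not values from the paper.

```python
# Minimal sketch of the DARE (Drop And REscale) operation on a single tensor.
import torch

def dare_delta(finetuned: torch.Tensor, base: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Randomly drop entries of the task vector (finetuned - base) and rescale survivors."""
    delta = finetuned - base
    keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    return keep_mask * delta / (1.0 - drop_rate)

def dare_merge(base: torch.Tensor,
               finetuned_a: torch.Tensor,
               finetuned_b: torch.Tensor,
               weight_a: float = 0.5,
               weight_b: float = 0.5) -> torch.Tensor:
    """Add sparsified, rescaled task vectors from two donor models back onto the base weights."""
    return base + weight_a * dare_delta(finetuned_a, base) + weight_b * dare_delta(finetuned_b, base)
```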


Check out the Paper. All credit for this research goes to the researchers of this project.



