MarkTechPost@AI · 20 hours ago
Unbabel Introduces TOWER+: A Unified Framework for High-Fidelity Translation and Instruction-Following in Multilingual LLMs

The Unbabel team has introduced TOWER+, a new family of multilingual large language models designed to improve translation accuracy and general language ability at the same time. Through an innovative training recipe, TOWER+ strikes a balance among translation quality, instruction following, and general chat capability, performs strongly on multiple benchmarks, and provides a reproducible framework that points the way for future domain-specific LLM development. The models were explored at several parameter scales, demonstrating that translation quality and general-purpose capability can be balanced in a single model and offering solid support for enterprise and research applications.

💡 The core of TOWER+ is its unified training pipeline. The pipeline comprises four stages, continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards, designed to balance translation accuracy with general language ability. In this way the model retains specialized translation skill while also handling tasks such as code generation and problem solving.

🌍 TOWER+ supports a wide range of languages and dialects. The research team performed continued pretraining on 32 billion tokens spanning 27 languages and dialects and covering 47 language pairs. This broad language coverage lets the model handle more diverse translation needs while maintaining fluency.

🏆 TOWER+ achieves strong results on multiple benchmarks. For example, the 9B-parameter TOWER+ model reaches a 33.47% win rate on multilingual general chat prompts and an XCOMET-XXL score of 84.38 across 24 language pairs. The 72B-parameter model attains a 54.52% win rate on M-ArenaHard, scores 89.02 on the IFEval instruction-following test, and reaches 83.29 on the WMT24++ benchmark. These results show that TOWER+ strikes a notable balance between translation accuracy and general capability.

🛠️ TOWER+ offers a new framework for future LLM development. The research provides a reproducible recipe for building LLMs that serve both translation and conversational needs, reducing the number of models to maintain and the associated operational overhead. The results show that unified large-scale pretraining combined with specialized alignment stages can deliver translation excellence and conversational versatility within a single model.

Large language models have driven progress in machine translation, leveraging massive training corpora to translate dozens of languages and dialects while capturing subtle linguistic nuances. Yet fine-tuning these models for translation accuracy often impairs their instruction-following and conversational skills, and broad-purpose versions struggle to meet professional fidelity standards. Balancing precise, culturally aware translation with the ability to handle code generation, problem-solving, and user-specific formatting remains challenging. Models must also preserve terminological consistency and adhere to formatting guidelines across varied audiences. Stakeholders require systems that can dynamically adapt to domain requirements and user preferences without sacrificing fluency. Benchmarks such as WMT24++, which covers 55 language variants, and IFEval, with its 541 instruction-focused prompts, highlight the gap between specialized translation quality and general-purpose versatility, a critical bottleneck for enterprise deployment.

Current Approaches to Tailoring Language Models for Translation Accuracy

Multiple approaches have been explored to tailor language models for translation. Fine-tuning pre-trained large language models on parallel corpora has been used to improve the adequacy and fluency of translated text, while continued pretraining on a combination of monolingual and parallel data enhances multilingual fluency. Some research teams have supplemented training with reinforcement learning from human feedback to align outputs with quality preferences. Proprietary systems such as GPT-4o and Claude 3.7 have demonstrated leading translation quality, and open-weight adaptations such as the TOWER V2 and GEMMA 2 models have reached parity with, or surpassed, closed-source models in certain language scenarios. These strategies reflect continuing efforts to address the dual demands of translation accuracy and broad language capability.
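To make the parallel-corpus fine-tuning idea concrete, here is a minimal sketch of how a single parallel sentence pair might be wrapped as a translation instruction for supervised fine-tuning. The prompt template, field names, and example sentences are illustrative assumptions, not the actual data format used by TOWER+ or the other systems mentioned above.

```python
# Hypothetical sketch: turning a parallel sentence pair into an
# instruction-style training example for supervised fine-tuning.
# The template and field names are illustrative, not TOWER+'s format.

def make_translation_example(src_text, tgt_text, src_lang, tgt_lang):
    """Wrap one parallel sentence pair as a prompt/completion pair."""
    prompt = (
        f"Translate the following text from {src_lang} into {tgt_lang}.\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}:"
    )
    return {"prompt": prompt, "completion": " " + tgt_text}

example = make_translation_example(
    src_text="O gato está a dormir no sofá.",
    tgt_text="The cat is sleeping on the sofa.",
    src_lang="Portuguese",
    tgt_lang="English",
)
print(example["prompt"])
print(example["completion"])
```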

Introducing TOWER+: Unified Training for Translation and General Language Tasks

Researchers from Unbabel, Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit), and MICS, CentraleSupélec, Université Paris-Saclay, introduced TOWER+, a suite of open-weight models. The team designed variants at three parameter scales (2 billion, 9 billion, and 72 billion) to explore the trade-off between translation specialization and general-purpose utility. By implementing a unified training pipeline, the researchers aimed to place the TOWER+ models on the Pareto frontier, achieving both high translation performance and robust general capabilities without sacrificing one for the other. The approach balances the specific demands of machine translation against the flexibility required by conversational and instructional tasks, supporting a range of application scenarios.
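To illustrate the Pareto-frontier framing, the following generic sketch (not from the paper) keeps only the model checkpoints that are not dominated on two evaluation axes, here labeled translation quality and chat win rate; the checkpoint names and scores are invented purely for illustration.

```python
# Generic sketch of Pareto-frontier selection over two evaluation axes.
# Checkpoint names and scores below are invented for illustration only.

def pareto_frontier(points):
    """Return the points not dominated on both axes (higher is better)."""
    frontier = []
    for name, trans, chat in points:
        dominated = any(
            (t2 >= trans and c2 >= chat) and (t2 > trans or c2 > chat)
            for _, t2, c2 in points
        )
        if not dominated:
            frontier.append((name, trans, chat))
    return frontier

checkpoints = [
    ("translation-specialist", 85.0, 20.0),
    ("general-chat-model",     78.0, 40.0),
    ("balanced-model",         84.0, 33.0),
    ("weak-baseline",          76.0, 18.0),  # dominated, dropped
]
for name, trans, chat in pareto_frontier(checkpoints):
    print(f"{name}: translation={trans}, chat win rate={chat}%")
```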

TOWER+ Training Pipeline: Pretraining, Supervised Tuning, Preferences, and RL

The training pipeline begins with continued pretraining on carefully curated data that includes monolingual content, filtered parallel sentences formatted as translation instructions, and a small fraction of instruction-like examples. Next, supervised fine-tuning refines the model on a combination of translation tasks and diverse instruction-following scenarios, including code generation, mathematical problem-solving, and question answering. A preference optimization stage follows, employing weighted preference optimization and group-relative policy updates trained on off-policy signals and human-edited translation variants. Finally, reinforcement learning with verifiable rewards reinforces precise compliance with translation guidelines, using regex-based checks and preference annotations to refine the model’s ability to follow explicit instructions during translation. This combination of pretraining, supervised alignment, and reward-driven updates yields a robust balance between specialized translation accuracy and versatile language proficiency.
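The paper’s reward implementation is not reproduced here, but the sketch below shows one plausible shape for a regex-based verifiable reward: the model earns credit only when its output satisfies machine-checkable constraints attached to the translation instruction. The constraint set and scoring values are assumptions for illustration, not TOWER+’s actual reward design.

```python
import re

# Hypothetical sketch of a regex-based "verifiable reward" for translation
# outputs: the model is rewarded only when its translation satisfies every
# explicit, machine-checkable constraint named in the instruction. The
# constraints and scores are illustrative, not TOWER+'s actual design.

CONSTRAINTS = {
    "keep_brand_name": re.compile(r"\bAcmeCorp\b"),          # preserve a term verbatim
    "uppercase_heading": re.compile(r"^[A-ZÀ-Ý][^a-z]*\n"),  # first line fully uppercase
    "no_trailing_space": re.compile(r"\S$"),                  # no whitespace at the end
}

def verifiable_reward(output: str, constraint_ids: list[str]) -> float:
    """Return 1.0 if the output satisfies every named constraint, else 0.0."""
    for cid in constraint_ids:
        if not CONSTRAINTS[cid].search(output):
            return 0.0
    return 1.0

candidate = "ACME PRODUCT GUIDE\nAcmeCorp lança hoje o novo guia do produto."
print(verifiable_reward(
    candidate, ["keep_brand_name", "uppercase_heading", "no_trailing_space"]
))  # -> 1.0
```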

Benchmark Results: TOWER+ Achieves State-of-the-Art Translation and Instruction Following

The TOWER+ 9B model achieved a win rate of 33.47% on multilingual general chat prompts, while earning an XCOMET-XXL score of 84.38 across 24 language pairs, outperforming similarly sized open-weight counterparts. The flagship 72-billion-parameter variant secured a 54.52% win rate on M-ArenaHard, recorded an IFEval instruction-following score of 89.02, and reached an XCOMET-XXL score of 83.29 on the full WMT24++ benchmark. On the combined translation and instruction-following benchmark, IF-MT, it scored 5.55 for instruction adherence and 88.95 for translation fidelity, establishing state-of-the-art results among open-weight models. These outcomes confirm that the researchers’ integrative pipeline effectively bridges the gap between specialized translation performance and broad language capabilities, demonstrating its viability for both enterprise and research applications.
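For readers who want to reproduce XCOMET-style scoring on their own translations, the sketch below uses the open-source unbabel-comet package; the model identifier, predict() call, and system_score attribute reflect that package’s documented usage and are an assumption here, not the exact evaluation setup used in the paper.

```python
# Rough sketch of scoring translations with the open-source `unbabel-comet`
# package (pip install unbabel-comet). The model id and API calls are
# assumptions based on that package's docs, not the paper's exact setup.
from comet import download_model, load_from_checkpoint

# Note: the XCOMET-XXL checkpoint is very large and gated on the Hugging Face
# Hub (license acceptance required); a smaller COMET checkpoint can be
# substituted for a quick local test.
model_path = download_model("Unbabel/XCOMET-XXL")
model = load_from_checkpoint(model_path)

data = [
    {
        "src": "O gato está a dormir no sofá.",    # source sentence
        "mt":  "The cat is sleeping on the sofa.",  # machine translation
        "ref": "The cat is asleep on the couch.",   # human reference
    }
]

# Set gpus=0 to run on CPU.
output = model.predict(data, batch_size=8, gpus=1)
print(output.system_score)  # corpus-level score in [0, 1]; papers often report ×100
```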

Key Technical Highlights of the TOWER+ Models

- Three open-weight scales: 2B, 9B, and 72B parameters.
- Unified four-stage pipeline: continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards.
- Continued pretraining on 32 billion tokens spanning 27 languages and dialects and 47 language pairs.
- 9B model: 33.47% win rate on multilingual chat prompts and an XCOMET-XXL score of 84.38 across 24 language pairs.
- 72B model: 54.52% win rate on M-ArenaHard, 89.02 on IFEval, and 83.29 XCOMET-XXL on the full WMT24++ benchmark.
- IF-MT: 5.55 for instruction adherence and 88.95 for translation fidelity, state of the art among open-weight models.

Conclusion: A Pareto-Optimal Framework for Future Translation-Focused LLMs

In conclusion, by unifying large-scale pretraining with specialized alignment stages, TOWER+ demonstrates that translation excellence and conversational versatility can coexist within a single open-weight suite. The models achieve a Pareto-optimal balance across translation fidelity, instruction-following, and general chat capabilities, offering a scalable blueprint for future domain-specific LLM development.


Check out the Paper and Models. All credit for this research goes to the researchers of this project.

