cs.AI updates on arXiv.org, Aug 5, 19:10
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

This article introduces VeOmni, a modular and efficient training framework for omni-modal LLMs. By decoupling communication from computation, it enables efficient 3D parallelism and supports flexible integration of new modalities, significantly improving the training efficiency and scalability of omni-modal LLMs.
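The core architectural claim is that parallelization should be expressed as a recipe applied to a plain model rather than baked into the model code. Below is a minimal sketch of that decoupling using stock PyTorch tensor-parallel APIs; the Block module, the tp_plan layout, and the parallelize helper are illustrative assumptions, not VeOmni's actual interface.

```python
# Minimal sketch, not VeOmni's actual code: the model is plain PyTorch,
# and a separate "recipe" (a plan keyed by submodule name) decides how
# each layer is sharded, so communication never leaks into the model.
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

class Block(nn.Module):
    """A plain MLP block with no parallel logic inside (illustrative)."""
    def __init__(self, d: int):
        super().__init__()
        self.up = nn.Linear(d, 4 * d)
        self.down = nn.Linear(4 * d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.relu(self.up(x)))

# The recipe: a declarative sharding plan kept outside the model definition
# (hypothetical layout chosen for illustration).
tp_plan = {
    "up": ColwiseParallel(),    # shard output features across the TP group
    "down": RowwiseParallel(),  # shard input features; all-reduce the output
}

def parallelize(model: nn.Module, tp_size: int) -> nn.Module:
    # One mesh axis for tensor parallelism; a full 3D setup would add
    # data-parallel and sequence-parallel axes to the same mesh.
    mesh = init_device_mesh("cuda", (tp_size,), mesh_dim_names=("tp",))
    return parallelize_module(model, mesh, tp_plan)
```

Because the plan lives outside the model, swapping the sharding layout, or adding data- and sequence-parallel mesh axes for full 3D parallelism, requires no change to the model class itself.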

arXiv:2508.02317v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, incurring limited scalability and substantial engineering overhead for end-to-end omni-modal training. We present VeOmni, a modular and efficient training framework to accelerate the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism for omni-modal LLMs. VeOmni also features a flexible configuration interface supporting seamless integration of new modalities with minimal code change. Using VeOmni, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained at over 2,800 tokens/sec/GPU throughput and scaled to 160K context lengths via 3D parallelism on 128 GPUs, showcasing its superior efficiency and scalability for training large omni-modal LLMs.
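The abstract's second claim, that new modalities integrate with minimal code change, suggests a registry-plus-config pattern. The sketch below shows one plausible shape for such an interface; register_modality, ENCODERS, and build_encoders are hypothetical names invented for illustration and do not come from the paper.

```python
# Hypothetical sketch of a configuration-driven modality interface; the
# names here (register_modality, ENCODERS, build_encoders) are invented
# for illustration and are not VeOmni's real API.
from typing import Callable, Dict
import torch.nn as nn

ENCODERS: Dict[str, Callable[..., nn.Module]] = {}

def register_modality(name: str):
    """Decorator: register an encoder factory under a modality name."""
    def wrap(factory: Callable[..., nn.Module]) -> Callable[..., nn.Module]:
        ENCODERS[name] = factory
        return factory
    return wrap

@register_modality("audio")
def audio_encoder(d_model: int) -> nn.Module:
    # Toy stand-in: project 80-dim mel-spectrogram frames into the LLM's
    # embedding space.
    return nn.Linear(80, d_model)

def build_encoders(config: Dict[str, dict], d_model: int) -> nn.ModuleDict:
    """Instantiate only the modalities named in the training config."""
    return nn.ModuleDict(
        {name: ENCODERS[name](d_model, **kw) for name, kw in config.items()}
    )

encoders = build_encoders({"audio": {}}, d_model=1024)
```

Under this pattern, the "minimal code change" in the abstract would amount to registering one encoder and adding one config entry; the training loop and the parallel recipe stay untouched.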


Related tags

Omni-modal LLM, training framework, VeOmni, 3D parallelism, modality integration