Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

cs.AI updates on arXiv.org, July 22, 12:34


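This paper proposes Symbolic-MoE, a symbolic, text-based, gradient-free Mixture-of-Experts framework for adaptively selecting pre-trained expert LLMs across large-scale, diverse tasks. Unlike coarse-grained task-level selection, Symbolic-MoE selects experts at the skill level according to what each instance needs, for example an algebra expert for a math problem or a molecular biology expert for biomedical reasoning. A skill-based recruiting strategy dynamically picks the most relevant set of expert LLMs for each reasoning task. Each selected expert generates its own reasoning, and an aggregator chosen for its ability to integrate diverse reasoning outputs synthesizes them into a final high-quality response. To avoid the overhead of constantly loading and offloading models, Symbolic-MoE uses a batch strategy that groups instances by their assigned experts so that each model is loaded only once; this lets 16 expert models run on a single GPU at a time cost comparable to or better than prior multi-agent baselines that use 4 GPUs. On MMLU-Pro, GPQA, AIME, and MedMCQA, Symbolic-MoE beats strong LLMs such as GPT4o-mini as well as multi-agent approaches, with an absolute average gain of 8.15% over the best multi-agent baseline. It also generalizes well to unseen tasks and removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.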

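💡 The core of Symbolic-MoE is its fine-grained expert selection. It identifies the specific skills an instance requires, such as algebra in math or molecular biology in biomedical reasoning, and matches models to those skills rather than to the task as a whole, which is more precise than task-level expert selection on heterogeneous tasks.

🚀 The framework uses a skill-based recruiting strategy that dynamically selects, for each reasoning task, the set of expert LLMs strongest in the relevant skills. Each selected expert generates its own reasoning independently, so k experts yield k outputs, and an aggregator chosen for its ability to integrate diverse reasoning synthesizes them into one high-quality final response.

⚡ To overcome the computational overhead of constantly loading and offloading models, Symbolic-MoE introduces a batch strategy. Grouping instances by their assigned experts means each model is loaded only once, which makes it possible to integrate 16 expert models on a single GPU with a time cost comparable to or better than prior multi-agent baselines that use 4 GPUs.

📈 In experiments on MMLU-Pro, GPQA, AIME, and MedMCQA, Symbolic-MoE beats strong LLMs such as GPT4o-mini as well as multi-agent approaches, with an absolute average gain of 8.15% over the best multi-agent baseline. It also generalizes well to unseen tasks and, by removing the need for expensive multi-round discussions, outperforms discussion baselines with less computation.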
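The recruiting and aggregation steps summarized above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the model names, the skill scores in `MODEL_PROFILES`, the sum-of-scores ranking in `recruit_experts`, and the majority-vote `aggregate` function are hypothetical stand-ins, not the method or values from the paper. In particular, the source describes an LLM aggregator that synthesizes the k outputs, whereas voting is used here only so the example runs without any model.

```python
from collections import Counter

# Hypothetical skill profiles: model name -> {skill: score}.
# The scores are illustrative placeholders, not values from the paper.
MODEL_PROFILES = {
    "model_a": {"algebra": 0.9, "geometry": 0.4, "molecular biology": 0.1},
    "model_b": {"algebra": 0.3, "geometry": 0.2, "molecular biology": 0.8},
    "model_c": {"algebra": 0.6, "geometry": 0.7, "molecular biology": 0.5},
}


def recruit_experts(instance_skills, profiles, k=2):
    """Return the k models whose profiles best match the instance's skills."""
    totals = {
        name: sum(profile.get(skill, 0.0) for skill in instance_skills)
        for name, profile in profiles.items()
    }
    # Sort by descending total score; break ties by name for determinism.
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    return ranked[:k]


def aggregate(outputs):
    """Stand-in aggregator: majority vote over the experts' final answers.

    Assumption: a real aggregator would be an LLM that reads all k
    reasoning outputs; voting only keeps this sketch self-contained.
    """
    return Counter(outputs).most_common(1)[0][0]


# Usage: an instance tagged with the skill "algebra" recruits two experts.
experts = recruit_experts(["algebra"], MODEL_PROFILES, k=2)
print(experts)
print(aggregate(["B", "A", "B"]))
```

The point of the sketch is that selection happens per instance: two questions from the same benchmark can be routed to different expert sets if their skill tags differ.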

arXiv:2503.05641v3 Announce Type: replace-cross Abstract: Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting task-level experts is often too coarse-grained, as heterogeneous tasks may require different expertise per instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, e.g., algebra in math or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, resulting in k outputs from k experts, which are then synthesized into a final high-quality response by an aggregator chosen based on its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead due to the need for constant model loading and offloading. To address this, we implement a batch strategy that groups instances based on their assigned experts, loading each model only once. This allows us to integrate 16 expert models on 1 GPU with a time cost comparable to or better than prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we show that Symbolic-MoE beats strong LLMs like GPT4o-mini, as well as multi-agent approaches, with an absolute avg. gain of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE generalizes well to unseen tasks and removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
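The batch strategy described in the abstract, grouping instances by their assigned experts so that each model is loaded only once, might look roughly like the following Python sketch. The function names, the shape of the `assignments` mapping, and the `generate(expert, instance_ids)` callable are hypothetical and introduced only for illustration; model loading is represented by a counter rather than real GPU work.

```python
from collections import defaultdict


def group_by_expert(assignments):
    """Map each expert to the list of instance ids routed to it."""
    batches = defaultdict(list)
    for instance_id, experts in assignments.items():
        for expert in experts:
            batches[expert].append(instance_id)
    return dict(batches)


def count_loads_naive(assignments):
    """Model loads if every instance loads each of its experts on demand."""
    return sum(len(experts) for experts in assignments.values())


def run_batched(assignments, generate):
    """Run each expert once over all instances assigned to it.

    `generate(expert, instance_ids)` is a hypothetical callable that
    returns one output string per instance id, in the same order.
    Returns (outputs per instance, number of model loads).
    """
    outputs = defaultdict(dict)
    loads = 0
    for expert, instance_ids in group_by_expert(assignments).items():
        loads += 1  # the expert is loaded once for its whole batch
        results = generate(expert, instance_ids)
        for instance_id, text in zip(instance_ids, results):
            outputs[instance_id][expert] = text
    return dict(outputs), loads


# Usage with a fake generator that needs no real model.
assignments = {
    "q1": ["model_a", "model_c"],
    "q2": ["model_a", "model_b"],
    "q3": ["model_c", "model_a"],
}
fake_generate = lambda expert, ids: [f"{expert}:{i}" for i in ids]
outputs, loads = run_batched(assignments, fake_generate)
print(count_loads_naive(assignments), loads)
```

In this toy setup, per-instance loading would trigger one load for every (instance, expert) pair, while the grouped version loads each distinct expert once, which is the saving the abstract attributes to batching.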
