MarkTechPost@AI · December 8, 2024
Composition of Experts: A Modular and Scalable Framework for Efficient Large Language Model Utilization

Composition of Experts (CoE) is a modular AI framework that dynamically routes inputs to specialized expert LLMs through a two-step process: a category router classifies the input into a predefined category, and a category-to-expert mapping then assigns the most suitable expert. Compared with monolithic LLMs, this approach improves modularity, scalability, and computational efficiency, and allows new capabilities to be integrated easily. CoE demonstrates strong performance across multiple benchmarks, scoring 59.4 on Arena-Hard and 9.06 on MT-Bench while significantly reducing active parameters, showing its potential for cost-effective, high-performance AI systems.

🧠 The CoE framework adopts a modular design that dynamically assigns input prompts to the most suitable expert LLM through a two-step routing process, improving efficiency and specialization. The first step is a category router that classifies the input into a predefined category; the second is a category-to-expert mapping that assigns each category to the most suitable expert.

📊 CoE is evaluated on multiple benchmarks, including Arena-Hard for single-turn interactions, MT-Bench for multi-turn conversations, and knowledge-intensive tasks such as GSM8k CoT and MMLU-Pro. These benchmarks assess CoE's ability to balance performance and computational efficiency.

🚀 On Arena-Hard, CoE shows improved scalability and resource utilization, outperforming individual expert models as the total parameter budget (B) increases. A robust variant of CoE that uses uncertainty-aware routing further improves stability and accuracy.

🔄 In multi-turn evaluation on MT-Bench, CoE demonstrates its efficiency by dynamically routing the prompt and conversation history to the most suitable expert at each turn, achieving results comparable to those of larger, more resource-intensive models.

📚 Because of gaps in training-data distribution, CoE lags behind individual experts on discipline-specific knowledge tasks, but Robust-CoE recovers the lost performance. It does so through uncertainty quantification, which ensures that uncertain prompts are routed to generalist experts.

LLMs have revolutionized artificial intelligence with their remarkable scalability and adaptability. Models like GPT-4 and Claude, built with trillions of parameters, demonstrate exceptional performance across diverse tasks. However, their monolithic design comes with significant challenges, including high computational costs, limited flexibility, and difficulties in fine-tuning for domain-specific needs due to risks like catastrophic forgetting and alignment tax. Around open-weight LLMs such as Llama3 and Mistral, an active open-source community has created smaller, task-specific expert models. These models address niche requirements effectively and often surpass monolithic models in specialized domains, though deploying them broadly remains resource-intensive.

Advances in LLM architecture and ensemble approaches have sought to optimize performance and efficiency. Mixture of Experts (MoE) models use gating mechanisms to route tasks to specialized experts, enhancing domain-specific accuracy. Similarly, ensemble methods like LLMBlender combine outputs from multiple models to improve overall performance. Other techniques, such as reward-guided routing and tag-based label enhancements, direct queries to the most relevant models, but their high inference costs pose practical challenges. These innovations highlight ongoing efforts to overcome the limitations of large-scale LLMs by balancing computational efficiency with specialization.
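For contrast with CoE's discrete routing, the following is a minimal sketch of the soft, weighted gating typical of MoE layers. The gate scores, expert outputs, and top-k value are toy assumptions for illustration, not the internals of any particular system:

```python
import math

def softmax(scores):
    """Normalize raw gate scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(gate_scores, expert_outputs, top_k=2):
    """Blend the top-k experts' outputs, weighted by the gate.

    In a real MoE layer the gate scores come from a learned projection
    of the token representation; here they are given directly.
    """
    weights = softmax(gate_scores)
    # Keep only the top-k experts (sparse activation) and renormalize.
    ranked = sorted(range(len(weights)), key=lambda i: -weights[i])[:top_k]
    kept = sum(weights[i] for i in ranked)
    return sum(weights[i] / kept * expert_outputs[i] for i in ranked)

# Toy example: three experts emit scalar "outputs".
print(moe_output(gate_scores=[2.0, 0.5, 1.5], expert_outputs=[1.0, -1.0, 0.0]))
```

Unlike this per-token blending, CoE selects a single whole expert model per prompt, which is what keeps its active-parameter count low.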

Researchers from SambaNova Systems have introduced the Composition of Experts (CoE). This modular AI framework dynamically routes inputs to specialized expert LLMs using a two-step process: a category router classifies inputs into predefined categories, followed by a category-to-expert mapping that assigns the most suitable expert. This approach enhances modularity, scalability, and computational efficiency compared to monolithic LLMs, allowing easy integration of new capabilities. Leveraging SambaNova’s SN40L hardware, CoE demonstrates strong performance, achieving scores of 59.4 on Arena-Hard and 9.06 on MT-Bench with significantly reduced active parameters, showcasing its potential for cost-effective, high-performance AI systems.

The CoE framework uses a subset of expert LLMs selected from a larger pool, routing each input prompt via a router function to the most suitable expert for output generation. The system minimizes loss while adhering to a parameter budget. A two-step routing process categorizes prompts and assigns them to the best expert within a category, enhancing modularity and interpretability. The framework uses labeled datasets for training and semi-supervised methods for prompt curation. Memory efficiency is managed by offloading models to CPUs or scaling across GPUs, ensuring flexibility and sustained performance even as the number of experts grows.
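A minimal sketch of this two-step routing, with a toy keyword classifier standing in for the trained category router and a hand-written table standing in for the curated category-to-expert mapping; all expert names, sizes, and the budget value are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    num_params_b: float  # parameter count, in billions

    def generate(self, prompt: str) -> str:
        # Stand-in for a real LLM call.
        return f"[{self.name}] response to: {prompt!r}"

# Hypothetical category-to-expert mapping; CoE derives this from
# labeled data, but here it is hard-coded for illustration.
EXPERTS = {
    "math": Expert("math-expert-7b", 7.0),
    "coding": Expert("code-expert-13b", 13.0),
    "general": Expert("generalist-70b", 70.0),
}

def categorize(prompt: str) -> str:
    """Step 1: the category router. A toy keyword classifier stands in
    for the trained classifier described in the paper."""
    text = prompt.lower()
    if any(w in text for w in ("equation", "integral", "prove")):
        return "math"
    if any(w in text for w in ("python", "bug", "compile")):
        return "coding"
    return "general"

def route(prompt: str, budget_b: float = 100.0) -> str:
    """Step 2: category-to-expert mapping, with an active-parameter
    check in the spirit of CoE's parameter budget."""
    expert = EXPERTS[categorize(prompt)]
    if expert.num_params_b > budget_b:
        raise ValueError(f"{expert.name} exceeds the {budget_b}B budget")
    return expert.generate(prompt)

print(route("Solve the equation x^2 - 4 = 0"))  # routed to math-expert-7b
```

The two-step decomposition is what makes the system interpretable and extensible: adding a new expert only requires updating the category-to-expert table, not retraining a monolithic router over all experts.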

The CoE framework is evaluated on several benchmarks, including Arena-Hard for single-turn interactions, MT-Bench for multi-turn conversations, and knowledge-intensive tasks like GSM8k CoT and MMLU-Pro. These benchmarks assess CoE’s ability to balance performance and computational efficiency. On Arena-Hard, CoE shows improved scalability and resource utilization, outperforming individual expert models as the total parameter budget (B) increases. The robust version of CoE, leveraging uncertainty-aware routing, further enhances stability and accuracy, achieving competitive scores with significantly fewer active parameters than closed-source models. Its modular design allows easy integration of new expert models for further performance improvements.

In multi-turn evaluation on MT-Bench, CoE demonstrates efficiency by dynamically routing prompts and conversation history to the most suitable expert at each turn, achieving results comparable to larger, resource-intensive models. Due to training data distribution gaps, CoE falls short of individual experts on knowledge-specific tasks across various disciplines but recovers performance using Robust-CoE. This is achieved through uncertainty quantification, which ensures accurate routing to generalist experts. By leveraging open-weight LLMs like Qwen and Llama, CoE achieves competitive scores with reduced active parameters, showcasing its effectiveness as a cost-efficient, scalable, and modular AI system.
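The sketch below illustrates one plausible form of such uncertainty-aware routing: if the entropy of the category router's distribution (computed, in a multi-turn setting, over the full conversation history) exceeds a threshold, the prompt is deferred to a generalist expert. The entropy rule and threshold value are assumptions for illustration; the paper's exact uncertainty quantification may differ.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of the router's category distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def robust_route(history, category_probs, experts, generalist, max_entropy=0.5):
    """Uncertainty-aware routing in the spirit of Robust-CoE.

    `category_probs` is the router's distribution for the current turn;
    a real router would compute it over the concatenated `history`, so
    each turn can be re-routed as the conversation evolves.
    """
    if entropy(category_probs.values()) > max_entropy:
        # The router is unsure: defer to a generalist expert rather
        # than risk committing to the wrong specialist.
        return generalist
    best_category = max(category_probs, key=category_probs.get)
    return experts[best_category]

experts = {"math": "math-expert", "coding": "code-expert"}
# A confident router distribution picks the specialist...
print(robust_route(["solve x + 1 = 2"], {"math": 0.95, "coding": 0.05},
                   experts, generalist="generalist-llm"))
# ...a near-uniform one falls back to the generalist.
print(robust_route(["hi there"], {"math": 0.5, "coding": 0.5},
                   experts, generalist="generalist-llm"))
```

Deferring uncertain prompts to a generalist trades a little specialization for robustness, which is how Robust-CoE recovers performance on knowledge tasks that fall outside the routing data's distribution.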


Check out the Paper. All credit for this research goes to the researchers of this project.

