MarkTechPost@AI 2024-07-02
Researchers from UC Berkeley and Anyscale Introduce RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing

RouteLLM is an open-source LLM routing framework that effectively balances cost and performance, helping users select the most suitable LLM for each task. The framework trains routers on preference data, learning which queries a weaker model can handle and which require a stronger one. In this way, RouteLLM delivers high-quality responses while significantly reducing cost.

😄 **Framework overview:** RouteLLM is an open-source LLM routing framework designed to optimize model selection, balancing cost and performance. It trains routers on preference data to learn the relative strengths and weaknesses of different models, then routes each query to the most suitable one.

🤩 **Training and evaluation:** RouteLLM trains four different routers: a similarity-weighted (SW) ranking router, a matrix factorization model, a BERT classifier, and a causal LLM classifier. Trained on preference data, these routers learn how different models perform on different kinds of queries. On benchmarks such as MT Bench, MMLU, and GSM8K, RouteLLM performs strongly, cutting costs significantly while maintaining high-quality responses.

😎 **Comparison with commercial systems:** Compared with commercial routing systems such as Martian and Unify AI, RouteLLM achieves comparable performance at lower cost, demonstrating the framework's cost-effectiveness and competitive edge.

🥳 **Generalization:** RouteLLM also performs well on other model pairs (e.g., Claude 3 Opus and Llama 3 8B) without retraining, indicating that the framework learns general features that distinguish strong from weak models and that transfer to new pairs.

🤯 **Conclusion:** RouteLLM offers a scalable, cost-effective solution for deploying LLMs by balancing cost and performance. Using preference data and data-augmentation techniques, it ensures high-quality responses while significantly reducing cost. The open-source release of RouteLLM, together with its datasets and code, will advance further research and adoption of LLM routing.

Large Language Models (LLMs) have showcased impressive capabilities across various tasks but vary widely in cost and capability. Deploying these models in real-world applications presents a significant challenge: routing all queries to the most capable models ensures high-quality responses but is expensive, while directing queries to smaller models saves costs at the expense of response quality. Researchers from UC Berkeley, Anyscale, and Canva propose RouteLLM, an open-source LLM routing framework that effectively balances price and performance to address this issue.

Challenges in LLM Routing

LLM routing aims to determine which model should handle each query to minimize costs while maintaining response quality. The routing system must infer the characteristics of incoming queries and the capabilities of different models, making the problem complex. RouteLLM addresses this by utilizing preference data to train its routers, allowing the system to learn which queries can be handled by weaker models and which require stronger models.
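The core routing decision can be sketched as follows. This is a minimal illustration, not the RouteLLM API: the function and model names (`score_query`, `route`, the length-based heuristic) are our own placeholders, where a real router would be one of the trained models described below.

```python
# Sketch of the LLM-routing decision: a router scores each query with an
# estimate of how much it needs the strong model, then routes based on a
# cost/quality threshold. All names here are illustrative placeholders.

STRONG_MODEL = "gpt-4"
WEAK_MODEL = "mixtral-8x7b"

def score_query(prompt: str) -> float:
    """Placeholder for a trained router: returns an estimated probability
    that the strong model is needed for this prompt."""
    # A real router (e.g. a trained BERT classifier) would run inference
    # here; a crude length heuristic stands in for it in this sketch.
    return min(1.0, len(prompt.split()) / 100)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send the query to the weak model unless the router's score exceeds
    the threshold; lowering the threshold trades cost for quality."""
    return STRONG_MODEL if score_query(prompt) >= threshold else WEAK_MODEL

print(route("What is 2 + 2?"))  # a short, easy query routes to the weak model
```

The threshold is the knob that traverses the cost/quality curve: setting it near 0 routes almost everything to the strong model, and setting it near 1 routes almost everything to the weak one.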

Framework and Methodology

RouteLLM formalizes the problem of LLM routing and explores augmentation techniques to improve router performance. The framework uses public data from Chatbot Arena and incorporates novel training methods. Four different routers were trained:

- A similarity-weighted (SW) ranking router
- A matrix factorization model
- A BERT classifier
- A causal LLM classifier

The training process leverages preference data, where each data point consists of a prompt and a comparison of response quality between two models. This method lets the routers learn the strengths and weaknesses of different models relative to various kinds of queries.
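The shape of such a preference-data point can be sketched like this. The field names (`model_a`, `winner`, etc.) are illustrative assumptions modeled on Chatbot Arena-style battle records, not RouteLLM's actual schema.

```python
# Hypothetical shape of a preference-data point: a prompt plus a pairwise
# comparison of two models' responses. Field names are illustrative.

from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    prompt: str
    model_a: str   # e.g. the strong model
    model_b: str   # e.g. the weak model
    winner: str    # "model_a", "model_b", or "tie"

record = PreferenceRecord(
    prompt="Prove that the square root of 2 is irrational.",
    model_a="gpt-4",
    model_b="mixtral-8x7b",
    winner="model_a",
)

# A router is trained to predict the winner from the prompt alone, so at
# inference time it can estimate whether the weak model would suffice.
label = 1 if record.winner == "model_a" else 0
print(label)  # 1: the strong model was preferred for this prompt
```

The key point is that the supervision signal is relative (which model's answer was preferred), not an absolute quality score, which is exactly what a routing decision needs.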

Performance and Cost Efficiency

The performance of these routers was evaluated on benchmarks like MT Bench, MMLU, and GSM8K. The results demonstrated that the routers could significantly reduce costs without compromising quality. For instance, on MT Bench, the matrix factorization router achieved 95% of GPT-4’s performance while making only 26% of the calls to GPT-4, resulting in a 48% cost reduction compared to the random baseline. Augmenting the training data using an LLM judge further improved the routers’ performance, reducing the number of GPT-4 calls required to just 14% while maintaining the same performance level.
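As a back-of-the-envelope check on the MT Bench numbers above, the blended cost per query is a simple weighted average. The 20x price ratio below is our own assumption for illustration (real per-token prices vary), so the computed savings only approximate the reported 48%.

```python
# Rough cost model: average per-query cost when a fraction of calls go to
# the strong model. The 20x price ratio is an assumption for illustration.

strong_cost, weak_cost = 20.0, 1.0  # relative per-call prices (assumed)

def blended_cost(strong_fraction: float) -> float:
    """Average cost per query when `strong_fraction` of calls go to the
    strong model and the remainder to the weak model."""
    return strong_fraction * strong_cost + (1 - strong_fraction) * weak_cost

random_baseline = blended_cost(0.50)  # random 50/50 routing
router = blended_cost(0.26)           # matrix-factorization router on MT Bench

savings = 1 - router / random_baseline
print(f"{savings:.0%}")  # about 43% with these assumed prices
```

Under this assumed price ratio the savings come out near the reported figure; the exact percentage depends on the real ratio of strong-model to weak-model prices.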

On MMLU, the routers initially performed poorly due to the out-of-distribution nature of most questions. However, augmenting the dataset with golden-label data from the MMLU validation split led to significant improvements. The best-performing causal LLM router routed only 54% of queries to GPT-4 while achieving 95% of GPT-4's performance, offering a 14% cost reduction compared to the random baseline.


Comparison with Commercial Offerings

RouteLLM’s performance was compared against commercial routing systems like Martian and Unify AI. Using GPT-4 Turbo as the strong model and Llama 2 70B or Mixtral 8x7B as the weak model, RouteLLM achieved similar performance while being over 40% cheaper. This comparison underscores the cost-effectiveness and competitive edge of the RouteLLM framework.

Generalization to Other Models

To demonstrate its generalizability, RouteLLM was tested with different model pairs, such as Claude 3 Opus and Llama 3 8B. The routers maintained strong performance without retraining, indicating that they learned common characteristics that help distinguish between strong and weak models, applicable to new model pairs.

Conclusion

RouteLLM provides a scalable and cost-effective solution for deploying LLMs by effectively balancing cost and performance. The framework's use of preference data and data augmentation techniques ensures high-quality responses while significantly reducing costs. The open-source release of RouteLLM, along with its datasets and code, will support further research and adoption of LLM routing.


Check out the Paper, GitHub, and Details. All credit for this research goes to the researchers of this project.

