MarkTechPost@AI, 9 hours ago
Efficient AI Agents Don’t Have to Be Expensive: Here’s Proof

 

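This article examines the cost challenges that AI agents face when deployed at scale. The study notes that high-performing models such as Claude 3.7 Sonnet achieve high accuracy, but their cost per task is three to four times that of GPT-4.1. The article uses the “cost-of-pass” metric to quantify agent efficiency. The study finds that model choice, planning steps, tool usage, and memory mechanisms all significantly affect cost. The “Efficient Agents” framework proposed by the OPPO AI Agent Team picks a moderately priced model, limits steps, and streamlines tool and memory design, which sharply reduces the running cost of AI agents while retaining nearly the same performance and offers a practical path to broad deployment.

💰 **Why AI agents are expensive**: To complete multi-step tasks, today's advanced AI agents make many API calls to large language models (LLMs) such as GPT-4 and Claude. The study finds that some agent systems need hundreds of API calls per task, which makes large-scale deployment extremely costly and is the main bottleneck to wider adoption. The OPPO team's analysis systematically identifies where these costs arise.

📈 **“Cost-of-pass” measures agent efficiency**: To quantify agent efficiency, the study introduces the “cost-of-pass” metric: the total cost of producing one correct answer. The metric combines the number of tokens the model processes (the words going in and out) with the model's accuracy at solving the problem on the first attempt. It allows a clear comparison of the cost-effectiveness of different models and configurations.

🔬 **Key factors that drive agent cost**: Experiments show that model choice (Claude 3.7 Sonnet costs 3-4 times as much as GPT-4.1), unnecessarily complex planning steps (which add cost but do little for success rate), excessive tool use (such as elaborate browser operations), and redundant memory modules all significantly raise agent cost. A simple memory mechanism, by contrast, gives the best balance of cost and effectiveness.

🛠️ **Building the “Efficient Agents” framework**: The “Efficient Agents” framework proposed by the OPPO team balances performance and cost by using a moderately priced model (such as GPT-4.1), limiting the agent's reasoning steps, using search tools broadly but not excessively, and adopting a lean memory system. The framework reaches 96.7% of the performance of leading open-source competitors (such as OWL) at less than three-quarters of the cost, a 28.4% reduction in spending.

💡 **Where AI agents go next**: The study stresses that the future of AI lies not only in raw capability but also in practicality and economy. Developers and businesses need to understand and optimize cost-of-pass, and should choose models and designs to fit the actual task. Lean, efficient design is the key to deploying AI agents at scale, so that AI can serve a wider range of industries.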

Are AI agents getting too expensive to use at scale? It’s a hot topic in the world of artificial intelligence, and a fresh study from the OPPO AI Agent Team finally puts some real numbers—and solutions—on the table.

Today’s most impressive AI agents can tackle massive, multi-step tasks using the reasoning power of large language models (LLMs) like GPT-4 and Claude. But with every breakthrough, the price to run these systems has shot up, making it tough for businesses (and even researchers!) to deploy them broadly. Enter the “Efficient Agents” framework—a new recipe for agent systems that keeps nearly all the performance but dramatically cuts the cost.

The Real Problem: AI Agents Are Getting Pricey

Ever wondered why your favorite smart AI assistant hasn’t taken over every aspect of your workflow yet? It’s not just the tech—it’s the bill. Some cutting-edge agent systems need hundreds of API calls per task. Multiply that by thousands of users and, suddenly, “scalability” seems more like a pipe dream.

The OPPO team saw this coming. Their latest study systematically breaks down where agents rack up costs and, more importantly, how much complexity is really needed to solve everyday tasks.

The Game-Changer: Measuring AI Agent Efficiency

This research introduces a crystal-clear metric: cost-of-pass. Imagine it as “the total cost to generate a correct answer to a problem.” It factors in how much you pay for tokens (every word in and out of your model) and how good the model is at getting things right on the first try.
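To make the metric concrete, here is a minimal Python sketch. It assumes cost-of-pass can be computed as total spend divided by the number of correct answers (equivalently, mean cost per attempt divided by success rate). The `Attempt` record, the function names, and the per-million-token prices are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One agent run on one task (hypothetical record layout)."""
    input_tokens: int
    output_tokens: int
    correct: bool


def attempt_cost(a: Attempt, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Dollar cost of one attempt, given prices per million tokens."""
    return (a.input_tokens * usd_per_m_in + a.output_tokens * usd_per_m_out) / 1_000_000


def cost_of_pass(attempts: list[Attempt], usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Total spend divided by the number of correct answers.

    Equivalent to (mean cost per attempt) / (success rate).
    Returns infinity when no attempt was correct.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    total = sum(attempt_cost(a, usd_per_m_in, usd_per_m_out) for a in attempts)
    passes = sum(a.correct for a in attempts)
    return total / passes if passes else float("inf")
```

Under this reading, a model that is cheap per call but rarely correct can still have a high cost-of-pass, because every failed attempt is paid for as well.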

Here’s the punchline: High-performing models like Claude 3.7 Sonnet top the leaderboards on accuracy, but their cost-of-pass is three to four times higher than that of GPT-4.1. For simpler jobs, smaller models like Qwen3-30B-A3B do a little less but cost pennies in comparison.

The Big Experiments: What Makes Agents Expensive?

1. Backbone Model Choice

Claude 3.7 Sonnet reaches 61.82% accuracy on a tough benchmark but costs $3.54 per successful task. GPT-4.1 is less accurate (53.33%) but costs only $0.98 per successful task. Want barebones, fast-and-cheap results? Qwen3-30B-A3B shrinks the cost to $0.13 for basic tasks.
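The quoted figures can be checked against the "three to four times" claim with a few lines of Python. The per-attempt estimate below additionally assumes that cost-of-pass equals cost per attempt divided by accuracy. That is an assumption of this sketch, not something the article states, and the function names are illustrative.

```python
# Figures quoted in the article: (accuracy, cost per successful task in USD).
reported = {
    "Claude 3.7 Sonnet": (0.6182, 3.54),
    "GPT-4.1": (0.5333, 0.98),
}


def cost_ratio(model_a: str, model_b: str) -> float:
    """How many times more model_a costs per successful task than model_b."""
    return reported[model_a][1] / reported[model_b][1]


def implied_cost_per_attempt(model: str) -> float:
    """Assumption: cost-of-pass = cost per attempt / accuracy, rearranged."""
    accuracy, cost_of_pass = reported[model]
    return cost_of_pass * accuracy


ratio = cost_ratio("Claude 3.7 Sonnet", "GPT-4.1")
print(f"Claude 3.7 Sonnet vs GPT-4.1, cost per success: {ratio:.2f}x")
```

The ratio comes out at roughly 3.6, which sits inside the three-to-four-times range quoted earlier.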

2. Planning and Scaling

You’d think “more planning” means “better results.” Not so fast. Too many steps mean higher cost without much of a boost in success rate. Scaling tricks that let the agent try more options (Best-of-N) burn lots of compute for tiny jumps in accuracy.
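A toy model shows why Best-of-N hurts cost efficiency. The sketch assumes each attempt succeeds independently with probability `p` and that the agent always picks a correct candidate when one exists. Both assumptions are optimistic and are not taken from the paper. Even so, the cost per correct answer rises with `N`.

```python
def best_of_n_success(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts is correct."""
    return 1.0 - (1.0 - p) ** n


def best_of_n_cost_of_pass(p: float, cost_per_attempt: float, n: int) -> float:
    """Cost per correct answer when every task pays for n attempts.

    Idealized: assumes independent attempts and a perfect selector.
    """
    if not 0.0 < p <= 1.0 or n < 1:
        raise ValueError("need 0 < p <= 1 and n >= 1")
    return n * cost_per_attempt / best_of_n_success(p, n)


for n in (1, 2, 4):
    print(n, round(best_of_n_cost_of_pass(0.5, 1.0, n), 3))
```

In practice the accuracy gain is smaller than this idealized bound, so the real cost-of-pass penalty is at least as large.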

3. How Agents Use Tools

Agents can use browsers, search engines, and other tools to get fresh info. More search sources help up to a point, but fancy moves like page-up/page-down add cost without much payback. Keeping tool use simple and broad works best.

4. Agent Memory

Surprisingly, the simplest memory setup—just keeping track of actions and observations—gave the best balance of low cost and high effectiveness. Extra memory modules made agents slower and more expensive, for little gain.
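A memory of this kind can be as small as an append-only list of action and observation pairs that is replayed into the next prompt. The class and method names below are hypothetical, and the sketch does not reproduce the framework's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleMemory:
    """Append-only log of (action, observation) pairs (hypothetical design)."""
    steps: list[tuple[str, str]] = field(default_factory=list)

    def record(self, action: str, observation: str) -> None:
        """Store what the agent did and what it saw in response."""
        self.steps.append((action, observation))

    def as_prompt(self) -> str:
        """Render the full history as plain text for the next model call."""
        lines = []
        for i, (action, observation) in enumerate(self.steps, start=1):
            lines.append(f"Step {i} action: {action}")
            lines.append(f"Step {i} observation: {observation}")
        return "\n".join(lines)


memory = SimpleMemory()
memory.record("search('cost-of-pass')", "3 results")
print(memory.as_prompt())
```

There is no summarization or retrieval step here, so the memory itself adds no extra model calls.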

Putting It All Together: The “Efficient Agents” Blueprint

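Here’s how the Efficient Agents system cracks the code: it uses a moderately priced backbone model (such as GPT-4.1), caps the number of steps the agent takes, uses search tools broadly but skips elaborate browser operations, and keeps a lean memory of actions and observations.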

The result? Efficient Agents deliver 96.7% of the performance of top open-source competitors (like OWL), but at less than three-quarters of the cost! That’s a 28.4% drop in the bill, while giving up only 3.3% of the performance.
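As a quick sanity check on these two percentages, the snippet below assumes both are measured against the same baseline (OWL) and derives the relative cost and the performance obtained per dollar. The function names are illustrative.

```python
def relative_cost(cost_reduction: float) -> float:
    """Fraction of the baseline cost that remains after the reduction."""
    return 1.0 - cost_reduction


def performance_per_dollar_gain(relative_performance: float, cost_reduction: float) -> float:
    """Performance per unit cost relative to the baseline (1.0 = no change)."""
    return relative_performance / relative_cost(cost_reduction)


remaining = relative_cost(0.284)
gain = performance_per_dollar_gain(0.967, 0.284)
print(f"cost vs baseline: {remaining:.1%}, performance per dollar: {gain:.2f}x")
```

A 28.4% reduction leaves 71.6% of the baseline cost, which is consistent with "less than three-quarters", and works out to about 1.35 times the performance per dollar.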

Why This Matters

This research is a wake-up call: Smart AI isn’t just about being powerful—it’s about being practical. If you’re building or deploying agents, measure your cost-of-pass and pick your ingredients wisely. Don’t assume bigger is always better. Sometimes, simple wins.

The Efficient Agents framework is open-source, so you can start experimenting with these ideas right now. As AI becomes more pervasive, efficient design will be key—whether you’re rolling out agents at a startup or a Fortune 500 company.

Bottom line: Next-gen AI agents can be both smart and affordable if you’re willing to rethink how you build them. The Efficient Agents paper isn’t just another technical deep-dive—it’s a roadmap for making AI work everywhere. And who doesn’t want that?



The post Efficient AI Agents Don’t Have to Be Expensive: Here’s Proof appeared first on MarkTechPost.
