cs.AI updates on arXiv.org 07月08日 14:58
Less is More: Empowering GUI Agent with Context-Aware Simplification
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文探讨了GUI代理从文本依赖向纯视觉方法转变的趋势,指出当前方法在数据收集和上下文建模方面的挑战。提出了一种名为SimpAgent的上下文感知简化框架,通过元素剪枝和历史压缩技术,提高了GUI代理的性能和效率。

arXiv:2507.03730v1 Announce Type: cross Abstract: The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agent and summarize: 1) the high-density and loose-relation of element context highlight the existence of many unrelated elements and their negative influence; 2) the high redundancy of history context reveals the inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI Agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces 27% FLOPs and achieves superior GUI navigation performances. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

GUI代理 上下文建模 SimpAgent
相关文章