MarkTechPost@AI, January 24
Mobile-Agent-E: A Hierarchical Multi-Agent Framework Combining Cognitive Science and AI to Redefine Complex Task Handling on Smartphones

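Mobile-Agent-E is a mobile assistant that tackles complex task handling on smartphones through a hierarchical multi-agent framework. It consists of a Manager agent and four specialized subordinate agents, which together cover task planning, visual perception, action execution, error verification, and information aggregation. The system is self-evolving: a long-term memory continually updates its Tips and Shortcuts, improving performance and reducing errors. On complex tasks, Mobile-Agent-E balances high-level planning against low-level action precision and refines itself through feedback loops. Experiments show that it outperforms existing models in task completion rate and user satisfaction.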

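🤖 Mobile-Agent-E uses a hierarchical multi-agent framework, with one Manager agent and four subordinate agents, for efficient task delegation and execution.

🧠 Through its long-term memory, the system continually updates its Tips and Shortcuts, mimicking human cognitive processes to improve performance and reduce redundant errors.

⚡️ Shortcuts reduce computational overhead, enabling faster task execution with fewer resources; for example, task completion time fell by 20%.

🏆 Mobile-Agent-E performs well in real-world applications: compared with existing models, its satisfaction score rose by 15%, and its task completion rate also improved markedly.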

Smartphones are essential tools in daily life. However, the complexity of tasks on mobile devices often leads to frustration and inefficiency. Navigating applications and managing multi-step processes consumes time and effort. Advancements in AI have introduced large multimodal models (LMMs) that enable mobile assistants to perform intricate operations autonomously. While these innovations aim to simplify technology, they often fail to meet practical demands. Addressing these gaps requires advanced AI capabilities and adaptable systems.

Current mobile assistants struggle to handle complex tasks requiring long-term planning, reasoning, and adaptability. Tasks like creating itineraries or comparing prices involve multiple steps across platforms. These systems treat each task as isolated, lacking the ability to learn from experience or optimize performance for repeated tasks, leading to inefficiency. Also, allocating identical resources to all tasks, regardless of complexity, reduces effectiveness in demanding scenarios. 

Some frameworks address these challenges but remain limited in planning and decision-making. Current mobile agents like AppAgent and Mobile-Agent-v1 focus on short, predefined tasks. Systems like Mobile-Agent-v2, despite improved planning, fail to incorporate a hierarchical structure for effective task delegation and refinement. These limitations highlight the need for more advanced mobile assistant designs.

Researchers from the University of Illinois Urbana-Champaign and Alibaba Group have developed Mobile-Agent-E, a novel mobile assistant that addresses these challenges through a hierarchical multi-agent framework. The system features a Manager agent responsible for planning and breaking down tasks into sub-goals, supported by four subordinate agents: Perceptor, Operator, Action Reflector, and Notetaker. These agents specialize, respectively, in visual perception, immediate action execution, error verification, and information aggregation. A standout feature of Mobile-Agent-E is its self-evolution module, which includes a long-term memory system. This memory is divided into two components:

    Tips, which provide generalized guidance based on previous tasks
    Shortcuts, which are reusable sequences of operations tailored to specific recurring subroutines
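To make the structure concrete, here is a minimal Python sketch of the hierarchy and the two-part memory described above. It is an illustration only, not the authors' implementation: the class names `Agent`, `Shortcut`, and `LongTermMemory`, the helper `build_hierarchy`, the example tip, and the shortcut name `Tap_Type_and_Enter` are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shortcut:
    """A reusable, named sequence of atomic operations (hypothetical layout)."""
    name: str
    steps: list[str]


@dataclass
class LongTermMemory:
    """Persistent memory split into the two components the article names."""
    tips: list[str] = field(default_factory=list)
    shortcuts: dict[str, Shortcut] = field(default_factory=dict)


@dataclass
class Agent:
    """One role in the hierarchy; `duty` paraphrases the article's description."""
    name: str
    duty: str


def build_hierarchy() -> tuple[Agent, list[Agent]]:
    """Return the Manager and its four subordinate agents."""
    manager = Agent("Manager", "plan and break the task into sub-goals")
    subordinates = [
        Agent("Perceptor", "visual perception"),
        Agent("Operator", "immediate action execution"),
        Agent("Action Reflector", "error verification"),
        Agent("Notetaker", "information aggregation"),
    ]
    return manager, subordinates


manager, subordinates = build_hierarchy()
memory = LongTermMemory()
# Both entries below are invented examples, not content from the paper.
memory.tips.append("Confirm the search box is focused before typing.")
memory.shortcuts["Tap_Type_and_Enter"] = Shortcut(
    "Tap_Type_and_Enter", ["Tap", "Type", "Enter"]
)
print(manager.name, [agent.name for agent in subordinates])
```

Keeping Tips as free text and Shortcuts as named step lists mirrors the article's distinction between general guidance and executable routines; a real system would presumably store richer metadata for each.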

Mobile-Agent-E operates by continuously refining its performance through feedback loops. After completing each task, the system’s Experience Reflectors update its Tips and propose new Shortcuts based on interaction history. These updates are inspired by human cognitive processes, where episodic memory informs future decisions, and procedural knowledge facilitates efficient task execution. For example, if a user frequently performs a sequence of actions, such as searching for a location and creating a note, the system creates a Shortcut to streamline this process in the future. Mobile-Agent-E balances high-level planning and low-level action precision by incorporating these learnings into its hierarchical framework.
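The idea of proposing a Shortcut from repeated behavior can be sketched in a few lines of Python. This is a deliberately simplified stand-in: the article describes model-based Experience Reflectors, whereas the sketch below just counts recurring action pairs across past tasks. The function name `propose_shortcuts`, its parameters, and the action names in `history` are hypothetical.

```python
from collections import Counter


def propose_shortcuts(history, length=2, min_count=2):
    """Return action subsequences of `length` that occur in at least `min_count` tasks."""
    counts = Counter()
    for actions in history:
        # Collect each distinct window once per task, so a task that repeats
        # a sequence internally still counts as a single occurrence.
        windows = {
            tuple(actions[i:i + length])
            for i in range(len(actions) - length + 1)
        }
        counts.update(windows)
    return sorted(seq for seq, n in counts.items() if n >= min_count)


# Invented interaction history: one list of atomic actions per completed task.
history = [
    ["open_maps", "search_location", "create_note"],
    ["open_browser", "search_location", "create_note"],
    ["open_maps", "search_location", "share"],
]
print(propose_shortcuts(history))
# [('open_maps', 'search_location'), ('search_location', 'create_note')]
```

Here the pair `search_location` followed by `create_note` surfaces as a candidate, matching the article's example of searching for a location and then creating a note.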

The performance of Mobile-Agent-E has been tested using a new benchmark called Mobile-Eval-E, which evaluates the system’s ability to handle complex real-world tasks. Compared to existing models, Mobile-Agent-E achieves significantly higher satisfaction scores, with a 15% increase in task completion rates. Also, evolved Tips and Shortcuts reduce computational overhead, enabling faster task execution without compromising accuracy. For instance, a single Shortcut that combines actions like “Tap,” “Type,” and “Enter” can save two decision-making iterations, improving efficiency. The system’s hierarchical design enhances error recovery, allowing it to adapt to unforeseen challenges during task execution.
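The saving from the “Tap,” “Type,” and “Enter” example can be checked with simple arithmetic, assuming each entry the agent selects costs one decision-making iteration. The Python sketch below collapses a matching run of atomic actions into a single Shortcut call and counts the difference. The function `compress` and the shortcut name `Tap_Type_and_Enter` are assumptions made for illustration.

```python
def compress(actions, shortcut_name, shortcut_steps):
    """Replace each occurrence of `shortcut_steps` in `actions` with one shortcut call."""
    out = []
    i = 0
    n = len(shortcut_steps)
    while i < len(actions):
        # The `n` guard prevents an empty shortcut from matching forever.
        if n and actions[i:i + n] == shortcut_steps:
            out.append(shortcut_name)
            i += n
        else:
            out.append(actions[i])
            i += 1
    return out


atomic = ["Tap", "Type", "Enter"]
with_shortcut = compress(atomic, "Tap_Type_and_Enter", ["Tap", "Type", "Enter"])

# One decision per selected entry: 3 atomic actions versus 1 shortcut call.
saved = len(atomic) - len(with_shortcut)
print(with_shortcut, saved)  # ['Tap_Type_and_Enter'] 2
```

Three atomic decisions become one, which is the two-iteration saving the article cites.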

Key takeaways from this research include the following:  

    Mobile-Agent-E features a Manager agent supported by four specialized subordinate agents, enabling efficient task delegation and execution.
    The system continuously updates its Tips and Shortcuts, inspired by human cognitive processes, to improve performance and reduce redundant errors.
    Shortcuts reduce computational overhead, resulting in faster task execution with fewer resources. For example, task completion time decreased by 20% compared to previous models.
    Mobile-Agent-E achieved a 15% increase in satisfaction scores compared to state-of-the-art models, demonstrating its effectiveness in real-world applications.
    The system’s capabilities extend to various scenarios, such as planning itineraries, managing notes, and comparing prices across apps, showcasing its versatility and adaptability.

In conclusion, Mobile-Agent-E bridges the gap between user needs and technological capabilities by addressing critical challenges in task management, planning, and decision-making. Its hierarchical framework and self-evolution capabilities enhance efficiency and set a new benchmark for intelligent mobile assistants. This research highlights the potential of AI-driven solutions to transform human-device interaction, making technology more accessible and intuitive for all users.

