Unite.AI · January 12
From Intent to Execution: How Microsoft is Transforming Large Language Models into Action-Oriented AI

Microsoft is transforming large language models (LLMs) into action-oriented AI agents that can plan, decompose tasks, and interact with the real world. This closes the gap between understanding and executing tasks, so LLMs not only grasp user intent but also turn it into concrete actions. Through steps including data collection, model training, offline testing, integration into real systems, and real-world evaluation, Microsoft equips LLMs to understand user intent, carry out specific operations, adapt to change, and specialize in particular tasks. The UFO Agent, developed by Microsoft, is a working example of how LLMs can execute tasks in a real Windows environment. Despite challenges around scalability, safety, and ethics, Microsoft is working to improve efficiency, expand use cases, and uphold ethical standards, moving toward a more practical, adaptable, and action-oriented future for AI.

🎯 LLMs need the ability to understand user intent: even when input is vague or incomplete, they can refine it through multi-turn dialogue to ensure the user's needs are understood accurately before taking action.

⚙️ LLMs need to turn understood tasks into executable steps, such as clicking buttons, calling APIs, or controlling physical devices, and adjust their operations to environmental changes and challenges.

🔄 LLMs need to adapt to change: when problems arise, they can anticipate issues, adjust their steps, and find alternatives to ensure the task still completes smoothly.

🔬 Microsoft trains LLMs on collected task-plan and task-action data so they can decompose tasks and carry out concrete operations. Through multiple rounds of training and reinforcement learning, LLMs improve their problem-solving and decision-making abilities.

🖥️ Microsoft's UFO Agent demonstrates how action-oriented AI executes tasks in a Windows environment, such as highlighting text or saving files. It uses the Windows UI Automation API to identify and operate interface elements.

Large Language Models (LLMs) have changed how we handle natural language processing. They can answer questions, write code, and hold conversations. Yet, they fall short when it comes to real-world tasks. For example, an LLM can guide you through buying a jacket but can’t place the order for you. This gap between thinking and doing is a major limitation. People don’t just need information; they want results.

To bridge this gap, Microsoft is turning LLMs into action-oriented AI agents. By enabling them to plan, decompose tasks, and engage in real-world interactions, Microsoft empowers LLMs to manage practical tasks effectively. This shift has the potential to redefine what LLMs can do, turning them into tools that automate complex workflows and simplify everyday tasks. Let’s look at what’s needed to make this happen and how Microsoft is approaching the problem.

What LLMs Need to Act

For LLMs to perform tasks in the real world, they need to go beyond understanding text. They must interact with digital and physical environments while adapting to changing conditions. Here are some of the capabilities they need:

    Understanding User Intent

To act effectively, LLMs need to understand user requests. Inputs like text or voice commands are often vague or incomplete. The system must fill in the gaps using its knowledge and the context of the request. Multi-step conversations can help refine these intentions, ensuring the AI understands before taking action.
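
A minimal sketch of such a clarification loop, assuming a generic chat-completion wrapper (llm) and an input channel (ask_user); these callables and the JSON field names are illustrative assumptions, not details of Microsoft's system:

```python
import json
from typing import Callable

def clarify_intent(user_request: str,
                   llm: Callable[[str], str],
                   ask_user: Callable[[str], str],
                   max_turns: int = 3) -> dict:
    """Refine a vague request into a structured intent before acting."""
    transcript = [f"user: {user_request}"]
    intent: dict = {}
    for _ in range(max_turns):
        reply = llm(
            "From this conversation, return JSON with keys 'action', "
            "'target', and 'missing_info' (a list of open questions):\n"
            + "\n".join(transcript)
        )
        intent = json.loads(reply)
        if not intent.get("missing_info"):   # nothing left to clarify
            break
        question = "Could you specify: " + ", ".join(intent["missing_info"]) + "?"
        transcript.append(f"assistant: {question}")
        transcript.append(f"user: {ask_user(question)}")
    return intent
```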

    Turning Intentions into Actions

After understanding a task, an LLM must convert it into actionable steps. This might involve clicking buttons, calling APIs, or controlling physical devices. The LLM needs to tailor its actions to the specific task, adapting to the environment and solving challenges as they arise.
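
One common way to make this concrete is a typed action schema that the model emits and a dispatcher executes. The structure below is a generic illustration, not Microsoft's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """One executable step emitted by the planner."""
    kind: str                  # e.g. "click", "type", "call_api"
    target: str                # UI control name or API endpoint
    args: dict = field(default_factory=dict)

def execute(action: Action) -> None:
    """Dispatch an action to its handler (stubs print for demonstration)."""
    handlers = {
        "click": lambda a: print(f"clicking {a.target}"),
        "type": lambda a: print(f"typing {a.args.get('text', '')} into {a.target}"),
        "call_api": lambda a: print(f"calling {a.target} with {a.args}"),
    }
    if action.kind not in handlers:
        raise ValueError(f"unknown action kind: {action.kind}")
    handlers[action.kind](action)

# A plan is an ordered list of actions the executor walks through:
plan = [Action("click", "Home tab"),
        Action("type", "Font Size box", {"text": "14"})]
for step in plan:
    execute(step)
```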

    Adapting to Changes

Real-world tasks don’t always go as planned. LLMs need to anticipate problems, adjust steps, and find alternatives when issues arise. For instance, if a necessary resource isn’t available, the system should find another way to complete the task. This flexibility ensures the process doesn’t stall when things change.
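
That flexibility can be encoded as a retry loop over alternative strategies; the sketch below is a generic pattern, not Microsoft's specific recovery logic:

```python
from typing import Callable, Sequence

def run_with_fallbacks(strategies: Sequence[Callable[[], bool]],
                       retries_each: int = 2) -> bool:
    """Try each strategy in order, retrying transient failures before moving on."""
    for strategy in strategies:
        for _ in range(retries_each):
            try:
                if strategy():
                    return True    # task completed
            except RuntimeError:
                continue           # transient error: retry this strategy
    return False                   # every alternative was exhausted

# Stand-in strategies; in practice these would be real UI or API actions.
done = run_with_fallbacks([lambda: False,   # primary path fails
                           lambda: True])   # fallback succeeds
print(done)  # True
```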

    Specializing in Specific Tasks

While LLMs are designed for general use, specialization makes them more efficient. By focusing on specific tasks, these systems can deliver better results with fewer resources. This is especially important for devices with limited computing power, like smartphones or embedded systems.

By developing these skills, LLMs can move beyond just processing information. They can take meaningful actions, paving the way for AI to integrate seamlessly into everyday workflows.

How Microsoft is Transforming LLMs

Microsoft’s approach to creating action-oriented AI follows a structured process. The key objective is to enable LLMs to understand commands, plan effectively, and take action. Here’s how they’re doing it:

Step 1: Collecting and Preparing Data

In the first phase, Microsoft collected data for its specific use case, the UFO Agent (described below). The data includes user queries, environmental details, and task-specific actions. Two types of data are gathered in this phase. First, task-plan data helps LLMs outline the high-level steps required to complete a task. For example, “Change font size in Word” might involve steps like selecting text and adjusting the toolbar settings. Second, task-action data enables LLMs to translate these steps into precise instructions, like clicking specific buttons or using keyboard shortcuts.

This combination gives the model both the big picture and the detailed instructions it needs to perform tasks effectively.
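
As a rough illustration, a single training record pairing the two data types might look like this; the field names are assumptions for exposition, not Microsoft's published schema:

```python
# One illustrative record combining both data types.
record = {
    "query": "Change font size in Word",
    "task_plan": [                 # high-level steps (task-plan data)
        "Select the target text",
        "Open the Home tab",
        "Set the font size in the toolbar",
    ],
    "task_actions": [              # grounded instructions (task-action data)
        {"kind": "select", "target": "document text"},
        {"kind": "click",  "target": "Home tab"},
        {"kind": "type",   "target": "Font Size box", "args": {"text": "14"}},
    ],
}
```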

Step 2: Training the Model

Once the data is collected, LLMs are refined through multiple training stages. First, they are trained for task planning, learning to break down user requests into actionable steps. Expert-labeled data is then used to teach them to translate these plans into specific actions. To further enhance their problem-solving capabilities, the LLMs engage in a self-boosting exploration process that empowers them to tackle unsolved tasks and generate new examples for continuous learning. Finally, reinforcement learning is applied, using feedback from successes and failures to further improve their decision-making.
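
The self-boosting idea can be sketched as a loop that attempts unsolved tasks, keeps only verified successes as new training data, and fine-tunes on them. The attempt, verify, and fine_tune interfaces below are assumed for illustration:

```python
from typing import Callable, Sequence

def self_boost(model,
               unsolved: Sequence[str],
               attempt: Callable,    # (model, task) -> candidate trajectory
               verify: Callable,     # trajectory -> bool: did it succeed?
               fine_tune: Callable,  # (model, examples) -> updated model
               rounds: int = 3):
    """Harvest the model's own verified successes as new training data."""
    for _ in range(rounds):
        trajectories = (attempt(model, task) for task in unsolved)
        new_examples = [t for t in trajectories if verify(t)]
        if not new_examples:
            break                    # no progress this round
        model = fine_tune(model, new_examples)
    return model
```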

Step 3: Offline Testing

After training, the model is tested in controlled environments to ensure reliability. Metrics like Task Success Rate (TSR) and Step Success Rate (SSR) are used to measure performance. For example, testing a calendar management agent might involve verifying its ability to schedule meetings and send invitations without errors.
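
Both metrics are easy to compute from logged test runs. A minimal version, assuming each run records a success flag per step and a task counts as successful only if every step succeeds:

```python
def step_success_rate(runs: list[list[bool]]) -> float:
    """SSR: fraction of individual steps that succeeded across all runs."""
    steps = [ok for run in runs for ok in run]
    return sum(steps) / len(steps)

def task_success_rate(runs: list[list[bool]]) -> float:
    """TSR: fraction of runs in which every step succeeded."""
    return sum(all(run) for run in runs) / len(runs)

# Three logged runs of a calendar agent (True = step succeeded):
runs = [[True, True, True], [True, False, True], [True, True, True]]
print(f"TSR={task_success_rate(runs):.2f}, SSR={step_success_rate(runs):.2f}")
# TSR=0.67, SSR=0.89
```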

Step 4: Integration into Real Systems

Once validated, the model is integrated into an agent framework. This allows it to interact with real-world environments, like clicking buttons or navigating menus. Tools like UI Automation APIs help the system identify and manipulate user interface elements dynamically.

For example, if tasked with highlighting text in Word, the agent identifies the highlight button, selects the text, and applies formatting. A memory component helps the LLM keep track of past actions, enabling it to adapt to new scenarios.

Step 5: Real-World Testing

The final step is online evaluation. Here, the system is tested in real-world scenarios to ensure it can handle unexpected changes and errors. For example, a customer support bot might guide users through resetting a password while adapting to incorrect inputs or missing information. This testing ensures the AI is robust and ready for everyday use.

A Practical Example: The UFO Agent

To showcase how action-oriented AI works, Microsoft developed the UFO Agent. This system is designed to execute real-world tasks in Windows environments, turning user requests into completed actions.

At its core, the UFO Agent uses an LLM to interpret requests and plan actions. For example, if a user says, “Highlight the word ‘important’ in this document,” the agent interacts with Word to complete the task. It gathers contextual information, like the positions of UI controls, and uses this to plan and execute actions.

The UFO Agent relies on tools like the Windows UI Automation (UIA) API. This API scans applications for control elements, such as buttons or menus. For a task like “Save the document as PDF,” the agent uses the UIA to identify the “File” button, locate the “Save As” option, and execute the necessary steps. By structuring data consistently, the system ensures smooth operation from training to real-world application.
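
Outside Microsoft's own framework, the same pattern can be reproduced with the open-source pywinauto library, whose "uia" backend drives the Windows UI Automation API. The control titles below are assumptions that vary by Word version and locale, and the sketch assumes a Word window is already open:

```python
# A minimal UIA-driven sketch using the open-source pywinauto library.
# Control titles ("File Tab", "Save As") are assumptions and may differ
# across Word versions and locales.
from pywinauto import Application

app = Application(backend="uia").connect(title_re=".*Word")  # attach to Word
win = app.top_window()

# Locate controls in the UI Automation tree, then act on them.
win.child_window(title="File Tab", control_type="Button").click_input()
win.child_window(title="Save As", control_type="ListItem").click_input()
# Further steps would pick the PDF format and confirm the save dialog.
```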

Overcoming Challenges

While this is an exciting development, creating action-oriented AI comes with challenges. Scalability is a major issue. Training and deploying these models across diverse tasks require significant resources. Ensuring safety and reliability is equally important. Models must perform tasks without unintended consequences, especially in sensitive environments. And as these systems interact with private data, maintaining ethical standards around privacy and security is also crucial.

Microsoft’s roadmap focuses on improving efficiency, expanding use cases, and maintaining ethical standards. With these advancements, LLMs could redefine how AI interacts with the world, making them more practical, adaptable, and action-oriented.

The Future of AI

Transforming LLMs into action-oriented agents could be a game-changer. These systems can automate tasks, simplify workflows, and make technology more accessible. Microsoft’s work on action-oriented AI and tools like the UFO Agent is just the beginning. As AI continues to evolve, we can expect smarter, more capable systems that don’t just interact with us—they get jobs done.

The post From Intent to Execution: How Microsoft is Transforming Large Language Models into Action-Oriented AI appeared first on Unite.AI.
