AWS Machine Learning Blog, March 15, 02:23
Getting started with computer use in Amazon Bedrock Agents

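This post describes how Amazon Bedrock Agents on Amazon Web Services (AWS) integrates Anthropic’s Claude models to provide the computer use capability, which automates repetitive tasks that span multiple applications. The approach lets an AI model visually perceive and understand digital interfaces and perform actions such as clicking buttons and typing text. Because Amazon Bedrock Agents provides a secure, traceable, and managed way to automate, organizations can automate workflows without building a custom API integration for each application. The post also includes a demo that creates a computer use agent with the AWS CDK, shows how to run computer use tools safely in a sandbox environment, and describes the architecture and components in detail.

🤖 The computer use capability lets AI models perceive and understand digital interfaces much as a human does. Models can identify what is on the screen, understand the context of UI elements, and recognize the actions to perform, such as clicking buttons, typing text, and navigating between applications, which makes workflow automation possible.

🛡️ A key advantage of the integration is its secure execution environment. Computer use tools run in a sandbox environment with restricted access to the AWS ecosystem and the network, and comprehensive logging and detailed tracing improve auditing and debugging.

🛠️ Amazon Bedrock Agents supports three main action groups: the computer tool (for interacting with user interfaces), the text editor tool (for editing and manipulating files), and Bash (for running built-in Linux commands). Together, these tools provide a flexible and capable foundation for automating tasks.

☁️ With Amazon Bedrock Agents and a compatible Anthropic Claude model, you describe in natural language what the agent should do and how it should interact with users, and then add the supported computer use action groups to the agent, which simplifies agent creation and configuration.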

Computer use is a breakthrough capability from Anthropic that allows foundation models (FMs) to visually perceive and interpret digital interfaces. This capability enables Anthropic’s Claude models to identify what’s on a screen, understand the context of UI elements, and recognize actions that should be performed such as clicking buttons, typing text, scrolling, and navigating between applications. However, the model itself doesn’t execute these actions—it requires an orchestration layer to safely implement the supported actions.

Today, we’re announcing computer use support within Amazon Bedrock Agents using Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet models on Amazon Bedrock. This integration brings Anthropic’s visual perception capabilities to Amazon Bedrock Agents as a managed tool, providing you with a secure, traceable, and managed way to implement computer use automation in your workflows.

Organizations across industries struggle with automating repetitive tasks that span multiple applications and systems of record. Whether processing invoices, updating customer records, or managing human resource (HR) documents, these workflows often require employees to manually transfer information between different systems – a process that’s time-consuming, error-prone, and difficult to scale.

Traditional automation approaches require custom API integrations for each application, creating significant development overhead. Computer use capabilities change this paradigm by allowing machines to perceive existing interfaces just as humans do.

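In this post, we create a computer use agent demo that provides the critical orchestration layer that turns computer use from a perception capability into actionable automation. Without this orchestration layer, computer use would only identify potential actions without executing them. The computer use agent demo powered by Amazon Bedrock Agents provides benefits that include a secure execution environment, in which computer use tools run in a sandbox with limited access to the AWS ecosystem and the web, along with comprehensive logging and detailed tracing for auditing and debugging.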

This integration combines Anthropic’s perceptual understanding of digital interfaces with the orchestration capabilities of Amazon Bedrock Agents, creating a powerful agent for automating complex workflows across applications. Rather than build custom integrations for each system, developers can now create agents that perceive and interact with existing interfaces in a managed, secure way.

With computer use, Amazon Bedrock Agents can automate tasks through basic GUI actions and built-in Linux commands. For example, your agent could take screenshots, create and edit text files, and run shell commands. With Amazon Bedrock Agents and a compatible Anthropic Claude model, you can use three action groups: the computer tool (for interacting with user interfaces), the text editor tool (for editing and manipulating files), and Bash (for running built-in Linux commands).
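To make the role of the action groups concrete, the following is a minimal sketch of how client-side code might route a tool call to a handler by action group name. It is not the demo's implementation. The group names ("BashActionGroup", "TextEditorActionGroup"), the parameter names ("command", "path", "file_text"), and the helper functions are assumptions chosen for illustration, and the text editor handler covers only two commands. A computer tool handler is omitted because it needs a display to capture and control.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict

# A handler takes the tool parameters and returns text for the agent.
ToolHandler = Callable[[Dict[str, str]], str]


def run_bash(parameters: Dict[str, str]) -> str:
    """Run a shell command and return its combined output (assumed 'command' parameter)."""
    completed = subprocess.run(
        ["bash", "-c", parameters["command"]],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return completed.stdout + completed.stderr


def run_text_editor(parameters: Dict[str, str]) -> str:
    """Handle a small, illustrative subset of text editor commands."""
    command = parameters.get("command")
    path = Path(parameters["path"])
    if command == "create":
        path.write_text(parameters.get("file_text", ""), encoding="utf-8")
        return f"Created {path}"
    if command == "view":
        return path.read_text(encoding="utf-8")
    return f"Unsupported text editor command: {command}"


def make_dispatcher(handlers: Dict[str, ToolHandler]) -> Callable[[str, Dict[str, str]], str]:
    """Build a function that routes a tool call to the handler for its action group."""

    def dispatch(action_group: str, parameters: Dict[str, str]) -> str:
        handler = handlers.get(action_group)
        if handler is None:
            return f"Unsupported action group: {action_group}"
        return handler(parameters)

    return dispatch


# Hypothetical mapping; the keys must match the action group names you create.
DEFAULT_HANDLERS: Dict[str, ToolHandler] = {
    "BashActionGroup": run_bash,
    "TextEditorActionGroup": run_text_editor,
}
```

Handlers like these should run only inside the sandbox, because they execute whatever command or file operation the model requests.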

Solution overview

An example computer use workflow consists of the following steps:

    1. Create an Amazon Bedrock agent and use natural language to describe what the agent should do and how it should interact with users, for example: “You are computer use agent capable of using Firefox web browser for web search.”
    2. Add the computer use action groups supported by Amazon Bedrock Agents to your agent using the CreateAgentActionGroup API.
    3. Invoke the agent with a user query that requires computer use tools, for example, “What is Amazon Bedrock, can you search the web?”
    4. The Amazon Bedrock agent uses the tool definitions at its disposal and decides to use the computer action group to take a screenshot of the environment. Using the return control capability of Amazon Bedrock Agents, the agent then responds with the tool or tools that it wants to execute. The return control capability is required for using computer use with Amazon Bedrock Agents.
    5. The workflow parses the agent response and executes the returned tool in a sandbox environment.
    6. The output is given back to the Amazon Bedrock agent for further processing.
    7. The Amazon Bedrock agent continues to respond with tools at its disposal until the task is complete.
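The invoke, execute, and return cycle in the steps above can be sketched as a loop. This is a minimal sketch under stated assumptions, not the demo's code. It assumes the bedrock-agent-runtime `invoke_agent` response stream yields `chunk` and `returnControl` events, and that tool output is sent back as text through `sessionState`. The `execute_tool` callable and the function name are hypothetical, and screenshots (image results) are left out for brevity. The client is passed in so that the loop can be exercised without AWS access.

```python
import uuid
from typing import Any, Callable, Dict, List

# Hypothetical executor signature: (action_group, function, parameters) -> output text.
ToolExecutor = Callable[[str, str, Dict[str, str]], str]


def run_agent_with_return_control(
    runtime_client: Any,
    agent_id: str,
    agent_alias_id: str,
    user_query: str,
    execute_tool: ToolExecutor,
    max_turns: int = 25,
) -> str:
    """Invoke the agent, run each requested tool, and return the final answer."""
    session_id = str(uuid.uuid4())
    base = {"agentId": agent_id, "agentAliasId": agent_alias_id, "sessionId": session_id}
    request: Dict[str, Any] = {**base, "inputText": user_query}

    for _ in range(max_turns):
        response = runtime_client.invoke_agent(**request)
        text_parts: List[str] = []
        return_control = None
        for event in response["completion"]:
            if "chunk" in event:
                text_parts.append(event["chunk"]["bytes"].decode("utf-8"))
            elif "returnControl" in event:
                return_control = event["returnControl"]

        # No tool request means the agent has produced its final answer.
        if return_control is None:
            return "".join(text_parts)

        results = []
        for invocation in return_control["invocationInputs"]:
            call = invocation["functionInvocationInput"]
            parameters = {p["name"]: p["value"] for p in call.get("parameters", [])}
            output = execute_tool(call["actionGroup"], call.get("function", ""), parameters)
            function_result: Dict[str, Any] = {
                "actionGroup": call["actionGroup"],
                "responseBody": {"TEXT": {"body": output}},
            }
            if call.get("function"):
                function_result["function"] = call["function"]
            results.append({"functionResult": function_result})

        # Hand the tool output back to the agent in the same session.
        request = {
            **base,
            "sessionState": {
                "invocationId": return_control["invocationId"],
                "returnControlInvocationResults": results,
            },
        }

    raise RuntimeError(f"Agent did not finish within {max_turns} turns")


# Usage (requires AWS credentials; the IDs are placeholders):
#   import boto3
#   runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
#   run_agent_with_return_control(runtime, "<agent-id>", "<alias-id>",
#                                 "What is Amazon Bedrock, can you search the web?", my_executor)
```

The `max_turns` cap is a design choice of this sketch: it stops a task that never converges instead of looping indefinitely inside the sandbox.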

You can recreate this example in the us-west-2 AWS Region with the AWS Cloud Development Kit (AWS CDK) by following the instructions in the GitHub repository. This demo deploys a containerized application using AWS Fargate across two Availability Zones in the us-west-2 Region. The infrastructure operates within a virtual private cloud (VPC) containing public subnets in each Availability Zone, with an internet gateway providing external connectivity. The architecture is complemented by essential supporting services, including AWS Key Management Service (AWS KMS) for security and Amazon CloudWatch for monitoring, creating a resilient, serverless container environment that alleviates the need to manage underlying infrastructure while maintaining robust security and high availability.

The following diagram illustrates the solution architecture.

At the core of our solution are two Fargate containers managed through Amazon Elastic Container Service (Amazon ECS), each protected by its own security group. The first is our orchestration container, which not only handles the communication between Amazon Bedrock Agents and end users, but also orchestrates the workflow that enables tool execution. The second is our environment container, which serves as a secure sandbox where the Amazon Bedrock agent can safely run its computer use tools. The environment container has limited access to the rest of the ecosystem and the internet. We utilize service discovery to connect Amazon ECS services with DNS names.
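As a rough illustration of the two-container split, the sketch below shows how an orchestration container might address the environment container through a service discovery DNS name. The post does not specify the protocol between the containers, so everything here is an assumption: the host name, port, `/execute` path, JSON payload shape, and function names are hypothetical and are not taken from the demo.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical service discovery name and port for the environment container.
ENVIRONMENT_HOST = "environment.computer-use.local"
ENVIRONMENT_PORT = 8080


def build_tool_request(action_group: str, parameters: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that asks the sandbox to run a tool."""
    body = json.dumps({"actionGroup": action_group, "parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{ENVIRONMENT_HOST}:{ENVIRONMENT_PORT}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_tool_request(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode a JSON response (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Addressing the sandbox by DNS name rather than by IP address means the orchestration container keeps working when Amazon ECS replaces the environment task and its address changes.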

The orchestration container includes the following components:

The environment container includes the following components:

The following diagram illustrates these components.

Prerequisites

    - AWS Command Line Interface (AWS CLI), installed and with credentials set up.
    - Python 3.11 or later.
    - Node.js 14.15.0 or later.
    - AWS CDK CLI.
    - Model access enabled for Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet.
    - Boto3 version 1.37.10 or later.
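The Python and Boto3 version requirements can be checked programmatically. The following is a small sketch and is not part of the demo repository. The helper names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank "1.37.9" above "1.37.10".

```python
import re
import sys
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '1.37.10' into a tuple of up to three integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_prerequisites() -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11 or later is required")
    try:
        import boto3  # imported here so the check reports a missing install
    except ImportError:
        problems.append("Boto3 is not installed")
    else:
        if not meets_minimum(boto3.__version__, "1.37.10"):
            problems.append(f"Boto3 1.37.10 or later is required, found {boto3.__version__}")
    return problems
```

Calling `check_prerequisites()` before deploying gives an early, readable failure instead of an unrecognized-parameter error later when the action groups are created.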

Create an Amazon Bedrock agent with computer use

You can use the following code sample to create a simple Amazon Bedrock agent with computer, bash, and text editor action groups. You must provide an action group signature that is compatible with the model you use, either Anthropic’s Claude 3.5 Sonnet V2 or Anthropic’s Claude 3.7 Sonnet, as shown in the following table.

Model                               Action group signature
Anthropic’s Claude 3.5 Sonnet V2    computer_20241022, text_editor_20241022, bash_20241022
Anthropic’s Claude 3.7 Sonnet       computer_20250124, text_editor_20250124, bash_20250124
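One way to avoid mismatching a model and a signature is to encode the table above as a lookup. This is a sketch only. The short model keys ("claude-3-5-sonnet-v2", "claude-3-7-sonnet") and the helper name are hypothetical, while the signature values and the ANTHROPIC.* identifiers are the ones used in this post.

```python
from typing import Dict

# Signature types per model, keyed by parent action group signature.
# The model keys are illustrative labels, not Amazon Bedrock model IDs.
SIGNATURE_TYPES: Dict[str, Dict[str, str]] = {
    "claude-3-5-sonnet-v2": {
        "ANTHROPIC.Computer": "computer_20241022",
        "ANTHROPIC.TextEditor": "text_editor_20241022",
        "ANTHROPIC.Bash": "bash_20241022",
    },
    "claude-3-7-sonnet": {
        "ANTHROPIC.Computer": "computer_20250124",
        "ANTHROPIC.TextEditor": "text_editor_20250124",
        "ANTHROPIC.Bash": "bash_20250124",
    },
}


def signature_params(model_key: str, parent_signature: str, **extra: object) -> Dict[str, str]:
    """Build signature parameters; extra values are converted to strings."""
    params = {"type": SIGNATURE_TYPES[model_key][parent_signature]}
    params.update({name: str(value) for name, value in extra.items()})
    return params
```

For example, `signature_params("claude-3-7-sonnet", "ANTHROPIC.Bash")` returns `{"type": "bash_20250124"}`, and an unknown model key raises a `KeyError` before any API call is made.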
import time

import boto3

# Placeholder (not part of the original sample): replace with the ARN of
# your Amazon Bedrock agent execution role.
agent_role_arn = "arn:aws:iam::<account-id>:role/<agent-execution-role>"

# Step 1: Create the bedrock agent client
bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

# Step 2: Create an agent
create_agent_response = bedrock_agent.create_agent(
    agentResourceRoleArn=agent_role_arn,  # Amazon Bedrock Agent execution role
    agentName="computeruse",
    description=(
        "Example agent for computer use. "
        "This agent should only operate on "
        "sandbox environments with limited privileges."
    ),
    foundationModel="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    instruction=(
        "You are computer use agent capable of using Firefox "
        "web browser for web search."
    ),
)
agent_id = create_agent_response["agent"]["agentId"]
time.sleep(30)  # wait for agent to be created

# Step 3.1: Create and attach computer action group
bedrock_agent.create_agent_action_group(
    actionGroupName="ComputerActionGroup",
    actionGroupState="ENABLED",
    agentId=agent_id,
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Computer",
    parentActionGroupSignatureParams={
        "type": "computer_20250124",
        "display_height_px": "768",
        "display_width_px": "1024",
        "display_number": "1",
    },
)

# Step 3.2: Create and attach bash action group
bedrock_agent.create_agent_action_group(
    actionGroupName="BashActionGroup",
    actionGroupState="ENABLED",
    agentId=agent_id,
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Bash",
    parentActionGroupSignatureParams={
        "type": "bash_20250124",
    },
)

# Step 3.3: Create and attach text editor action group
bedrock_agent.create_agent_action_group(
    actionGroupName="TextEditorActionGroup",
    actionGroupState="ENABLED",
    agentId=agent_id,
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.TextEditor",
    parentActionGroupSignatureParams={
        "type": "text_editor_20250124",
    },
)

# Step 3.4: Create weather action group (a custom function that uses return control)
bedrock_agent.create_agent_action_group(
    actionGroupName="WeatherActionGroup",
    agentId=agent_id,
    agentVersion="DRAFT",
    actionGroupExecutor={
        "customControl": "RETURN_CONTROL",
    },
    functionSchema={
        "functions": [
            {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location.",
                "parameters": {
                    "location": {
                        "type": "string",
                        "description": "The city, e.g., San Francisco",
                        "required": True,
                    },
                    "unit": {
                        "type": "string",
                        "description": (
                            "The unit to use, e.g., fahrenheit or celsius. "
                            'Defaults to "fahrenheit"'
                        ),
                        "required": False,
                    },
                },
                "requireConfirmation": "DISABLED",
            }
        ]
    },
)
time.sleep(10)

# Step 4: Prepare agent
bedrock_agent.prepare_agent(agentId=agent_id)
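The fixed `time.sleep` calls in the sample are simple but can be too short or needlessly long. As an alternative sketch, you could poll the agent status with `get_agent` until it leaves the transitional state. The helper name, the default status values treated as ready, and the timeout values are assumptions of this sketch, and the `sleep` parameter exists only so that the function can be exercised without waiting.

```python
import time
from typing import Any, Callable, Sequence


def wait_for_agent_status(
    bedrock_agent: Any,
    agent_id: str,
    ready_statuses: Sequence[str] = ("NOT_PREPARED", "PREPARED"),
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_agent until the agent reaches one of the ready statuses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = bedrock_agent.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        if status in ready_statuses:
            return status
        if status == "FAILED":
            raise RuntimeError(f"Agent {agent_id} entered FAILED status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Agent {agent_id} still in {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage with the client from the sample above (requires AWS credentials):
#   wait_for_agent_status(bedrock_agent, agent_id)                    # after create_agent
#   bedrock_agent.prepare_agent(agentId=agent_id)
#   wait_for_agent_status(bedrock_agent, agent_id, ("PREPARED",))     # after prepare_agent
```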

Example use case

In this post, we demonstrate an example where we use Amazon Bedrock Agents with the computer use capability to complete a web form. In the example, the computer use agent can also switch Firefox tabs to interact with a customer relationship management (CRM) agent to get the required information to complete the form. Although this example uses a sample CRM application as the system of record, the same approach works with Salesforce, SAP, Workday, or other systems of record with the appropriate authentication frameworks in place.

In the demonstrated use case, you can observe how the Amazon Bedrock agent performed with computer use tools. Our implementation completed the customer ID, customer name, and email fields by visually examining the Excel data. For the overview field, however, it decided to select the cell and copy the data, because the information wasn’t completely visible on the screen. Finally, the CRM agent was used to get additional information on the customer.

Best practices

The following are some ways you can improve the performance for your use case:

Considerations

The computer use feature is made available to you as a beta service as defined in the AWS Service Terms. It is subject to your agreement with AWS and the AWS Service Terms, and the applicable model EULA. Computer use poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the computer use feature to interact with the internet. To minimize risks, consider taking precautions such as:

Any content that you enable Anthropic’s Claude to see or access can potentially override instructions or cause the model to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Anthropic’s Claude from sensitive surfaces, is essential – including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, inform end users of any relevant risks, and obtain their consent as appropriate.

Clean up

When you are done using this solution, make sure to clean up all the resources. Follow the instructions in the provided GitHub repository.

Conclusion

Organizations across industries face significant challenges with cross-application workflows that traditionally require manual data entry or complex custom integrations. The integration of Anthropic’s computer use capability with Amazon Bedrock Agents represents a transformative approach to these challenges.

By using Amazon Bedrock Agents as the orchestration layer, organizations can alleviate the need for custom API development for each application, benefit from comprehensive logging and tracing capabilities essential for enterprise deployment, and implement automation solutions quickly.

As you begin exploring computer use with Amazon Bedrock Agents, consider workflows in your organization that could benefit from this approach. From invoice processing to customer onboarding, HR documentation to compliance reporting, the potential applications are vast and transformative.

We’re excited to see how you will use Amazon Bedrock Agents with the computer use capability to securely streamline operations and reimagine business processes through AI-driven automation.

Resources

To learn more, refer to the following resources:


About the Authors

Eashan Kaushik is a Specialist Solutions Architect AI/ML at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Maira Ladeira Tanke is a Tech Lead for Agentic workloads in Amazon Bedrock at AWS, where she enables customers on their journey to develop autonomous AI systems. With over 10 years of experience in AI/ML, Maira partners with enterprise customers to accelerate the adoption of agentic applications using Amazon Bedrock, helping organizations harness the power of foundation models to drive innovation and business transformation. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Adarsh Srikanth is a Software Development Engineer at Amazon Bedrock, where he develops AI agent services. He holds a master’s degree in computer science from USC and brings three years of industry experience to his role. He spends his free time exploring national parks, discovering new hiking trails, and playing various racquet sports.

Abishek Kumar is a Senior Software Engineer at Amazon, bringing over 6 years of experience across both retail and AWS organizations. He has demonstrated expertise in developing generative AI and machine learning solutions, contributing to AWS services including SageMaker Autopilot, SageMaker Canvas, and Amazon Bedrock Agents. Throughout his career, Abishek has shown passion for solving complex problems and architecting large-scale systems that serve millions of customers worldwide. When not immersed in technology, he enjoys exploring nature through hiking and traveling adventures with his wife.

Krishna Gourishetti is a Senior Software Engineer for the Bedrock Agents team in AWS. He is passionate about building scalable software solutions that solve customer problems. In his free time, Krishna loves to go on hikes.
