Getting started with computer use in Amazon Bedrock Agents

Computer use is a breakthrough capability from Anthropic that allows foundation models (FMs) to visually perceive and interpret digital interfaces. This capability enables Anthropic’s Claude models to identify what’s on a screen, understand the context of UI elements, and recognize actions that should be performed such as clicking buttons, typing text, scrolling, and navigating between applications. However, the model itself doesn’t execute these actions—it requires an orchestration layer to safely implement the supported actions.

Today, we’re announcing computer use support within Amazon Bedrock Agents using Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet models on Amazon Bedrock. This integration brings Anthropic’s visual perception capabilities as a managed tool within Amazon Bedrock Agents, providing you with a secure, traceable, and managed way to implement computer use automation in your workflows.

Organizations across industries struggle with automating repetitive tasks that span multiple applications and systems of record. Whether processing invoices, updating customer records, or managing human resource (HR) documents, these workflows often require employees to manually transfer information between different systems – a process that’s time-consuming, error-prone, and difficult to scale.

Traditional automation approaches require custom API integrations for each application, creating significant development overhead. Computer use capabilities change this paradigm by allowing machines to perceive existing interfaces just as humans do.

In this post, we create a computer use agent demo that provides the critical orchestration layer that transforms computer use from a perception capability into actionable automation. Without this orchestration layer, computer use would only identify potential actions without executing them. The computer use agent demo powered by Amazon Bedrock Agents provides the following benefits:

Secure execution environment

Comprehensive logging

Detailed tracing capabilities

Simplified testing and experimentation

Seamless orchestration

This integration combines Anthropic’s perceptual understanding of digital interfaces with the orchestration capabilities of Amazon Bedrock Agents, creating a powerful agent for automating complex workflows across applications. Rather than build custom integrations for each system, developers can now create agents that perceive and interact with existing interfaces in a managed, secure way.

With computer use, Amazon Bedrock Agents can automate tasks through basic GUI actions and built-in Linux commands. For example, your agent could take screenshots, create and edit text files, and run shell commands. Using Amazon Bedrock Agents with a compatible Anthropic Claude model, you can use the following action groups:

Computer tool

Text editor tool

Bash

Solution overview

An example computer use workflow consists of the following steps:

CreateAgentActionGroup API

return control

You can recreate this example in the us-west-2 AWS Region with the AWS Cloud Development Kit (AWS CDK) by following the instructions in the GitHub repository. This demo deploys a containerized application using AWS Fargate across two Availability Zones in the us-west-2 Region. The infrastructure operates within a virtual private cloud (VPC) containing public subnets in each Availability Zone, with an internet gateway providing external connectivity. The architecture is complemented by essential supporting services, including AWS Key Management Service (AWS KMS) for security and Amazon CloudWatch for monitoring, creating a resilient, serverless container environment that alleviates the need to manage underlying infrastructure while maintaining robust security and high availability.

The following diagram illustrates the solution architecture.

At the core of our solution are two Fargate containers managed through Amazon Elastic Container Service (Amazon ECS), each protected by its own security group. The first is our orchestration container, which not only handles the communication between Amazon Bedrock Agents and end users, but also orchestrates the workflow that enables tool execution. The second is our environment container, which serves as a secure sandbox where the Amazon Bedrock agent can safely run its computer use tools. The environment container has limited access to the rest of the ecosystem and the internet. We utilize service discovery to connect Amazon ECS services with DNS names.

The orchestration container includes the following components:

Streamlit UI

Return control loop
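The return control loop in the orchestration container can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the helper names (`collect_events`, `build_session_state`, `run_agent`) and the `run_tool` callback are hypothetical, and the event and `sessionState` shapes follow the general return control pattern of the `invoke_agent` API, so verify them against the current documentation. Only text results are shown; returning screenshots requires an image payload that is not covered here.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical callback type: (action_group, function_name, params) -> text result
ToolRunner = Callable[[str, str, Dict[str, Any]], str]


def collect_events(completion: Iterable[Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Split an invoke_agent event stream into answer text and a returnControl payload."""
    text_parts: List[str] = []
    return_control: Dict[str, Any] = {}
    for event in completion:
        if "chunk" in event:
            text_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "returnControl" in event:
            return_control = event["returnControl"]
    return "".join(text_parts), return_control


def build_session_state(return_control: Dict[str, Any], run_tool: ToolRunner) -> Dict[str, Any]:
    """Run each requested tool and package the results for the next invoke_agent call."""
    results = []
    for item in return_control.get("invocationInputs", []):
        call = item["functionInvocationInput"]
        params = {p["name"]: p["value"] for p in call.get("parameters", [])}
        output = run_tool(call["actionGroup"], call.get("function", ""), params)
        results.append({
            "functionResult": {
                "actionGroup": call["actionGroup"],
                "function": call.get("function", ""),
                "responseBody": {"TEXT": {"body": output}},
            }
        })
    return {
        "invocationId": return_control["invocationId"],
        "returnControlInvocationResults": results,
    }


def run_agent(runtime_client: Any, agent_id: str, alias_id: str, session_id: str,
              prompt: str, run_tool: ToolRunner, max_turns: int = 10) -> str:
    """Invoke the agent repeatedly until it answers without requesting a tool."""
    kwargs: Dict[str, Any] = {"inputText": prompt}
    for _ in range(max_turns):
        response = runtime_client.invoke_agent(
            agentId=agent_id, agentAliasId=alias_id, sessionId=session_id, **kwargs
        )
        text, return_control = collect_events(response["completion"])
        if not return_control:
            return text
        # Hand the tool results back to the agent on the next turn
        kwargs = {"sessionState": build_session_state(return_control, run_tool)}
    raise RuntimeError("agent did not finish within max_turns")
```

In practice, `runtime_client` would be a `boto3.client("bedrock-agent-runtime")` instance, and `run_tool` would forward each request to the environment container. The `max_turns` cap is a design choice in this sketch to keep a misbehaving session from looping indefinitely.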

The environment container includes the following components:

UI and pre-installed applications

Tool implementation

Quart (RESTful) JSON API

The following diagram illustrates these components.
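The tool implementation component of the environment container can be pictured as a small dispatcher that maps a tool request to a local action. The sketch below uses only the Python standard library and covers the bash and text editor tools. The dispatch keys (`"bash"`, `"text_editor"`) and the function names are hypothetical. The parameter names (`command`, `path`, `file_text`, `old_str`, `new_str`) follow Anthropic's published tool inputs, but this is an assumption about what reaches the container, not the demo's actual code.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Any, Callable, Dict


def run_bash(params: Dict[str, Any], timeout_s: int = 30) -> str:
    """Run one shell command and return its combined stdout and stderr."""
    shell = shutil.which("bash") or "sh"  # fall back to sh if bash is not installed
    completed = subprocess.run(
        [shell, "-c", params["command"]],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return completed.stdout + completed.stderr


def run_text_editor(params: Dict[str, Any]) -> str:
    """Handle a subset of text editor commands: create, view, str_replace."""
    path = Path(params["path"])
    command = params["command"]
    if command == "create":
        path.write_text(params["file_text"])
        return f"created {path}"
    if command == "view":
        return path.read_text()
    if command == "str_replace":
        content = path.read_text()
        # Refuse ambiguous edits: the old string must match exactly once
        if content.count(params["old_str"]) != 1:
            return "error: old_str must match exactly once"
        path.write_text(content.replace(params["old_str"], params["new_str"]))
        return f"edited {path}"
    return f"error: unsupported command {command}"


# Hypothetical dispatch keys; the real function names may differ
TOOLS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "bash": run_bash,
    "text_editor": run_text_editor,
}


def execute_tool(tool_name: str, params: Dict[str, Any]) -> str:
    """Route a tool request to its handler, or report an unknown tool."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return f"error: unknown tool {tool_name}"
    return handler(params)
```

A JSON API layer such as Quart would wrap `execute_tool` in a request handler. Returning errors as strings rather than raising lets the agent see the failure and try a different action.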

Prerequisites

Create an Amazon Bedrock agent with computer use

You can use the following code sample to create a simple Amazon Bedrock agent with computer, bash, and text editor action groups. It is crucial to provide an action group signature that is compatible with the model you use. The following table lists the signatures for Anthropic’s Claude 3.5 Sonnet V2 and Anthropic’s Claude 3.7 Sonnet.

Model	Action Group Signature
Anthropic’s Claude 3.5 Sonnet V2	computer_20241022 text_editor_20241022 bash_20241022
Anthropic’s Claude 3.7 Sonnet	computer_20250124 text_editor_20250124 bash_20250124
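One way to keep the model and tool versions in step is to derive the signature parameters from a single lookup. The sketch below encodes the table above. The model keys (`"claude-3-5-sonnet-v2"`, `"claude-3-7-sonnet"`) and the helper name `signature_params` are hypothetical labels chosen for illustration, and the display defaults are example values.

```python
from typing import Dict

# Tool version suffix per model, taken from the table above.
# The dictionary keys are hypothetical labels, not Amazon Bedrock model IDs.
TOOL_VERSIONS: Dict[str, str] = {
    "claude-3-5-sonnet-v2": "20241022",
    "claude-3-7-sonnet": "20250124",
}


def signature_params(
    model_key: str,
    display_width_px: int = 1024,
    display_height_px: int = 768,
    display_number: int = 1,
) -> Dict[str, Dict[str, str]]:
    """Return parentActionGroupSignatureParams keyed by parent action group signature."""
    version = TOOL_VERSIONS[model_key]  # raises KeyError for an unsupported model
    return {
        "ANTHROPIC.Computer": {
            "type": f"computer_{version}",
            "display_height_px": str(display_height_px),
            "display_width_px": str(display_width_px),
            "display_number": str(display_number),
        },
        "ANTHROPIC.Bash": {"type": f"bash_{version}"},
        "ANTHROPIC.TextEditor": {"type": f"text_editor_{version}"},
    }
```

Each value can then be passed as `parentActionGroupSignatureParams` alongside the matching `parentActionGroupSignature`, so a model change only touches one place.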

import time

import boto3

# Placeholder: replace with the ARN of your Amazon Bedrock agent execution role
agent_role_arn = "arn:aws:iam::<account-id>:role/<agent-execution-role>"

# Step 1: Create the Amazon Bedrock agent client
bedrock_agent = boto3.client("bedrock-agent", region_name="us-west-2")

# Step 2: Create an agent
create_agent_response = bedrock_agent.create_agent(
    agentResourceRoleArn=agent_role_arn,  # Amazon Bedrock agent execution role
    agentName="computeruse",
    description=(
        "Example agent for computer use. This agent should only operate on "
        "sandbox environments with limited privileges."
    ),
    foundationModel="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    instruction=(
        "You are a computer use agent capable of using the Firefox "
        "web browser for web search."
    ),
)
agent_id = create_agent_response["agent"]["agentId"]
time.sleep(30)  # wait for the agent to be created

# Step 3.1: Create and attach the computer action group
bedrock_agent.create_agent_action_group(
    actionGroupName="ComputerActionGroup",
    actionGroupState="ENABLED",
    agentId=agent_id,
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Computer",
    parentActionGroupSignatureParams={
        "type": "computer_20250124",
        "display_height_px": "768",
        "display_width_px": "1024",
        "display_number": "1",
    },
)

# Step 3.2: Create and attach the bash action group
bedrock_agent.create_agent_action_group(
    actionGroupName="BashActionGroup",
    actionGroupState="ENABLED",
    agentId=agent_id,
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.Bash",
    parentActionGroupSignatureParams={
        "type": "bash_20250124",
    },
)

# Step 3.3: Create and attach the text editor action group
bedrock_agent.create_agent_action_group(
    actionGroupName="TextEditorActionGroup",
    actionGroupState="ENABLED",
    agentId=agent_id,
    agentVersion="DRAFT",
    parentActionGroupSignature="ANTHROPIC.TextEditor",
    parentActionGroupSignatureParams={
        "type": "text_editor_20250124",
    },
)

# Step 3.4: Create the weather action group (a custom function with return control)
bedrock_agent.create_agent_action_group(
    actionGroupName="WeatherActionGroup",
    agentId=agent_id,
    agentVersion="DRAFT",
    actionGroupExecutor={
        "customControl": "RETURN_CONTROL",
    },
    functionSchema={
        "functions": [
            {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location.",
                "parameters": {
                    "location": {
                        "type": "string",
                        "description": "The city, e.g., San Francisco",
                        "required": True,
                    },
                    "unit": {
                        "type": "string",
                        "description": (
                            "The unit to use, e.g., fahrenheit or celsius. "
                            'Defaults to "fahrenheit"'
                        ),
                        "required": False,
                    },
                },
                "requireConfirmation": "DISABLED",
            }
        ]
    },
)
time.sleep(10)

# Step 4: Prepare the agent
bedrock_agent.prepare_agent(agentId=agent_id)
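The fixed `time.sleep` calls in the sample above are simple but can be either too short or needlessly long. A possible alternative, sketched here under the assumption that polling `get_agent` and reading `agentStatus` is acceptable for your workflow, is to wait until the agent reaches the expected status. The helper name `wait_for_agent_status` and its timeout defaults are hypothetical.

```python
import time
from typing import Any, Callable


def wait_for_agent_status(
    client: Any,
    agent_id: str,
    target_status: str,
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_agent until the agent reaches target_status, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        if status == target_status:
            return status
        if status == "FAILED":
            raise RuntimeError(f"agent {agent_id} entered FAILED status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"agent {agent_id} still {status} after {timeout_s}s")
        sleep(poll_s)
```

For example, you might call `wait_for_agent_status(bedrock_agent, agent_id, "NOT_PREPARED")` after `create_agent` and `wait_for_agent_status(bedrock_agent, agent_id, "PREPARED")` after `prepare_agent`. The `sleep` parameter exists only so the wait can be replaced in tests.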

Example use case

In this post, we demonstrate an example where we use Amazon Bedrock Agents with the computer use capability to complete a web form. In the example, the computer use agent can also switch Firefox tabs to interact with a customer relationship management (CRM) agent to get the required information to complete the form. Although this example uses a sample CRM application as the system of record, the same approach works with Salesforce, SAP, Workday, or other systems of record with the appropriate authentication frameworks in place.

In the demonstrated use case, you can observe how well the Amazon Bedrock agent performed with computer use tools. Our implementation completed the customer ID, customer name, and email fields by visually examining the Excel data. However, for the overview field, it decided to select the cell and copy the data, because the information wasn’t completely visible on the screen. Finally, the CRM agent was used to get additional information on the customer.

Best practices

The following are some ways you can improve the performance for your use case:

Security Groups

Network Access Control Lists (NACLs)

Amazon Route 53 Resolver DNS Firewall domain lists

AWS Identity and Access Management (IAM)

principle of least privilege

user confirmation

Amazon Bedrock Guardrails

Considerations

The computer use feature is made available to you as a beta service as defined in the AWS Service Terms. It is subject to your agreement with AWS and the AWS Service Terms, and the applicable model EULA. Computer use poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the computer use feature to interact with the internet. To minimize risks, consider taking precautions such as:

Operate computer use functionality in a dedicated virtual machine or container with minimal privileges to minimize direct system exploits or accidents

To help prevent information theft, avoid giving the computer use API access to sensitive accounts or data

Limit the computer use API’s internet access to required domains to reduce exposure to malicious content

To enforce proper oversight, keep a human in the loop for sensitive tasks (such as making decisions that could have meaningful real-world consequences) and for anything requiring affirmative consent (such as accepting cookies, executing financial transactions, or agreeing to terms of service)
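As an illustration of limiting access to required domains, the orchestration layer could check a URL against an allowlist before letting a tool act on it. This is only an application-level sketch that supplements, and does not replace, network-level controls. The domain `example.com` and the function name `is_url_allowed` are placeholders chosen for this example.

```python
from typing import Iterable
from urllib.parse import urlsplit

# Placeholder allowlist; replace with the domains your workflow actually needs
ALLOWED_DOMAINS = ("example.com",)


def is_url_allowed(url: str, allowed: Iterable[str] = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is an allowed domain or its subdomain."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match on whole labels so "example.com.evil.net" is not mistaken for "example.com"
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting lookalike hosts or URLs that merely mention an allowed domain in their path.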

Any content that you enable Anthropic’s Claude to see or access can potentially override instructions or cause the model to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Anthropic’s Claude from sensitive surfaces, is essential – including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, inform end users of any relevant risks, and obtain their consent as appropriate.

Clean up

When you are done using this solution, make sure to clean up all the resources. Follow the instructions in the provided GitHub repository.

Conclusion

Organizations across industries face significant challenges with cross-application workflows that traditionally require manual data entry or complex custom integrations. The integration of Anthropic’s computer use capability with Amazon Bedrock Agents represents a transformative approach to these challenges.

By using Amazon Bedrock Agents as the orchestration layer, organizations can alleviate the need for custom API development for each application, benefit from comprehensive logging and tracing capabilities essential for enterprise deployment, and implement automation solutions quickly.

As you begin exploring computer use with Amazon Bedrock Agents, consider workflows in your organization that could benefit from this approach. From invoice processing to customer onboarding, HR documentation to compliance reporting, the potential applications are vast and transformative.

We’re excited to see how you will use Amazon Bedrock Agents with the computer use capability to securely streamline operations and reimagine business processes through AI-driven automation.

Resources

To learn more, refer to the following resources:

Computer use with Amazon Bedrock Agents guide

Computer use with Amazon Bedrock Agents implementation

Computer use with Anthropic’s Claude implementation

Computer use with Anthropic guide

Amazon Bedrock Agent Samples

About the Authors

Eashan Kaushik is a Specialist Solutions Architect AI/ML at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Maira Ladeira Tanke is a Tech Lead for Agentic workloads in Amazon Bedrock at AWS, where she enables customers on their journey to develop autonomous AI systems. With over 10 years of experience in AI/ML, Maira partners with enterprise customers to accelerate the adoption of agentic applications using Amazon Bedrock, helping organizations harness the power of foundation models to drive innovation and business transformation. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Adarsh Srikanth is a Software Development Engineer at Amazon Bedrock, where he develops AI agent services. He holds a master’s degree in computer science from USC and brings three years of industry experience to his role. He spends his free time exploring national parks, discovering new hiking trails, and playing various racquet sports.

Abishek Kumar is a Senior Software Engineer at Amazon, bringing over 6 years of valuable experience across both retail and AWS organizations. He has demonstrated expertise in developing generative AI and machine learning solutions, specifically contributing to key AWS services including SageMaker Autopilot, SageMaker Canvas, and Amazon Bedrock Agents. Throughout his career, Abishek has shown passion for solving complex problems and architecting large-scale systems that serve millions of customers worldwide. When not immersed in technology, he enjoys exploring nature through hiking and traveling adventures with his wife.

Krishna Gourishetti is a Senior Software Engineer for the Bedrock Agents team in AWS. He is passionate about building scalable software solutions that solve customer problems. In his free time, Krishna loves to go on hikes.
