AWS Machine Learning Blog 07月29日 02:03
Amazon Nova Act SDK (preview): Path to production for browser automation agents
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

Amazon Nova Act SDK现已推出限时预览版,旨在帮助开发者构建可靠的浏览器自动化代理。该SDK集成了AWS IAM、Amazon S3及Amazon Bedrock AgentCore Browser Tool等关键AWS服务,为代理的安全凭证管理、数据存储与策略控制提供了强大的支持,并实现了可扩展的云端浏览器执行。文章深入探讨了传统自动化工作流面临的挑战,强调了基于大型语言模型(LLMs)的Agentic AI在处理动态、多步骤任务方面的优势。Nova Act SDK通过精调的Amazon Nova Act模型,实现了高精度的步骤执行,并支持Python和自然语言编写代理,提供实时调试和CI/CD集成,旨在加速企业级自动化应用的开发与部署,提升效率,降低维护成本。

🚀 **Agentic AI引领自动化新范式**:文章指出,传统自动化框架在动态网页环境下易因结构变化而失效,且难以规模化。Agentic AI利用LLMs的模式识别能力,能够理解并执行复杂的、多步骤的浏览器交互任务,实现从“理解”到“执行”的飞跃,是LLMs在企业级应用中的重要延伸。Amazon Nova Act SDK正是Agentic AI的关键实现工具,专为代理AI设计和优化,通过强化学习和大量浏览器交互数据训练,能够精准执行工作流。

🛠️ **AWS深度集成,加速生产化部署**:新版SDK通过与AWS IAM、Amazon S3以及Amazon Bedrock AgentCore Browser Tool的集成,为开发者提供了企业级安全、可扩展的云端浏览器执行环境。IAM确保了代理的安全凭证管理,S3提供了数据存储和策略控制,而Bedrock AgentCore Browser Tool则实现了大规模、安全的云端浏览器交互。这些集成使得代理能够从原型快速走向生产环境,并支持Git-based CI/CD管道,显著缩短了上市时间。

✅ **高可靠性与灵活性,应对复杂场景**:Amazon Nova Act SDK通过将复杂工作流分解为原子命令,并允许开发者插入Python代码进行优化(如并行处理、断言),实现了超过90%的自动化准确率。它能够智能处理动态UI变化、弹出窗口和复杂的表单填写,甚至支持Playwright作为备用方案处理敏感任务,如密码输入。这种高度的可靠性和灵活性使其能够胜任自动化质量保证、复杂表单处理、客户支持流程自动化、凭证验证等高风险、高频率的业务场景。

💡 **广泛应用场景,赋能各行业**:文章列举了多个行业应用案例,包括医疗保健领域的自动化福利申请、金融服务领域的碎片化数据整理、旅游行业的支付表单填写、以及公共部门的软件测试等。例如,Rackspace Technology利用SDK简化公共项目注册流程,Navan通过自动化支付表单填写提升了客户服务效率,Automation Anywhere则将其应用于高风险的专业资质验证。这些案例证明了SDK在自动化数据录入、客户支持增强、高风险管理工作流以及UX/QA测试方面的巨大潜力,能够显著提升工作效率和准确性。

🚀 **安全、可扩展的企业级解决方案**:SDK在设计上充分考虑了企业级应用的需求,支持macOS、Linux、Windows等多种操作系统,提供隔离的运行时环境和敏感数据加密。通过IAM进行访问控制,确保了代理的安全性。同时,SDK支持实时调试、可复用的代理模块以及并行执行,为构建安全、可扩展的Agentic AI系统提供了坚实的基础,帮助企业将自动化能力推向新的高度。

In early 2025, we introduced the Amazon Nova Act SDK as a research preview to help developers build agents that reliably complete tasks in a web browser. Now, we are excited to work with customers to take their agents to production in a limited preview, using new AWS integrations including AWS Identity and Access Management (IAM) for secure credentialing, Amazon Simple Storage Service (Amazon S3) for data storage and policy control, and the new Amazon Bedrock AgentCore Browser Tool for scalable, cloud-based browser execution.

In this post, we walk through what makes the Amazon Nova Act SDK unique, how it works, and how teams across industries are already using it to automate browser-based workflows at scale.

Challenges with traditional automated business workflows

Many day-to-day business operations require a browser, such as submitting time-off requests, processing invoices, accessing vendor portals, or reviewing dashboards. Lack of API coverage often means workflows are done manually: teams copy-paste data across tabs, follow multi-step flows, and click through countless interfaces to get work done.

Traditional rules-based browser automation frameworks often face challenges in dynamic web environments. Teams can spend more time on ongoing maintenance than on building new automations, because changes in page structure (for example, newly added form fields or dropdown options) break brittle selectors. Most importantly, these frameworks are difficult to scale. If one use case is performed on 50 different sites (for example, professional license verification on state websites), teams must build 50 site-specific automations, because rules-based frameworks don’t generalize.

As humans, our ability to perform tasks adapts across different tools and interfaces. For example, once you know how to draft an email in Outlook, you can easily do the same in Gmail—even if you’ve never used it before. Large language models (LLMs), trained on millions of examples of UIs, offer the potential to create a similar type of pattern recognition for AI agents. They’ve brought us this far—powering chat, summarization, coding copilots, and more—by interpreting language, following instructions, and reasoning across domains. Now, we’re entering the next phase of generative AI: one centered on action. Agentic AI builds on the foundation of LLMs to move from understanding to execution. These systems are designed to complete dynamic, multi-step workflows—like filling out complex forms, interacting with evolving UI elements, or performing real-world business tasks at scale. Agentic AI doesn’t replace LLMs—it extends them, unlocking new automation capabilities that bring us closer to real task completion in enterprise environments.

Agentic AI with the Amazon Nova Act SDK

With the Amazon Nova Act SDK, you can build and deploy reliable browser agents powered by the Amazon Nova Act model—purpose-built and fine-tuned for agentic AI. Trained with reinforcement learning and extensive in-domain browser interaction data, it executes step-by-step workflows with precision. With this latest version, we’ve extended those capabilities with AWS integrations so you can take your agents from prototype to production. You can install the SDK with a single command, write agents in Python and natural language, debug in real time, and integrate directly into continuous integration and delivery (CI/CD) pipelines. With enterprise-grade security, observability, and infrastructure now available through AWS, the Amazon Nova Act SDK provides a fast, flexible path for teams looking to build agents that act—and deliver—at scale. You can use the Amazon Nova Act SDK to automate real-world workflows where traditional scripts or general-purpose models aren’t reliable or scalable enough. You can install it with a single command, write agents using a combination of Python and natural language, debug while the workflow runs, and deploy through CI/CD pipelines.

The Amazon Nova Act SDK also integrates with the new Amazon Bedrock AgentCore Browser Tool—a fast, secure, cloud-based browser that enables AI agents to interact with websites at scale. It includes enterprise-grade security features, including virtual machine-level isolation and federated identity integration. The tool offers built-in observability through live viewing, AWS CloudTrail logging, and session replay to troubleshoot, maintain quality, and support compliance.

Benefits of the Amazon Nova Act SDK

The Amazon Nova Act SDK is reliable, fast to deploy, and built for secure, large-scale browser automation use cases. In this section, we discuss some of the benefits of the SDK in more detail.

Reliability: Build robust browser automation with high accuracy and repeatability

With the Amazon Nova Act SDK, developers can break down complex workflows into reliable atomic commands (for example, collect all form elements of a webpage and return a string with all required fields of the form). It supports the addition of detailed instructions to refine those commands when needed (for example, dismiss any popup banners), the ability to call APIs, and the option to alternate direct browser manipulation through Playwright to improve reliability (for example, for entering passwords). Developers can interleave Python code—such as tests, breakpoints, assertions, or thread pools for parallelization—to optimize performance, especially because even the fastest agents are constrained by webpage load times. With this latest version, the Amazon Nova Act SDK is already demonstrating over 90% reliability across early enterprise workflows, including automated quality assurance, complex form handling, and process execution. Improvements to reasoning and recovery help agents adapt to changing UIs and complete complex sequences consistently and accurately.

Speed-to-market: Move from prototype to production in days—not weeks

The Amazon Nova Act SDK is designed to help you build automation quickly, without relying on brittle scripts. You can install the SDK with a single command. You can define agents using Python, natural language, or both. You can debug flows while they run, inspect the DOM, pause between steps, and iterate rapidly. The Amazon Nova Act SDK supports the following features:

You don’t have to change your infrastructure or rebuild your internal tools. Agents built with Amazon Nova Act fit into existing dev workflows and allow you to move from experimentation to production quickly.

Security: Deploy automations you can trust—powered by AWS

The Amazon Nova Act SDK integrates with IAM for access control, and access to the model is managed just like access from other AWS services. It supports execution on macOS, Linux, Windows, and WSL2. Runtime environments are isolated, and encryption is supported for sensitive inputs and outputs. The Amazon Nova Act SDK was designed to work inside enterprise environments—with the reliability, observability, and security that production systems require.

See it in action: Automating information gathering to help streamline financial decisions

In financial services—especially investment banking, M&A advisory, and strategic research—success often depends on how fast and accurately teams can turn fragmented public data into actionable insight. The following demo shows the Amazon Nova Act SDK in action.

Where the Amazon Nova Act SDK can make an impact

Browser-based workflows are common in today’s businesses, yet many remain manual, repetitive, and prone to error. The Amazon Nova Act SDK helps organizations automate these tasks, freeing up teams to focus on higher-value work, improve accuracy, and reduce operational delays. Its reliability makes it a fit across industries and use cases. In this section, we provide some examples of what our early customers are building.

Automated data entry and form filling

The Amazon Nova Act SDK reduces repetitive manual input across web-based systems—like CRMs, HR tools, and finance platforms—by automating form submissions, uploads, and updates. In healthcare, staff assist members with complex, state-specific benefit applications. Public sector caseworkers also re-enter household data across multiple systems. The Amazon Nova Act SDK handles these dynamic flows reliably—navigating shifting fields, dropdowns, and popups without brittle scripts or custom code.

Rackspace Technology, a leading hybrid and AI solutions provider, is working with Alvee Health to automatically register members for public benefits using the Amazon Nova Act SDK. “Many registration forms for public programs are long and confusing, so members often don’t apply for the help they need,” said Nicole Cook, CEO at Alvee. “With the Amazon Nova Act SDK and harnessing information already in Alvee’s system, we’re not just simplifying paperwork—we’re helping provide timely, accurate access to the resources that support healthier lives. We expect this innovation to increase successful benefit registrations by 30%, and improve overall case load by up to tenfold, allowing healthcare providers to focus more on patient care and less on administration. This is a prime example of how AI can be used to support well-being and improve overall health for communities.”

Customer support augmentation

Customer support teams across retail, travel, and software as a service (SaaS) often move between internal tools and third-party portals to resolve tickets. For example, a retail associate might submit a return on a partner site. A travel agent might log in to an airline dashboard to request compensation. A support rep might reset a license key in a customer admin console. The Amazon Nova Act SDK automates these browser-based tasks, helping agents stay focused on customer conversations while backend steps are executed reliably and at speed.

Navan, a leading travel and expense management platform, uses the Amazon Nova Act SDK to simplify its travel agents’ workflows by automating how they fill out payment forms across a wide range of vendors.

Yuval Refua, SVP of Product, said, “Adding the Amazon Nova Act SDK to our agents’ workflows has helped us reduce repetitive tasks—an essential step in scaling our operations to serve more customers. We tried other computer use tools, and Amazon Nova Act’s reliability and flexibility enabled a single script work across diverse payment forms from a range of hotel brands. We’re now expanding this automation to cover even more vendors, which we expect will increase our operational capacity and help us meet growing customer demand more efficiently.”

Automating high-stakes administrative workflows

Credential verification, identity checks, and other compliance-heavy tasks often involve navigating hundreds of third-party portals with inconsistent layouts. The Amazon Nova Act SDK makes it possible to automate these workflows with high accuracy, flexibility, and full control—helping teams scale while maintaining precision.

Automation Anywhere, a global leader in Agentic Process Automation (APA), is expanding its automation capabilities through the Amazon Nova Act SDK, starting with professional credential verification—a high-stakes, repetitive task that’s essential for compliance, member safety, and day-to-day operations.

“By deeply integrating the Amazon Nova Act SDK into our Process Reasoning Engine (PRE), we’ve unlocked a major leap forward in computer use for enterprise automation,” said Adi Kuruganti, Chief Product Officer at Automation Anywhere. “Our goal-oriented AI agents don’t just mimic clicks, they reason through UI-based processes in real time, navigating complex websites with human-like expertise. This opens the door to automating previously out-of-reach use cases like healthcare program enrollment testing, where accuracy and scale are essential.”

UX and QA testing across dynamic interfaces

UX and QA testing often involves simulating real user interactions on frequently changing websites—especially in sectors like banking, insurance, and government. With the Amazon Nova Act SDK, teams can write and update tests using natural language or Python, adapting quickly to UI changes without brittle selectors or manual rewrites.

Tyler Technologies, a leading provider of integrated software and technology services to the public sector, is using the Amazon Nova Act SDK to automate software testing and improve the reliability of its releases. “Amazon Nova Act’s natural-language interface lets us convert our manual test plans into automated suites in minutes—without writing a single line of code, saving us hundreds of hours while expanding test coverage and increasing product quality.” said Franklin Williams, President of Data & Insights at Tyler Technologies. “We’re now looking to expand the use of Nova Act SDK across our portfolio.”

What’s next for the Amazon Nova Act SDK

We’re working closely with early AWS customers to inform our roadmap. Although today’s focus is on browser-based workflows, the Amazon Nova Act SDK is part of a broader effort to build agents that can operate reliably across diverse environments. We’re continuing to expand the model’s reach beyond the web, applying reinforcement learning to more complex, real-world tasks. We’re also deepening integration across the AWS ecosystem to help developers move faster—from prototyping to deploying secure, scalable agentic systems.

Get started with the Amazon Nova Act SDK

If you’re a technical leader or developer and want to start prototyping with the research preview of the Amazon Nova Act SDK, visit Amazon Nova Act. You will get access to early tooling designed for reliable, step-by-step browser automation—built for real-world workflows, not just demos.

The Amazon AGI Lab is Amazon’s applied research group focused on building useful AI agents that can take real-world actions in digital and physical environments. Their work spans LLMs, reinforcement learning, world modeling, and more. To learn more and keep up with their latest innovations, visit Amazon AGI Labs.

Contact us to express interest in working with us to productionize your agent (limited preview).


About the authors

Lori Knapp is a Principal Product Manager with Amazon Nova. She leads product efforts to define how foundation models can power intelligent agents across diverse real-world use cases. Prior to this role, Lori’s experience spanned scaling adaptive voice experiences at Alexa, product strategy at Microsoft, and behavioral science consulting. Outside of work, she enjoys exploring new cities, hosting dinner parties, and solving crossword puzzles.

Tara Raj is an Engineering Manager at Amazon working on Nova Act. In her current role she’s focused on developer experience, from building nova.amazon.com/act to the Nova Act SDK with the software engineers on her team to driving adoption of Amazon Nova Act with her solutions architect team. Tara has over 10 years of experience in engineering roles bringing products from vision to launch including Nova multimodal capabilities at Amazon and the Windows Subsystem for Linux and Visual Studio Code at Microsoft. Outside of work you can find her traveling, dancing, and trying new restaurants.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Amazon Nova Act SDK Agentic AI 浏览器自动化 AWS集成 企业级AI
相关文章