AWS Machine Learning Blog, July 29, 03:19
Build a drug discovery research assistant using Strands Agents and Amazon Bedrock

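This post describes how to use Strands Agents and Amazon Bedrock to build a powerful AI research assistant that accelerates drug discovery. The assistant connects to scientific databases such as arXiv, PubMed, and ChEMBL, searches them simultaneously, synthesizes the findings, and generates detailed reports. Using a multi-agent collaboration pattern, the system can handle complex drug development tasks, from target identification to therapeutic area analysis, significantly improving the efficiency of life science research. The post also provides a detailed deployment guide and an example use case.

💡 **AI assistants accelerate drug discovery**: Life science companies are using AI and generative AI tools such as Amazon Bedrock to increase the speed of scientific discovery. The Strands Agents SDK offers a model-driven approach to developing and running AI agents, supports multiple model providers and flexible deployment options, and opens a new path to solving complex drug development problems.

🧰 **Multi-source data integration and analysis**: The solution uses Strands Agents to connect the foundation models in Amazon Bedrock with life science data sources such as arXiv, PubMed, and ChEMBL, and queries data through the Model Context Protocol (MCP) standard. It can search multiple databases simultaneously, synthesize the information, and generate comprehensive reports on drug targets, disease mechanisms, and therapeutic areas.

🧩 **Multi-agent collaboration pattern**: The post emphasizes that small, focused AI agents working together are more effective than a single large agent. The solution uses a team managed by an orchestrator agent, which can call specialized sub-agents (such as the planning agent and the synthesis agent) and tools (such as MCP clients) to handle user queries and carry out information retrieval, planning, synthesis, and report generation.

🚀 **User-friendly and flexible deployment**: The research assistant can be built in a local development environment, and the post provides step-by-step guidance. For production, it can be deployed on services such as AWS Lambda, AWS Fargate, Amazon EKS, or Amazon EC2. With simple configuration and an API key, users can start using the assistant for research and extend it with new scientific tools as needed.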

Drug discovery is a complex, time-intensive process that requires researchers to navigate vast amounts of scientific literature, clinical trial data, and molecular databases. Life science customers like Genentech and AstraZeneca are using AI agents and other generative AI tools to increase the speed of scientific discovery. Builders at these organizations are already using the fully managed features of Amazon Bedrock to quickly deploy domain-specific workflows for a variety of use cases, from early drug target identification to healthcare provider engagement.

However, more complex use cases might benefit from using the open source Strands Agents SDK. Strands Agents takes a model-driven approach to develop and run AI agents. It works with most model providers, including custom and internal large language model (LLM) gateways, and agents can be deployed where you would host a Python application.

In this post, we demonstrate how to create a powerful research assistant for drug discovery using Strands Agents and Amazon Bedrock. This AI assistant can search multiple scientific databases simultaneously using the Model Context Protocol (MCP), synthesize its findings, and generate comprehensive reports on drug targets, disease mechanisms, and therapeutic areas. This assistant is available as an example in the open-source healthcare and life sciences agent toolkit for you to use and adapt.

Solution overview

This solution uses Strands Agents to connect high-performing foundation models (FMs) with common life science data sources like arXiv, PubMed, and ChEMBL. It demonstrates how to quickly create MCP servers to query data and view the results in a conversational interface.

Small, focused AI agents that work together can often produce better results than a single, monolithic agent. This solution uses a team of sub-agents, each with their own FM, instructions, and tools. The following flowchart shows how the orchestrator agent (shown in orange) handles user queries and routes them to sub-agents for either information retrieval (green) or planning, synthesis, and report generation (purple).

This post focuses on building with Strands Agents in your local development environment. Refer to the Strands Agents documentation to deploy production agents on AWS Lambda, AWS Fargate, Amazon Elastic Kubernetes Service (Amazon EKS), or Amazon Elastic Compute Cloud (Amazon EC2).

In the following sections, we show how to create the research assistant in Strands Agents by defining an FM, MCP tools, and sub-agents.

Prerequisites

This solution requires Python 3.10+, strands-agents, and several additional Python packages. We strongly recommend using a virtual environment like venv or uv to manage these dependencies.

Complete the following steps to deploy the solution to your local environment:

1. Clone the code repository from GitHub.
2. Install the required Python dependencies with pip install -r requirements.txt.
3. Configure your AWS credentials by setting them as environment variables, adding them to a credentials file, or following another supported process.
4. Save your Tavily API key to a .env file in the following format: TAVILY_API_KEY="YOUR_API_KEY".
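Before launching the assistant, it can help to confirm that the Tavily key from the last step is actually readable. The following is a minimal sketch using only the Python standard library; the helper names `load_env_file` and `require_tavily_key` are hypothetical, and the example repository may load the .env file differently (for example, with a dedicated package).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY="VALUE" lines from a .env file into a dict (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_tavily_key(path: str = ".env") -> str:
    """Return the Tavily key from the environment or the .env file, else raise."""
    key = os.environ.get("TAVILY_API_KEY") or load_env_file(path).get("TAVILY_API_KEY")
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError("TAVILY_API_KEY is missing; add it to your .env file")
    return key
```

Calling `require_tavily_key()` at startup fails fast with a clear message instead of surfacing an authentication error in the middle of a research run.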

You also need access to the following Amazon Bedrock FMs in your AWS account:

Define the foundation model

We start by defining a connection to an FM in Amazon Bedrock using the Strands Agents BedrockModel class. We use Anthropic’s Claude 3.7 Sonnet as the default model. See the following code:

    from botocore.config import Config
    from strands import Agent, tool
    from strands.models import BedrockModel
    from strands.agent.conversation_manager import SlidingWindowConversationManager
    from strands.tools.mcp import MCPClient


    # Model configuration with Strands using Amazon Bedrock's foundation models
    def get_model():
        model = BedrockModel(
            # Generous timeouts and adaptive retries for long-running research calls
            boto_client_config=Config(
                read_timeout=900,
                connect_timeout=900,
                retries=dict(max_attempts=3, mode="adaptive"),
            ),
            model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
            max_tokens=64000,
            temperature=0.1,
            top_p=0.9,
            additional_request_fields={
                "thinking": {
                    "type": "disabled"  # Can be enabled for reasoning mode
                }
            },
        )
        return model
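The comment in the model configuration notes that thinking can be enabled. As a rough sketch, assuming Anthropic's extended thinking request format (a `type` of `enabled` plus a `budget_tokens` value of at least 1,024 that stays below `max_tokens`), a small helper could build the `additional_request_fields` value. The helper name `thinking_fields` and its default budget are hypothetical and not part of Strands Agents.

```python
def thinking_fields(enabled: bool, budget_tokens: int = 4096, max_tokens: int = 64000) -> dict:
    """Build an `additional_request_fields` value for the model (hypothetical helper)."""
    if not enabled:
        return {"thinking": {"type": "disabled"}}
    # Assumption: the thinking budget must be at least 1,024 tokens
    # and smaller than the overall max_tokens limit.
    if budget_tokens < 1024 or budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
```

Anthropic's documentation also describes restrictions on sampling parameters when thinking is enabled, so the `temperature` and `top_p` values may need to be adjusted or removed in that mode; check the current model documentation before relying on this.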

Define MCP tools

MCP provides a standard for how AI applications interact with their external environments. Thousands of MCP servers already exist, including those for life science tools and datasets. This solution provides example MCP servers for Tavily web search, arXiv, PubMed, ChEMBL, and ClinicalTrials.gov.

Strands Agents streamlines the definition of MCP clients for our agent. In this example, you connect to each tool using standard I/O. However, Strands Agents also supports remote MCP servers over the Streamable HTTP transport. See the following code:

    from mcp import StdioServerParameters, stdio_client
    from strands.tools.mcp import MCPClient

    # MCP clients for various scientific databases.
    # Each client launches its server script as a subprocess and talks to it over standard I/O.
    tavily_mcp_client = MCPClient(lambda: stdio_client(
        StdioServerParameters(command="python", args=["application/mcp_server_tavily.py"])
    ))
    arxiv_mcp_client = MCPClient(lambda: stdio_client(
        StdioServerParameters(command="python", args=["application/mcp_server_arxiv.py"])
    ))
    pubmed_mcp_client = MCPClient(lambda: stdio_client(
        StdioServerParameters(command="python", args=["application/mcp_server_pubmed.py"])
    ))
    chembl_mcp_client = MCPClient(lambda: stdio_client(
        StdioServerParameters(command="python", args=["application/mcp_server_chembl.py"])
    ))
    clinicaltrials_mcp_client = MCPClient(lambda: stdio_client(
        StdioServerParameters(command="python", args=["application/mcp_server_clinicaltrial.py"])
    ))
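Each client above launches a server script such as application/mcp_server_pubmed.py. The repository's server code is not reproduced in this post, so the following is only a minimal sketch of what such a server might look like, assuming the `FastMCP` class from the official `mcp` Python SDK and the public NCBI E-utilities ESearch endpoint. The server name `pubmed-sketch` and the functions `build_pubmed_search_url`, `search_pubmed`, and `serve` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(query: str, max_results: int = 10) -> str:
    """Build an NCBI E-utilities ESearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_pubmed(query: str, max_results: int = 10) -> list[str]:
    """Search PubMed and return the matching PubMed IDs (PMIDs)."""
    with urlopen(build_pubmed_search_url(query, max_results), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


def serve() -> None:
    """Expose search_pubmed as an MCP tool over standard I/O."""
    # Imported lazily so the helpers above work without the `mcp` package installed.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("pubmed-sketch")  # hypothetical server name
    server.tool()(search_pubmed)  # the docstring becomes the tool description
    server.run(transport="stdio")
```

A server script would call `serve()` at the bottom of the file so that a standard I/O client can launch it with `python`. Returning only PMIDs keeps the sketch short; a fuller tool would likely fetch titles and abstracts as well.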

Define specialized sub-agents

The planning agent looks at user questions and creates a plan for which sub-agents and tools to use:

    @tool
    def planning_agent(query: str) -> str:
        """
        A specialized planning agent that analyzes the research query and determines
        which tools and databases should be used for the investigation.
        """
        planning_system = """
        You are a specialized planning agent for drug discovery research. Your role is to:

        1. Analyze research questions to identify target proteins, compounds, or biological mechanisms
        2. Determine which databases would be most relevant (Arxiv, PubMed, ChEMBL, ClinicalTrials.gov)
        3. Generate specific search queries for each relevant database
        4. Create a structured research plan
        """
        model = get_model()
        planner = Agent(
            model=model,
            system_prompt=planning_system,
        )
        # Run the planner on the user's research query
        response = planner(query)
        return str(response)

Similarly, the synthesis agent integrates findings from multiple sources into a single, comprehensive report:

    @tool
    def synthesis_agent(research_results: str) -> str:
        """
        Specialized agent for synthesizing research findings into a comprehensive report.
        """
        system_prompt = """
        You are a specialized synthesis agent for drug discovery research. Your role is to:

        1. Integrate findings from multiple research databases
        2. Create a comprehensive, coherent scientific report
        3. Highlight key insights, connections, and opportunities
        4. Organize information in a structured format:
           - Executive Summary (300 words)
           - Target Overview
           - Research Landscape
           - Drug Development Status
           - References
        """
        model = get_model()
        synthesis = Agent(
            model=model,
            system_prompt=system_prompt,
        )
        # Run the synthesizer on the collected research results
        response = synthesis(research_results)
        return str(response)

Define the orchestration agent

We also define an orchestration agent to coordinate the entire research workflow. This agent uses the SlidingWindowConversationManager class from Strands Agents to store the last 10 messages in the conversation. See the following code:

    def create_orchestrator_agent(
        history_mode,
        tavily_client=None,
        arxiv_client=None,
        pubmed_client=None,
        chembl_client=None,
        clinicaltrials_client=None,
    ):
        system = """
        You are an orchestrator agent for drug discovery research. Your role is to coordinate a multi-agent workflow:

        1. COORDINATION PHASE:
           - For simple queries: Answer directly WITHOUT using specialized tools
           - For complex research requests: Initiate the multi-agent research workflow

        2. PLANNING PHASE:
           - Use the planning_agent to determine which databases to search and with what queries

        3. EXECUTION PHASE:
           - Route specialized search tasks to the appropriate research agents

        4. SYNTHESIS PHASE:
           - Use the synthesis_agent to integrate findings into a comprehensive report
           - Generate a PDF report when appropriate
        """
        # Aggregate all tools from specialized agents and MCP clients.
        # generate_pdf_report and file_write are tools not shown in this excerpt.
        tools = [planning_agent, synthesis_agent, generate_pdf_report, file_write]

        # Dynamically load tools from each MCP client
        if tavily_client:
            tools.extend(tavily_client.list_tools_sync())
        # ... (similar for other clients)

        conversation_manager = SlidingWindowConversationManager(
            window_size=10,  # Maintains context for the last 10 messages
        )

        orchestrator = Agent(
            model=get_model(),
            system_prompt=system,
            tools=tools,
            conversation_manager=conversation_manager,
        )
        return orchestrator
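The orchestrator loads tools with `list_tools_sync()`, which requires each MCP client session to be running, and the tools remain usable only while that session stays open. One way to manage several clients at once is `contextlib.ExitStack`. The following is a sketch under that assumption; `collect_mcp_tools` is a hypothetical helper, not part of Strands Agents, and it relies only on the clients being context managers that expose `list_tools_sync()`.

```python
from contextlib import ExitStack
from typing import Iterable


def collect_mcp_tools(stack: ExitStack, clients: Iterable) -> list:
    """Start every configured MCP client and gather its tools into one list."""
    tools = []
    for client in clients:
        if client is None:  # skip data sources that were not configured
            continue
        stack.enter_context(client)  # keeps the session open until the stack closes
        tools.extend(client.list_tools_sync())
    return tools


# Usage sketch (client names taken from the MCP client definitions in this post):
#
# with ExitStack() as stack:
#     mcp_tools = collect_mcp_tools(stack, [tavily_mcp_client, pubmed_mcp_client])
#     ...build the agent and run queries while the stack is still open...
```

Closing the stack shuts down every client in reverse order, so the agent should finish its work inside the `with` block.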

Example use case: Explore recent breast cancer research

To test out the new assistant, launch the chat interface by running streamlit run application/app.py and opening the local URL (typically http://localhost:8501) in your web browser. The following screenshot shows a typical conversation with the research agent. In this example, we ask the assistant, “Please generate a report for HER2 including recent news, recent research, related compounds, and ongoing clinical trials.” The assistant first develops a comprehensive research plan using the various tools at its disposal. It decides to start with a web search for recent news about HER2, as well as scientific articles on PubMed and arXiv. It also looks at HER2-related compounds in ChEMBL and ongoing clinical trials. It synthesizes these results into a single report and generates an output file of its findings, including citations.

The following is an excerpt of a generated report:

Comprehensive Scientific Report: HER2 in Breast Cancer Research and Treatment

1. Executive Summary

Human epidermal growth factor receptor 2 (HER2) continues to be a critical target in breast cancer research and treatment development. This report synthesizes recent findings across the HER2 landscape highlighting significant advances in understanding HER2 biology and therapeutic approaches. The emergence of antibody-drug conjugates (ADCs) represents a paradigm shift in HER2-targeted therapy, with trastuzumab deruxtecan (T-DXd, Enhertu) demonstrating remarkable efficacy in both early and advanced disease settings. The DESTINY-Breast11 trial has shown clinically meaningful improvements in pathologic complete response rates when T-DXd is followed by standard therapy in high-risk, early-stage HER2+ breast cancer, potentially establishing a new treatment paradigm.

Notably, you don’t have to define a step-by-step process to accomplish this task. By providing the assistant with a well-documented list of tools, it can decide which to use and in what order.
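To see why well-documented tools matter, consider what a model has to work with when choosing one: a name, a description, and the parameters. The following sketch uses Python's `inspect` module to summarize a function in that shape. It is an illustration only and does not reproduce how Strands Agents builds its tool specifications; `describe_tool` and the stand-in `planning_agent` body are hypothetical.

```python
import inspect


def describe_tool(func) -> dict:
    """Summarize a function the way a tool-calling model might see it (illustrative only)."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def planning_agent(query: str) -> str:
    """Analyze a research query and decide which databases to search."""
    return f"plan for: {query}"  # stand-in body for the illustration


spec = describe_tool(planning_agent)
```

A function with no docstring would yield an empty description, which gives the model little basis for choosing that tool over another.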

Clean up

If you followed this example on your local computer, you did not create any new resources in your AWS account that need to be cleaned up. If you deployed the research assistant using one of the AWS services mentioned earlier (AWS Lambda, AWS Fargate, Amazon EKS, or Amazon EC2), refer to the relevant service documentation for cleanup instructions.

Conclusion

In this post, we showed how Strands Agents streamlines the creation of powerful, domain-specific AI assistants. We encourage you to try this solution with your own research questions and extend it with new scientific tools. Combining the orchestration capabilities, streaming responses, and flexible configuration of Strands Agents with the language models of Amazon Bedrock creates a new paradigm for AI-assisted research. As the volume of scientific information continues to grow, frameworks like Strands Agents will become essential tools for drug discovery.

To learn more about building intelligent agents with Strands Agents, refer to Introducing Strands Agents, an Open Source AI Agents SDK, Strands Agents SDK, and the GitHub repository. You can also find more sample agents for healthcare and life sciences built on Amazon Bedrock.

For more information about implementing AI-powered solutions for drug discovery on AWS, visit us at AWS for Life Sciences.


About the authors

Hasun Yu is an AI/ML Specialist Solutions Architect with extensive expertise in designing, developing, and deploying AI/ML solutions for healthcare and life sciences. He supports the adoption of advanced AWS AI/ML services, including generative and agentic AI.

Brian Loyal is a Principal AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 20 years’ experience in biotechnology and machine learning and is passionate about using AI to improve human health and well-being.
