AWS Machine Learning Blog, 19 hours ago
Building AIOps with Amazon Q Developer CLI and MCP Server

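This post describes how to build a powerful AIOps solution with Amazon Q Developer CLI and Model Context Protocol (MCP) servers. Natural language interaction can significantly reduce the tedious manual work that IT teams face when managing complex infrastructure and applications, such as identifying issues, troubleshooting, and performing repetitive maintenance tasks. The solution automates operational workflows, detects anomalies, and minimizes human intervention, improving operational efficiency and security. The post walks through configuring an MCP server, deploying the AWS resources, and three concrete use cases (identifying and remediating high CPU utilization on an EC2 instance, removing public access from an S3 bucket, and closing an unwanted open port on an EC2 instance) that show Amazon Q Developer CLI applied to AIOps.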

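💡 **Challenges and opportunities in automated AIOps**: IT teams face growing challenges as they manage increasingly complex infrastructure and applications. Much of their time goes to manually identifying issues, troubleshooting, and performing repetitive tasks, which takes valuable resources away from innovation. AIOps uses AI to automate operational workflows, detect anomalies, and minimize human intervention, improving efficiency while maintaining security.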

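🚀 **Amazon Q Developer CLI and MCP servers for AIOps**: Combining Amazon Q Developer CLI with Model Context Protocol (MCP) servers offers a low-code/no-code way to build AIOps solutions. MCP servers act as a universal connector for AI models, integrating seamlessly with external systems, live data, and a variety of tools, so Amazon Q can retrieve real-time information, provide more context-aware assistance, and automate complex operations.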

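🛠️ **Configuring and deploying the environment**: The post explains in detail how to configure MCP servers in Amazon Q Developer CLI through a JSON configuration file (such as `.amazonq/mcp.json`), specifically the Amazon Bedrock Knowledge Base Retrieval MCP server. It also provides an AWS CloudFormation template that deploys the AWS resources needed to test AIOps, including EC2 instances and S3 buckets.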

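✅ **Use case demonstrations**: Three concrete use cases show the practical value of Amazon Q Developer CLI. Use case 1 simulates high CPU utilization on an EC2 instance and shows how Amazon Q identifies and remediates the issue. Use case 2 simulates a misconfigured public access setting on an S3 bucket and shows how Amazon Q detects and corrects this security risk. Use case 3 shows how Amazon Q identifies and closes an unwanted open inbound port on an EC2 instance.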

IT teams face mounting challenges as they manage increasingly complex infrastructure and applications, often spending countless hours manually identifying operational issues, troubleshooting problems, and performing repetitive maintenance tasks. This operational burden diverts valuable technical resources from innovation and strategic initiatives. Artificial intelligence for IT operations (AIOps) presents a transformative solution, using AI to automate operational workflows, detect anomalies, and resolve incidents with minimal human intervention. Organizations can optimize their operational efficiency while maintaining security as they manage their infrastructure and applications.

You can use Amazon Q Developer CLI and Model Context Protocol (MCP) servers to build powerful AIOps solutions that reduce manual effort through natural language interactions. Amazon Q Developer can help developers and IT professionals with many tasks, from coding, testing, and deploying to troubleshooting, performing security scanning and fixes, modernizing applications, optimizing AWS resources, and creating data engineering pipelines. MCP extends these capabilities by enabling Amazon Q to connect with custom tools and services through a standardized interface, allowing for more sophisticated operational automation.

In this post, we discuss how to implement a low-code/no-code AIOps solution that helps organizations monitor, identify, and troubleshoot operational events while maintaining their security posture. We show how these technologies work together to automate repetitive tasks, streamline incident response, and enhance operational efficiency across your organization.

This is the third post in a series on AIOps using generative AI services on AWS. Refer to the following two posts for building AIOps using Amazon Bedrock and Amazon Q Business:

Solution overview

MCP servers act like a universal connector for AI models, enabling them to interact with external systems, fetch live data, and integrate with various tools seamlessly. This helps Amazon Q provide more contextually relevant assistance by accessing the information it needs in real time. The following architecture diagram illustrates how you can use a single configuration file, mcp.json, to configure MCP servers in Amazon Q Developer CLI to connect to external systems.

The workflow consists of the following steps:

    1. The user configures an MCP client in Amazon Q Developer CLI using the mcp.json file.
    2. The user logs in to Amazon Q Developer CLI and asks operational questions in natural language.
    3. Depending on the query, Amazon Q decides which of the configured MCP servers or its built-in tools to invoke to perform the task.
    4. The MCP server interacts with the respective external system to retrieve the live data that Amazon Q uses to perform the required task.

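In this post, we show how to use Amazon Q Developer CLI to address the following operational issues:

    High CPU utilization in an EC2 instance
    Public access enabled on an S3 bucket
    An unwanted open inbound port on an EC2 instance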

Prerequisites

Complete the following prerequisites before you start setting up the demo:

Configure MCP in Amazon Q Developer CLI

MCP configuration in Amazon Q Developer CLI is managed through JSON files. You will configure the Amazon Bedrock Knowledge Base Retrieval MCP Server. At the time of writing, only the stdio transport is supported in Amazon Q Developer CLI.

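Amazon Q Developer CLI supports two levels of MCP configuration:

    Global configuration: ~/.aws/amazonq/mcp.json, which applies to all workspaces
    Workspace configuration: .amazonq/mcp.json, which applies only to the current workspace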

For this post, we use the workspace configuration, but you can use either one.

    1. Create a new workspace folder, and inside that folder, create the file .amazonq/mcp.json with the following content:
{
  "mcpServers": {
    "awslabs.bedrock-kb-retrieval-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.bedrock-kb-retrieval-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-profile-name",
        "AWS_REGION": "your-region",
        "FASTMCP_LOG_LEVEL": "ERROR",
        "KB_INCLUSION_TAG_KEY": "name=aiops-knowledge-base",
        "BEDROCK_KB_RERANKING_ENABLED": "false"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}

See the AWS MCP Servers GitHub repository for an updated list of available MCP servers.

    2. Open a terminal, navigate to the workspace folder that you created, and run the following command to log in to Amazon Q Developer CLI:
q login
    3. Follow the instructions to log in to Amazon Q Developer on the command line.
    4. Start a chat session by running q, and then run /tools to verify that the Amazon Bedrock Knowledge Base Retrieval MCP server is configured.

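Tool permissions have two possible states:

    Trusted: Amazon Q can use the tool without asking for permission each time.
    Per-request: Amazon Q asks for your confirmation before each use of the tool.

By default, this tool is not trusted.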

    5. Run /tools trust awslabsbedrock_kb_retrieval_mcp_server___QueryKnowledgeBases to trust the QueryKnowledgeBases tool provided by the MCP server.

    6. Run the /tools command again to verify that the tool is now trusted.
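If you set up the same workspace on several machines, the mcp.json file created earlier can also be generated and sanity-checked with a short script. The following is a minimal Python sketch, assuming Python 3.9 or later. The helper names (build_mcp_config, validate_mcp_config, write_workspace_config) are hypothetical, and the validation rules are our own basic assumptions about a well-formed stdio server entry, not the official Amazon Q Developer CLI schema.

```python
import json
import tempfile
from pathlib import Path

SERVER_NAME = "awslabs.bedrock-kb-retrieval-mcp-server"


def build_mcp_config(profile: str, region: str) -> dict:
    """Return the MCP configuration used in this post as a Python dict."""
    return {
        "mcpServers": {
            SERVER_NAME: {
                # uvx fetches and runs the server package; it talks to Q over stdio
                "command": "uvx",
                "args": [f"{SERVER_NAME}@latest"],
                "env": {
                    # Strip stray whitespace: a profile name with a trailing
                    # space would likely not match any configured AWS profile.
                    "AWS_PROFILE": profile.strip(),
                    "AWS_REGION": region.strip(),
                    "FASTMCP_LOG_LEVEL": "ERROR",
                    "KB_INCLUSION_TAG_KEY": "name=aiops-knowledge-base",
                    "BEDROCK_KB_RERANKING_ENABLED": "false",
                },
                "disabled": False,
                "autoApprove": [],
            }
        }
    }


def validate_mcp_config(config: dict) -> list[str]:
    """Return human-readable problems found in the config (empty if none)."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["top-level 'mcpServers' object is missing or empty"]
    problems = []
    for name, server in servers.items():
        if not isinstance(server.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        args = server.get("args", [])
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"{name}: 'args' must be a list of strings")
        for key, value in server.get("env", {}).items():
            if not isinstance(value, str):
                problems.append(f"{name}: env var {key} must be a string")
            elif value != value.strip():
                problems.append(f"{name}: env var {key} has leading/trailing whitespace")
    return problems


def write_workspace_config(workspace: Path, config: dict) -> Path:
    """Write config to <workspace>/.amazonq/mcp.json and return the file path."""
    target = workspace / ".amazonq" / "mcp.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory so an existing workspace is not overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = build_mcp_config("your-profile-name", "your-region")
        print(validate_mcp_config(cfg) or "no problems found")
        print(write_workspace_config(Path(tmp), cfg).read_text())
```

Generating the file from one function keeps the server name and the package argument in sync, which matters because the server name also appears in the tool name you trust with /tools trust.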

Deploy AWS resources

Deploy the following AWS CloudFormation template to create the AWS resources that you will use to test AIOps. You can deploy this template in either the us-east-1 or us-west-2 AWS Region. To deploy it in another Region, add an Amazon Linux 2023 AMI ID for that Region to the RegionMap mapping in the template. The template deploys two EC2 instances and four S3 buckets (three test buckets and one knowledge base bucket), along with a VPC, subnets, security groups, an EC2 key pair, and a CloudWatch alarm.

This CloudFormation template is for demo purposes only and not meant for production usage.

AWSTemplateFormatVersion: '2010-09-09'
Description: >-
  This template creates the necessary AWS resources which will be used to test AIOps using
  Amazon Q Developer CLI with MCP server integration.
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Network
        Parameters:
          - SecurityGroupIngressCidrIp
      - Label:
          default: General
        Parameters:
          - Prefix
    ParameterLabels:
      SecurityGroupIngressCidrIp:
        default: Security group ingress CIDR IP
Parameters:
  Prefix:
    Type: String
    Description: Unique name prefix for resources that are created by the stack.
    ConstraintDescription: >-
      must not start with a dash, and must only contain lowercase a-z, digits,
      and a dash.
    AllowedPattern: ^[a-z0-9][a-z0-9-]+$
    MinLength: 1
    MaxLength: 30
    Default: aiops-qdevcli
  SecurityGroupIngressCidrIp:
    Type: String
    Description: >-
      IPv4 address in CIDR format for allowed incoming traffic to the EC2 instance. Defaults to allowing all IPs.
    ConstraintDescription: >-
      must be in the form x.x.x.x/s, where x is 0-255, and s is 0-32.
    AllowedPattern: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\/([0-9]|[1-2][0-9]|3[0-2]))$
    Default: 0.0.0.0/0
Resources:
  # AIOps Amazon S3 bucket1
  AIOpsQDeveloperCliS3Bucket1:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
      BucketName:
        Fn::Sub: ${Prefix}-bucket1-${AWS::AccountId}
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  # AIOps Amazon S3 bucket2
  AIOpsQDeveloperCliS3Bucket2:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
      BucketName:
        Fn::Sub: ${Prefix}-bucket2-${AWS::AccountId}
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  # AIOps Amazon S3 bucket3
  AIOpsQDeveloperCliS3Bucket3:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
      BucketName:
        Fn::Sub: ${Prefix}-bucket3-${AWS::AccountId}
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  # AIOps Knowledgebase S3 bucket
  AIOpsQDeveloperKBS3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
      BucketName:
        Fn::Sub: ${Prefix}-kb-${AWS::AccountId}
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  # AIOps VPC resources
  AIOpsQDeveloperCliVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperCliVPC
  AIOpsQDeveloperCliSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.1.0/24
      VpcId:
        Ref: AIOpsQDeveloperCliVPC
      AvailabilityZone: !Select
        - 0
        - !GetAZs
          Ref: 'AWS::Region'
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperCliSubnet1
  AIOpsQDeveloperCliSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.3.0/24
      VpcId:
        Ref: AIOpsQDeveloperCliVPC
      AvailabilityZone: !Select
        - 1
        - !GetAZs
          Ref: 'AWS::Region'
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperCliSubnet2
  AIOpsQDeveloperIGW:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperIGW
  AIOpsQDeveloperCliVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId:
        Ref: AIOpsQDeveloperIGW
      VpcId:
        Ref: AIOpsQDeveloperCliVPC
  AIOpsQDeveloperCliRT:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: AIOpsQDeveloperCliVPC
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperCliRT
  AIOpsRoute:
    Type: AWS::EC2::Route
    DependsOn:
      - AIOpsQDeveloperCliVPCGatewayAttachment
    Properties:
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId:
        Ref: AIOpsQDeveloperIGW
      RouteTableId:
        Ref: AIOpsQDeveloperCliRT
  AIOpsQDeveloperCliSubnetRouteTableAssociation1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: AIOpsQDeveloperCliRT
      SubnetId:
        Ref: AIOpsQDeveloperCliSubnet1
  AIOpsQDeveloperCliSubnetRouteTableAssociation2:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: AIOpsQDeveloperCliRT
      SubnetId:
        Ref: AIOpsQDeveloperCliSubnet2
  AIOpsQDeveloperCliSG1:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: >-
        Allows incoming traffic on port 22 and denies all outgoing traffic.
      SecurityGroupEgress:
        - Description: Denies all outgoing traffic.
          IpProtocol: -1
          CidrIp: 0.0.0.0/32
      SecurityGroupIngress:
        - Description: Allows incoming TCP traffic on port 22.
          IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp:
            Ref: SecurityGroupIngressCidrIp
      VpcId:
        Ref: AIOpsQDeveloperCliVPC
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperCliSG1
  AIOpsQDeveloperCliSG2:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: >-
        Allows incoming traffic on port 5080 and denies all outgoing traffic.
      SecurityGroupEgress:
        - Description: Denies all outgoing traffic.
          IpProtocol: -1
          CidrIp: 0.0.0.0/32
      SecurityGroupIngress:
        - Description: Allows incoming TCP traffic on port 5080.
          IpProtocol: tcp
          FromPort: 5080
          ToPort: 5080
          CidrIp:
            Ref: SecurityGroupIngressCidrIp
        - Description: Allows incoming TCP traffic on port 22.
          IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp:
            Ref: SecurityGroupIngressCidrIp
      VpcId:
        Ref: AIOpsQDeveloperCliVPC
      Tags:
        - Key: Name
          Value: AIOpsQDeveloperCliSG2
  EC2KeyPair:
    Type: AWS::EC2::KeyPair
    Properties:
      KeyName:
        Fn::Sub: ${Prefix}-keypair-${AWS::AccountId}
  # EC2 instance to demo high CPU Utilization AIOps
  EC2InstanceHighCPUUtilDemo:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      KeyName: !Ref EC2KeyPair
      ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', AL2023]
      NetworkInterfaces:
        - AssociatePublicIpAddress: true
          DeviceIndex: 0
          SubnetId: !Ref AIOpsQDeveloperCliSubnet1
          GroupSet:
            - !Ref AIOpsQDeveloperCliSG1
      Tags:
        - Key: Name
          Value:
            Fn::Sub: ${Prefix}-high-cpu-util
  # EC2 instance to demo unwanted open port detection AIOps
  EC2InstanceOpenPortDemo:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      KeyName: !Ref EC2KeyPair
      ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', AL2023]
      NetworkInterfaces:
        - AssociatePublicIpAddress: true
          DeviceIndex: 0
          SubnetId: !Ref AIOpsQDeveloperCliSubnet1
          GroupSet:
            - !Ref AIOpsQDeveloperCliSG2
      Tags:
        - Key: Name
          Value:
            Fn::Sub: ${Prefix}-open-port-demo
  CPUUtilizationAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName:
        Fn::Sub: ${Prefix}-EC2-Instance-CPU-Utilization
      AlarmDescription: Alarm when server CPU exceeds 70%
      ComparisonOperator: GreaterThanThreshold
      EvaluationPeriods: 1
      MetricName: CPUUtilization
      Namespace: AWS/EC2
      Period: 60
      Statistic: Average
      Threshold: 70.0
      ActionsEnabled: false
      Dimensions:
        - Name: InstanceId
          Value: !Ref EC2InstanceHighCPUUtilDemo
      Unit: Percent
Mappings:
  RegionMap:
    us-east-1:
      AL2023: ami-085ad6ae776d8f09c
    us-west-2:
      AL2023: ami-0005ee01bca55ab66
Outputs:
  AIOpsQDeveloperCliS3Bucket1:
    Description: S3 bucket created for testing AIOps
    Value:
      Ref: AIOpsQDeveloperCliS3Bucket1
  AIOpsQDeveloperCliS3Bucket2:
    Description: S3 bucket created for testing AIOps
    Value:
      Ref: AIOpsQDeveloperCliS3Bucket2
  AIOpsQDeveloperCliS3Bucket3:
    Description: S3 bucket created for testing AIOps
    Value:
      Ref: AIOpsQDeveloperCliS3Bucket3
  AIOpsQDeveloperKBS3Bucket:
    Description: S3 bucket created for testing AIOps
    Value:
      Ref: AIOpsQDeveloperKBS3Bucket
  EC2InstanceHighCPUUtilDemo:
    Description: EC2 instance for testing AIOps
    Value:
      Ref: EC2InstanceHighCPUUtilDemo
  EC2InstanceOpenPortDemo:
    Description: EC2 instance for testing AIOps
    Value:
      Ref: EC2InstanceOpenPortDemo
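CloudFormation validates the Prefix and SecurityGroupIngressCidrIp parameters when you submit the stack. If you script deployments, you may want to check values locally first. The following Python sketch is a hypothetical pre-check (not part of the template) that applies the same AllowedPattern, MinLength, and MaxLength constraints. Note that the Prefix pattern requires at least two characters, so it is stricter than MinLength: 1.

```python
import re

# AllowedPattern values copied from the template's Parameters section.
PREFIX_PATTERN = r"^[a-z0-9][a-z0-9-]+$"
CIDR_PATTERN = (
    r"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}"
    r"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
    r"(\/([0-9]|[1-2][0-9]|3[0-2]))$"
)


def check_parameters(prefix: str, cidr: str) -> list[str]:
    """Return constraint violations for the two parameters (empty if valid)."""
    errors = []
    # MinLength: 1 and MaxLength: 30 on Prefix
    if not 1 <= len(prefix) <= 30:
        errors.append("Prefix must be 1-30 characters long")
    # fullmatch also rejects a trailing newline that a bare '$' would accept
    if re.fullmatch(PREFIX_PATTERN, prefix) is None:
        errors.append("Prefix does not match AllowedPattern")
    if re.fullmatch(CIDR_PATTERN, cidr) is None:
        errors.append("SecurityGroupIngressCidrIp does not match AllowedPattern")
    return errors


if __name__ == "__main__":
    # The template defaults, then a value that narrows SSH/5080 access to one IP.
    print(check_parameters("aiops-qdevcli", "0.0.0.0/0"))
    print(check_parameters("aiops-qdevcli", "203.0.113.10/32"))
```

Because the CIDR pattern requires the /s suffix, a bare IP address such as 203.0.113.10 is rejected; use /32 to allow a single address instead of the default 0.0.0.0/0.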

Verify that the stack created two EC2 instances and that both are in the Running state.

Additionally, verify that the stack created three S3 buckets named aiops-qdevcli-bucketX-<your-AWS-account-Id> and one bucket named aiops-qdevcli-kb-<your-AWS-account-Id> in your selected Region.
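The physical names follow directly from the template's Fn::Sub expressions, so you can compute exactly what to look for. The following is a minimal Python sketch; the function names are hypothetical, the dictionary keys are the template's logical IDs, and 111122223333 is a placeholder account ID.

```python
def expected_names(prefix: str, account_id: str) -> dict[str, str]:
    """Mirror the template's Fn::Sub expressions (keys are logical IDs)."""
    return {
        "AIOpsQDeveloperCliS3Bucket1": f"{prefix}-bucket1-{account_id}",
        "AIOpsQDeveloperCliS3Bucket2": f"{prefix}-bucket2-{account_id}",
        "AIOpsQDeveloperCliS3Bucket3": f"{prefix}-bucket3-{account_id}",
        "AIOpsQDeveloperKBS3Bucket": f"{prefix}-kb-{account_id}",
        "EC2KeyPair": f"{prefix}-keypair-{account_id}",
        # The EC2 instances are identified by their Name tag values.
        "EC2InstanceHighCPUUtilDemo": f"{prefix}-high-cpu-util",
        "EC2InstanceOpenPortDemo": f"{prefix}-open-port-demo",
        "CPUUtilizationAlarm": f"{prefix}-EC2-Instance-CPU-Utilization",
    }


def missing_buckets(prefix: str, account_id: str, existing: set[str]) -> list[str]:
    """Return the expected bucket names that are absent from `existing`."""
    names = expected_names(prefix, account_id)
    buckets = [name for logical_id, name in names.items() if "S3Bucket" in logical_id]
    return [bucket for bucket in buckets if bucket not in existing]


if __name__ == "__main__":
    for logical_id, name in expected_names("aiops-qdevcli", "111122223333").items():
        print(f"{logical_id}: {name}")
```

The `existing` set could be filled from the bucket names that `aws s3 ls` prints for your account.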

Create an Amazon Bedrock knowledge base

Upload the sample high CPU utilization runbook to the aiops-qdevcli-kb-<your-AWS-account-Id> bucket. Create a knowledge base that uses this bucket as its data source, and note the knowledge base ID to use in the first use case.

Use case 1: Identify and remediate high CPU utilization in an EC2 instance

In this use case, you introduce CPU stress in one of the EC2 instances and then use Amazon Q Developer CLI to identify and remediate it.

    On the Amazon EC2 console, log in to the aiops-qdevcli-high-cpu-util instance using EC2 Instance Connect. Run the following command to install stress-ng:
sudo dnf install stress-ng
    Run the following command to stress the EC2 instance for 1 hour:
stress-ng --cpu 1 --timeout 3600s

Wait approximately 10 minutes for the Amazon CloudWatch alarm to be triggered.

    Return to the Amazon EC2 console and check that the CloudWatch alarm for the aiops-qdevcli-high-cpu-util instance is in the In alarm state. From the Amazon Q Developer CLI, use a natural language query to check for operational issues in your account. Include the knowledge base ID that you saved in the previous section.

Amazon Q Developer CLI automatically corrects errors that it encounters while running commands.
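To see why the alarm takes several minutes to fire, it helps to model how CPUUtilizationAlarm is evaluated: CloudWatch averages CPUUtilization over each 60-second Period and, with EvaluationPeriods: 1, moves to ALARM when the most recent average is greater than 70. The template does not enable detailed monitoring, and EC2 basic monitoring publishes this metric at 5-minute intervals, which is one plausible reason for the wait. The following Python sketch is a simplified model under our own assumptions: it assumes DatapointsToAlarm equals EvaluationPeriods (the default when it is not set, as here), treats a missing datapoint as INSUFFICIENT_DATA, and does not model CloudWatch's TreatMissingData behavior. The sample data is illustrative.

```python
from statistics import mean
from typing import Optional

# Settings copied from CPUUtilizationAlarm in the template.
THRESHOLD = 70.0        # Threshold, compared with GreaterThanThreshold
PERIOD_SECONDS = 60     # Period
EVALUATION_PERIODS = 1  # EvaluationPeriods (DatapointsToAlarm defaults to this)


def period_averages(samples: list[tuple[int, float]], start: int, end: int) -> list[Optional[float]]:
    """Average (timestamp_seconds, cpu_percent) samples per Period window in [start, end).

    A window with no samples yields None, representing a missing datapoint.
    """
    windows: dict[int, list[float]] = {}
    for timestamp, value in samples:
        if start <= timestamp < end:
            windows.setdefault((timestamp - start) // PERIOD_SECONDS, []).append(value)
    count = (end - start) // PERIOD_SECONDS
    return [mean(windows[i]) if i in windows else None for i in range(count)]


def alarm_state(averages: list[Optional[float]]) -> str:
    """Simplified alarm state based on the most recent EVALUATION_PERIODS windows."""
    recent = averages[-EVALUATION_PERIODS:]
    if not recent or any(value is None for value in recent):
        return "INSUFFICIENT_DATA"  # simplification: TreatMissingData is not modeled
    if all(value > THRESHOLD for value in recent):
        return "ALARM"
    return "OK"


if __name__ == "__main__":
    # Illustrative samples: stress-ng keeping the single vCPU of a t2.micro busy.
    busy = [(t, 99.0) for t in range(0, 120, 10)]
    print(alarm_state(period_averages(busy, 0, 120)))
```

Because the comparison is strictly greater than the threshold, an average of exactly 70 percent leaves the alarm in OK.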

Watch the following video for more details.

Due to the inherently nondeterministic nature of foundation models (FMs), the responses you receive from Amazon Q Developer CLI might not exactly match those shown in the demo.

Use case 2: Identify and remove public access from an S3 bucket

In this use case, you will simulate an accidental security issue by unblocking public access for one of the buckets and then use Amazon Q Developer CLI to identify and remediate the issue.

    On the Amazon S3 console, open one of the aiops-qdevcli-xxxx buckets. On the Permissions tab, under Block public access (bucket settings), choose Edit, clear Block all public access, and save your changes.

    Return to the Amazon Q Developer CLI and ask questions in natural language to identify and remediate the operational issue.
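The remediation in this use case comes down to reading the bucket's S3 Block Public Access settings and turning all four back on. The following Python sketch shows that logic. It assumes s3_client is a boto3 S3 client, whose get_public_access_block and put_public_access_block methods take these arguments, or a test double with the same methods; the function names are hypothetical. It does not handle the error S3 returns when a bucket has no public access block configuration at all, and it does not inspect account-level settings or bucket policies.

```python
# The four S3 Block Public Access settings, all enabled (as in the template).
ALL_BLOCKED = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def disabled_settings(response: dict) -> list[str]:
    """Names of settings that are off in a GetPublicAccessBlock-style response."""
    config = response.get("PublicAccessBlockConfiguration", {})
    return sorted(name for name in ALL_BLOCKED if not config.get(name, False))


def remediate_bucket(s3_client, bucket: str) -> list[str]:
    """Re-enable all four settings if any is off; return the names that were off.

    s3_client is assumed to be boto3.client("s3") or an object exposing the same
    get_public_access_block and put_public_access_block methods.
    """
    off = disabled_settings(s3_client.get_public_access_block(Bucket=bucket))
    if off:
        s3_client.put_public_access_block(
            Bucket=bucket,
            PublicAccessBlockConfiguration=dict(ALL_BLOCKED),
        )
    return off
```

Returning the list of settings that were off gives you a simple audit trail, and calling the function a second time returns an empty list once the bucket is locked down again.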

Watch the following video for more details.

Use case 3: Identify and block a specific unwanted open port for inbound connection to an EC2 instance

In this use case, you will use Amazon Q Developer CLI to identify the EC2 instance that has a specific port open and then close the port.

    On the Amazon EC2 console, note that the aiops-qdevcli-open-port-demo instance has port 5080 open for all inbound TCP connections. This is an unwanted security risk that you want to identify and remediate.

    Return to Amazon Q Developer CLI and use natural language queries to identify the EC2 instance with port 5080 open and fix the issue.
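The check in this use case amounts to scanning security group ingress rules for one that exposes port 5080 to 0.0.0.0/0. The following Python sketch works on the SecurityGroups list returned by the EC2 DescribeSecurityGroups API (for example, boto3's describe_security_groups) and returns each matching permission trimmed to the matching range, in the shape RevokeSecurityGroupIngress expects. The function name is hypothetical, it only considers IPv4 ranges and TCP or all-traffic (-1) rules, and the group IDs in the test data are made up.

```python
ALL_TRAFFIC = "-1"  # IpProtocol value meaning all protocols and ports


def open_ingress(security_groups: list[dict], port: int, cidr: str = "0.0.0.0/0") -> list[tuple[str, dict]]:
    """Find ingress permissions that expose `port` to `cidr`.

    Returns (GroupId, permission) pairs; each permission keeps only the fields
    needed to identify the rule when revoking it.
    """
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            protocol = perm.get("IpProtocol")
            if protocol == ALL_TRAFFIC:
                covers_port = True
            elif protocol == "tcp":
                covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
            else:
                continue  # UDP, ICMP, and others are out of scope for this sketch
            if not covers_port:
                continue
            if any(r.get("CidrIp") == cidr for r in perm.get("IpRanges", [])):
                trimmed = {k: perm[k] for k in ("IpProtocol", "FromPort", "ToPort") if k in perm}
                trimmed["IpRanges"] = [{"CidrIp": cidr}]
                findings.append((group["GroupId"], trimmed))
    return findings

# Usage with boto3 (not executed here):
#   ec2 = boto3.client("ec2")
#   groups = ec2.describe_security_groups()["SecurityGroups"]
#   for group_id, permission in open_ingress(groups, 5080):
#       ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=[permission])
```

Be careful with an all-traffic (-1) match: revoking that rule removes access on every port for the range, not only 5080, so it is worth reviewing such findings before acting on them.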

Watch the following video for details.

Clean up

Properly decommissioning provisioned AWS resources is an important best practice to optimize costs and enhance security posture after concluding proofs of concept and demonstrations. Complete the following steps to delete the resources created in your AWS account:

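    On the Amazon Bedrock console, delete the Amazon Bedrock knowledge base.
    On the Amazon S3 console, empty the aiops-qdevcli-kb-xxx bucket. CloudFormation cannot delete a bucket that still contains objects, so do this before deleting the stack.
    On the AWS CloudFormation console, delete the CloudFormation stack.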

As an alternative, try the preceding steps using natural language queries in Amazon Q Developer CLI.

    Finally, delete the .amazonq/mcp.json file from your workspace folder to remove the MCP configuration for Amazon Q Developer CLI.
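The cleanup steps can also be scripted. The following Python sketch mirrors the console steps in the same order. It assumes boto3 clients for bedrock-agent, s3, and cloudformation (delete_knowledge_base, list_objects_v2, delete_objects, and delete_stack are methods on those clients), an unversioned bucket as created by the template, and a stack name of your choosing; the function names and the stack name in the test are hypothetical. Deleting the .amazonq/mcp.json file remains a manual step.

```python
def empty_bucket(s3_client, bucket: str) -> int:
    """Delete every object in an unversioned bucket; return how many were deleted."""
    deleted = 0
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A page holds at most 1,000 keys, which is also the
            # per-request limit of DeleteObjects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def clean_up(bedrock_agent, s3_client, cloudformation, kb_id: str, kb_bucket: str, stack_name: str) -> None:
    """Run the three cleanup steps in the order used on the console."""
    # 1. Delete the Amazon Bedrock knowledge base.
    bedrock_agent.delete_knowledge_base(knowledgeBaseId=kb_id)
    # 2. Empty the knowledge base bucket so CloudFormation can delete it.
    empty_bucket(s3_client, kb_bucket)
    # 3. Start stack deletion; CloudFormation finishes it asynchronously.
    cloudformation.delete_stack(StackName=stack_name)
```

For example, you might call clean_up(boto3.client("bedrock-agent"), boto3.client("s3"), boto3.client("cloudformation"), "<knowledge-base-id>", "aiops-qdevcli-kb-<your-AWS-account-Id>", "<your-stack-name>") and then confirm on the CloudFormation console that the stack reaches DELETE_COMPLETE.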

Conclusion

In this post, we showed how Amazon Q Developer CLI interprets natural language queries, automatically converts them into appropriate commands, and identifies the necessary tools for execution. The solution’s intelligent error-handling capabilities analyze logs and perform auto-corrections, minimizing manual intervention. By implementing Amazon Q Developer CLI, you can enhance your team’s operational efficiency, reduce human errors, and manage complex environments more effectively through a conversational interface.

We encourage you to explore additional use cases and share your feedback with us. For more information on Amazon Q Developer CLI and AWS MCP servers, refer to the following resources:


About the authors

Biswanath Mukherjee is a Senior Solutions Architect at Amazon Web Services. He works with large strategic customers of AWS by providing them technical guidance to migrate and modernize their applications on AWS Cloud. With his extensive experience in cloud architecture and migration, he partners with customers to develop innovative solutions that leverage the scalability, reliability, and agility of AWS to meet their business needs. His expertise spans diverse industries and use cases, enabling customers to unlock the full potential of the AWS Cloud.

Upendra V is a Senior Solutions Architect at Amazon Web Services, specializing in Generative AI and cloud solutions. He helps enterprise customers design and deploy production-ready Generative AI workloads, implement Large Language Models (LLMs) and Agentic AI systems, and optimize cloud deployments. With expertise in cloud adoption and machine learning, he enables organizations to build and scale AI-driven applications efficiently.
