V2EX, July 21, 12:38
[Programmer] Help: importing data into Milvus crashes the database. Is my configuration the problem?

 

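This post describes a Milvus v2.6.0-rc1 deployment under Docker Compose: the parameters of the etcd and MinIO services, and the startup command and resource limits of the Milvus standalone service. The host has 4 TB of memory and 256 CPUs, and a link to milvus.yaml is provided. The logs reveal the key failure: the StreamingNode, QueryNode, MixCoord, Proxy, and DataNode components inside the single standalone process lose their connection to etcd and exit. The main symptoms are `etcdserver: requested lease not found` errors and a StreamingNode that can no longer sync time ticks because its channels have been fenced. These problems directly undermine the stability and data-processing capability of the Milvus deployment.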

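🚀 **Milvus deployment configuration overview**: The post gives the Docker Compose configuration for Milvus v2.6.0-rc1, listing the container settings of its two core dependencies, etcd and MinIO: image versions, ports, volumes, health checks, resource limits (memory and CPU), and logging. etcd runs v3.5.18 with revision-based auto-compaction (retaining the latest 1,000 revisions) and a raised backend quota (`ETCD_QUOTA_BACKEND_BYTES=8589934592`). MinIO uses a pinned release image (`RELEASE.2024-05-28T17-19-04Z`) with access keys and a data directory configured. The Milvus standalone service is configured in detail as well: container name, image, startup command, environment variables (pointing at etcd and MinIO, with `MQ_TYPE: woodpecker`), mounted volumes (data directory and config file), health check, port mappings, and limits of 1024g of memory and 32.0 CPUs. It depends on etcd and MinIO being started first.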
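The two byte-valued etcd settings are easier to judge in binary units. The small helper below (our own, purely illustrative) converts them: the backend quota is exactly 8 GiB and the maximum request size is 32 MiB. For comparison, etcd's documentation gives 2 GB as the default storage quota and 8 GB as the suggested maximum, so this deployment sits at that suggested upper bound.

```python
# Byte-valued etcd settings copied from the compose file in this post.
GIB = 1024 ** 3
MIB = 1024 ** 2

ETCD_ENV = {
    "ETCD_QUOTA_BACKEND_BYTES": 8589934592,  # backend database size quota
    "ETCD_MAX_REQUEST_BYTES": 33554432,      # largest client request etcd accepts
}


def human_bytes(n: int) -> str:
    """Render a byte count in whole GiB or MiB when it divides evenly."""
    if n % GIB == 0:
        return f"{n // GIB} GiB"
    if n % MIB == 0:
        return f"{n // MIB} MiB"
    return f"{n} B"


if __name__ == "__main__":
    for key, value in ETCD_ENV.items():
        print(f"{key} = {value} ({human_bytes(value)})")
```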

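⚠️ **etcd and MinIO resources and health checks**: The etcd service is allotted 16g of memory and 4.0 CPUs and is health-checked with `etcdctl endpoint health` every 30 seconds, with a 20-second timeout and 3 retries. MinIO gets the same 16g of memory and 4.0 CPUs and is checked with `curl -f http://localhost:9000/minio/health/live` using the same interval, timeout, and retry count. These resource and health-check settings are meant to keep Milvus's core dependencies stable under large-scale data processing.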
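How long Docker takes to flag etcd or MinIO as unhealthy follows from these three parameters. Under Docker's documented health-check semantics, each check starts `interval` after the previous one finishes, a check that runs past `timeout` counts as failed, and `retries` consecutive failures mark the container unhealthy. The function below is a back-of-the-envelope model under those rules; it ignores `start_period` and assumes every check after the last passing one fails.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    interval_s: float  # wait between the end of one check and the start of the next
    timeout_s: float   # a check running longer than this counts as failed
    retries: int       # consecutive failures before the container is "unhealthy"


def seconds_until_unhealthy(hc: HealthCheck, check_duration_s: float) -> float:
    """Time from the end of the last passing check until Docker reports
    'unhealthy', assuming every later check fails after check_duration_s
    (a hung check is cut off at the timeout). Ignores start_period."""
    per_check = hc.interval_s + min(check_duration_s, hc.timeout_s)
    return hc.retries * per_check


# Values shared by the etcd and MinIO services in the compose file.
ETCD_MINIO_HC = HealthCheck(interval_s=30, timeout_s=20, retries=3)

if __name__ == "__main__":
    print(seconds_until_unhealthy(ETCD_MINIO_HC, check_duration_s=0))   # fails at once
    print(seconds_until_unhealthy(ETCD_MINIO_HC, check_duration_s=60))  # hangs
```

With the settings above, a check that fails immediately gives 90 s and a check that hangs until the 20 s timeout gives 150 s.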

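🚨 **Milvus components losing their etcd sessions**: The logs show the QueryNode, MixCoord, and Proxy failing to renew their etcd session leases with `etcdserver: requested lease not found`, after which the QueryNode, MixCoord, Proxy, StreamingNode, and DataNode each log that they have disconnected from etcd and that the process will exit. All of these entries fall within a few milliseconds (15:41:28.119 to 15:41:28.123), so the components go down together rather than one at a time. At the same time the StreamingNode reports `STREAMING_CODE_CHANNEL_FENCED`, meaning the channels it uses for time-tick sync have been fenced off, which further destabilizes the service.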
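Each component keeps its session alive by periodically renewing an etcd lease (the log shows `keepAliveOnce` retries in `sessionutil/session_util.go`). If a renewal does not arrive before the lease's TTL runs out, for example because the process or etcd stalls, etcd revokes the lease, and the next renewal fails with "requested lease not found". The toy model below illustrates only that mechanism with an explicit clock; it is not etcd's or Milvus's code, the class names are hypothetical, and the 30-second TTL is an assumed value (the real one comes from Milvus's configuration).

```python
class LeaseNotFound(Exception):
    """Raised when a keepalive targets a lease that no longer exists."""


class ToyLeaseStore:
    """Single-threaded toy model of TTL leases driven by an explicit clock.
    NOT etcd's implementation: it only shows why a client that stalls past
    its TTL gets 'requested lease not found' on the next keepalive."""

    def __init__(self) -> None:
        self._ttl: dict[int, float] = {}
        self._deadline: dict[int, float] = {}
        self._next_id = 1

    def grant(self, ttl_s: float, now: float) -> int:
        """Create a lease that expires ttl_s seconds after `now`."""
        lease_id = self._next_id
        self._next_id += 1
        self._ttl[lease_id] = ttl_s
        self._deadline[lease_id] = now + ttl_s
        return lease_id

    def keepalive(self, lease_id: int, now: float) -> None:
        """Renew the lease, or fail if its deadline has already passed."""
        deadline = self._deadline.get(lease_id)
        if deadline is None or now > deadline:
            # An expired lease is revoked; every later renewal fails too.
            self._deadline.pop(lease_id, None)
            self._ttl.pop(lease_id, None)
            raise LeaseNotFound("requested lease not found")
        self._deadline[lease_id] = now + self._ttl[lease_id]


if __name__ == "__main__":
    store = ToyLeaseStore()
    lease = store.grant(ttl_s=30, now=0)  # hypothetical 30 s session TTL
    store.keepalive(lease, now=10)         # renewed in time; deadline moves to 40
    try:
        store.keepalive(lease, now=75)     # process stalled past the deadline
    except LeaseNotFound as err:
        print("session lost:", err)        # the component would exit at this point
```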

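📈 **StreamingNode channel fencing and data sync problems**: The StreamingNode log repeatedly shows `send time tick sync message failed` warnings with errors such as `append time tick msg to wal failed, timestamp: ..., previous message counter: 8: code: STREAMING_CODE_CHANNEL_FENCED` on channels `by-dev-rootcoord-dml_7`, `by-dev-rootcoord-dml_8`, and `by-dev-rootcoord-dml_14`. In other words, time-tick messages, a key step in stream processing, could not be written to the WAL because the channel had been marked as fenced. There are also `create handler failed` and `report assignment error` entries for channel `by-dev-rootcoord-dml_10`, which report `STREAMING_CODE_CHANNEL_NOT_EXIST`: the channel does not exist on the node it was assigned to. Both problems directly block normal data sync and processing.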
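Fencing is a general technique: each time a channel is (re)assigned its term is bumped, and the log rejects appends that carry an older term, so a former owner cannot keep writing after it has lost ownership. The toy model below illustrates only that generic idea; it is not Milvus's WAL or streaming code, the class names are hypothetical, and reading `rw@3` in the log as access mode `rw` at term 3 is our assumption.

```python
class ChannelFenced(Exception):
    """Raised when a writer holding a stale term tries to append."""


class ToyWal:
    """Toy write-ahead log that accepts appends only from the current term.
    Generic fencing-token illustration; NOT Milvus's WAL implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.term = 0
        self.entries: list[tuple[int, str]] = []

    def assign(self) -> int:
        """Hand the channel to a (new) owner by bumping the term."""
        self.term += 1
        return self.term

    def append(self, writer_term: int, msg: str) -> None:
        """Append msg only if the writer still holds the current term."""
        if writer_term != self.term:
            raise ChannelFenced(f"{self.name}:rw@{writer_term} fenced")
        self.entries.append((writer_term, msg))


if __name__ == "__main__":
    wal = ToyWal("by-dev-rootcoord-dml_8")
    for _ in range(3):
        owner_term = wal.assign()           # the old owner ends up at term 3
    wal.append(owner_term, "time tick")     # accepted
    wal.assign()                            # channel reassigned: term 4
    try:
        wal.append(owner_term, "time tick") # stale term 3 is rejected
    except ChannelFenced as err:
        print(err)
```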

Data volume: 60 million records (6000w)
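To put 60 million records in perspective against the memory limits, the raw size of the vectors alone can be estimated as count × dimension × bytes per value. The post does not give the vector dimension or type, so the 768-dimensional float32 vectors below are a purely hypothetical assumption; scalar fields, index structures, and any replicas come on top of this figure.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of dense vectors before any index or metadata overhead.
    bytes_per_value=4 corresponds to float32."""
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    n = 60_000_000  # 6000w = 60 million, from the post
    dim = 768       # HYPOTHETICAL: the post does not state the dimension
    size = raw_vector_bytes(n, dim)
    print(f"{size} bytes, about {size / GIB:.1f} GiB")
```

Under that assumption the raw vectors take 184,320,000,000 bytes, about 171.7 GiB.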

Milvus docker compose:

```yaml
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=8589934592
      - ETCD_SNAPSHOT_COUNT=50000
      - ETCD_MAX_REQUEST_BYTES=33554432
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3
    ulimits:
      nofile:
        soft: 655360
        hard: 655360
    mem_limit: 16g
    cpus: 4.0
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "3"

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2024-05-28T17-19-04Z
    environment:
      MINIO_ACCESS_KEY: xxxxx
      MINIO_SECRET_KEY: xxxxx
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    ulimits:
      nofile:
        soft: 655360
        hard: 655360
    mem_limit: 16g
    cpus: 4.0
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "3"

  standalone:
    container_name: milvus
    image: milvusdb/milvus:v2.6.0-rc1
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
      MQ_TYPE: woodpecker
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/_milvus:/var/lib/milvus
      - ./milvus.yaml:/milvus/configs/milvus.yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "xxxx:19530"
      - "xxxx:9091"
    depends_on:
      - "etcd"
      - "minio"
    ulimits:
      nofile:
        soft: 655360
        hard: 655360
    mem_limit: 1024g
    cpus: 32.0
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "3"

networks:
  default:
    name: milvus
```

Machine: 4 TB of RAM, 256 CPUs
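The compose limits can be checked against this host. A quick tally, treating Docker's `g` suffix as GiB and reading "4T" as 4 TiB (both assumptions), shows that the three containers are capped at 1,056 GiB of memory and 40 CPUs in total, well under the host's capacity; the Milvus container alone gets 32 of the 256 cores.

```python
# Per-container limits copied from the compose file; host figures from the post.
LIMITS = {
    "etcd": {"mem_gib": 16, "cpus": 4.0},
    "minio": {"mem_gib": 16, "cpus": 4.0},
    "standalone": {"mem_gib": 1024, "cpus": 32.0},
}
HOST_MEM_GIB = 4 * 1024  # assumption: "4T" means 4 TiB
HOST_CPUS = 256


def totals(limits: dict) -> tuple[int, float]:
    """Sum the memory (GiB) and CPU limits across all containers."""
    mem = sum(c["mem_gib"] for c in limits.values())
    cpus = sum(c["cpus"] for c in limits.values())
    return mem, cpus


if __name__ == "__main__":
    mem, cpus = totals(LIMITS)
    print(f"memory: {mem} GiB of {HOST_MEM_GIB} GiB ({mem / HOST_MEM_GIB:.1%})")
    print(f"cpus:   {cpus} of {HOST_CPUS} ({cpus / HOST_CPUS:.1%})")
```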

milvus.yaml: https://raw.githubusercontent.com/milvus-io/milvus/v2.6.0-rc1/configs/milvus.yaml

Log excerpt:

```text
milvus        | [2025/07/18 15:41:28.119 +00:00] [WARN] [timetick/timetick_sync_operator.go:85] ["send time tick sync message failed"] [module=streamingnode] [component=timetick-sync] [pchannel=by-dev-rootcoord-dml_8:rw@3] [error="append time tick msg to wal failed, timestamp: 459499972358307846, previous message counter: 8: code: STREAMING_CODE_CHANNEL_FENCED, cause: by-dev-rootcoord-dml_8:rw@3 fenced"]
milvus        | [2025/07/18 15:41:28.119 +00:00] [WARN] [timetick/timetick_sync_operator.go:85] ["send time tick sync message failed"] [module=streamingnode] [component=timetick-sync] [pchannel=by-dev-rootcoord-dml_14:rw@3] [error="append time tick msg to wal failed, timestamp: 459499972358307848, previous message counter: 8: code: STREAMING_CODE_CHANNEL_FENCED, cause: by-dev-rootcoord-dml_14:rw@3 fenced"]
milvus        | [2025/07/18 15:41:28.119 +00:00] [WARN] [timetick/timetick_sync_operator.go:85] ["send time tick sync message failed"] [module=streamingnode] [component=timetick-sync] [pchannel=by-dev-rootcoord-dml_7:rw@3] [error="append time tick msg to wal failed, timestamp: 459499972358307847, previous message counter: 8: code: STREAMING_CODE_CHANNEL_FENCED, cause: by-dev-rootcoord-dml_7:rw@3 fenced"]
milvus        | [2025/07/18 15:41:28.120 +00:00] [WARN] [sessionutil/session_util.go:593] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=7587888197442225626] [error="etcdserver: requested lease not found"]
milvus        | [2025/07/18 15:41:28.121 +00:00] [ERROR] [querynodev2/server.go:188] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=2] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/workspace/source/internal/querynodev2/server.go:188"]
milvus        | [2025/07/18 15:41:28.121 +00:00] [WARN] [sessionutil/session_util.go:593] ["fail to retry keepAliveOnce"] [serverName=mixcoord] [LeaseID=7587888197442225598] [error="etcdserver: requested lease not found"]
milvus        | [2025/07/18 15:41:28.122 +00:00] [ERROR] [coordinator/mix_coord.go:107] ["MixCoord disconnected from etcd, process will exit"] [serverID=2] [stack="github.com/milvus-io/milvus/internal/coordinator.(*mixCoordImpl).Register.(*mixCoordImpl).Register.func1.func3\n\t/workspace/source/internal/coordinator/mix_coord.go:107"]
milvus        | [2025/07/18 15:41:28.122 +00:00] [WARN] [sessionutil/session_util.go:593] ["fail to retry keepAliveOnce"] [serverName=proxy] [LeaseID=7587888197442225923] [error="etcdserver: requested lease not found"]
milvus        | [2025/07/18 15:41:28.122 +00:00] [ERROR] [proxy/proxy.go:181] ["Proxy disconnected from etcd, process will exit"] ["Server Id"=2] [stack="github.com/milvus-io/milvus/internal/proxy.(*Proxy).Register.func1\n\t/workspace/source/internal/proxy/proxy.go:181"]
milvus        | [2025/07/18 15:41:28.122 +00:00] [WARN] [handler/handler_client_impl.go:178] ["create handler failed"] [pchannel=by-dev-rootcoord-dml_10] [handler=producer] [assignment=by-dev-rootcoord-dml_10:rw@3>2@172.23.0.4:22222] [error="/milvus.proto.streaming.StreamingNodeHandlerService/Produce; streaming error: code = STREAMING_CODE_CHANNEL_NOT_EXIST, cause = by-dev-rootcoord-dml_10 not exist; rpc error: code = FailedPrecondition, desc = "]
milvus        | [2025/07/18 15:41:28.123 +00:00] [INFO] [handler/handler_client_impl.go:183] ["report assignment error"] [pchannel=by-dev-rootcoord-dml_10] [handler=producer] [assignmentError="/milvus.proto.streaming.StreamingNodeHandlerService/Produce; streaming error: code = STREAMING_CODE_CHANNEL_NOT_EXIST, cause = by-dev-rootcoord-dml_10 not exist; rpc error: code = FailedPrecondition, desc = "] []
milvus        | [2025/07/18 15:41:28.120 +00:00] [ERROR] [streamingnode/service.go:389] ["StreamingNode disconnected from etcd, process will exit"] ["Server Id"=2] [stack="github.com/milvus-io/milvus/internal/distributed/streamingnode.(*Server).registerSessionToETCD.func1\n\t/workspace/source/internal/distributed/streamingnode/service.go:389"]
milvus        | [2025/07/18 15:41:28.120 +00:00] [ERROR] [datanode/data_node.go:200] ["Data Node disconnected from etcd, process will exit"] ["Server Id"=2] [stack="github.com/milvus-io/milvus/internal/datanode.(*DataNode).Register.func1\n\t/workspace/source/internal/datanode/data_node.go:200"]
```
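To see at a glance which components failed, with which error, and in what order, the excerpt can be tallied by log level, message, and error code and then sorted by timestamp. This is a minimal sketch, assuming every entry follows the `[timestamp] [LEVEL] [file:line] ["message"]` layout seen above; the regular expressions and function names are ours, not part of any Milvus tooling. The two sample lines are copied verbatim from the excerpt.

```python
import re
from collections import Counter

# "[timestamp] [LEVEL] [file:line] ["message"]" prefix used by every entry above.
ENTRY = re.compile(
    r'\[(?P<ts>\d{4}/\d{2}/\d{2} [\d:.]+ [+-]\d{2}:\d{2})\] '
    r'\[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] \["(?P<msg>[^"]+)"\]'
)
# Streaming error codes and the etcd lease error, used as grouping keys.
CODE = re.compile(r'STREAMING_CODE_[A-Z_]+|requested lease not found')


def parse(line: str):
    """Return (timestamp, level, message, code) for one entry, or None."""
    m = ENTRY.search(line)
    if m is None:
        return None
    code = CODE.search(line)
    return m["ts"], m["level"], m["msg"], (code.group(0) if code else "")


def summarize(lines):
    """Count entries per (level, message, code) and list them in time order."""
    parsed = [p for p in map(parse, lines) if p is not None]
    counts = Counter((lvl, msg, code) for _, lvl, msg, code in parsed)
    # All timestamps share one fixed format and offset, so string order is time order.
    timeline = sorted(parsed)
    return counts, timeline


SAMPLE = [
    'milvus        | [2025/07/18 15:41:28.120 +00:00] [WARN] [sessionutil/session_util.go:593] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=7587888197442225626] [error="etcdserver: requested lease not found"]',
    'milvus        | [2025/07/18 15:41:28.119 +00:00] [WARN] [timetick/timetick_sync_operator.go:85] ["send time tick sync message failed"] [module=streamingnode] [component=timetick-sync] [pchannel=by-dev-rootcoord-dml_8:rw@3] [error="append time tick msg to wal failed, timestamp: 459499972358307846, previous message counter: 8: code: STREAMING_CODE_CHANNEL_FENCED, cause: by-dev-rootcoord-dml_8:rw@3 fenced"]',
]

if __name__ == "__main__":
    counts, timeline = summarize(SAMPLE)
    for key, n in counts.items():
        print(n, key)
    for ts, level, msg, _ in timeline:
        print(ts, level, msg)
```

Run over the full log rather than the excerpt, the same tally would show whether the lease errors always precede the fencing errors or the other way round.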
