helm chart "stable/mongodb-replicaset" is stuck when 2nd pod bootstraps / is added to replicaset

Version of Helm and Kubernetes:

Kubernetes (GKE): Client Version: v1.9.7, Server Version: 1.10.6-gke.2

Helm: v2.10.0, Server: v2.10.0

Which chart:

stable/mongodb-replicaset

What happened:

Summary: the first pod starts up correctly; the second pod gets stuck in "Init:2/3".

Details: I want to set up a MongoDB replica set with 3 replicas, using X509 authentication and TLS. Here is the content of the values.yaml file used for the helm install:

replicas: 3
port: 27018

replicaSetName: rs0

podDisruptionBudget: {}
  # maxUnavailable: 1
  # minAvailable: 2

auth:
  enabled: true
  adminUser: admin
  adminPassword: pass1234
  metricsUser: metrics
  metricsPassword: pass1234
  key: abcdefghijklmnopqrstuvwxyz1234567890
  #existingKeySecret:
  #existingAdminSecret:
  #exisitingMetricsSecret:

# Specs for the Docker image for the init container that establishes the replica set
installImage:
  repository: k8s.gcr.io/mongodb-install
  tag: 0.6

  pullPolicy: IfNotPresent

# Specs for the MongoDB image
image:
  repository: mongo
  #tag: 3.6
  tag: latest
  pullPolicy: IfNotPresent

# Additional environment variables to be set in the container
extraVars: {}
# - name: TCMALLOC_AGGRESSIVE_DECOMMIT
#   value: "true"

# Prometheus Metrics Exporter
metrics:
  enabled: false
  image:
    repository: ssalaues/mongodb-exporter
    tag: 0.6.1
    pullPolicy: IfNotPresent
  port: 9216
  path: "/metrics"
  socketTimeout: 3s
  syncTimeout: 1m
  prometheusServiceDiscovery: true
  resources: {}

# Annotations to be added to MongoDB pods
podAnnotations: {}

securityContext:
  runAsUser: 999
  fsGroup: 999
  runAsNonRoot: true

resources:
  limits:
    # cpu: 100m
    memory: 512Mi
  requests:
    # cpu: 100m
    memory: 256Mi

## Node selector
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
nodeSelector: {}

affinity: {}

tolerations: []

extraLabels: {}

persistentVolume:
  enabled: true
  ## mongodb-replicaset data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  storageClass: "standard"
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  annotations: {}

# Annotations to be added to the service
serviceAnnotations: {}

tls:
  # Enable or disable MongoDB TLS support
  enabled: true
  # Please generate your own TLS CA by generating it via:
  # $ openssl genrsa -out ca.key 2048
  # $ openssl req -x509 -new -nodes -key ca.key -days 10000 -out ca.crt -subj "/CN=mydomain.com"
  # After that you can base64 encode it and paste it here:
  # $ cat ca.key | base64 -w0
  cacert: base64 encoded ca certificate goes here
  cakey: base64 encoded ca key goes here

# Entries for the MongoDB config file
configmap:
  storage:
    dbPath: /data/db
  net:
    port: 27018
    ssl:
      mode: requireSSL
      CAFile: /data/configdb/tls.crt
      PEMKeyFile: /work-dir/mongo.pem
  replication:
    replSetName: rs0
  security:
    authorization: enabled
    clusterAuthMode: x509
    keyFile: /data/configdb/key.txt

# Readiness probe
readinessProbe:
  initialDelaySeconds: 5
  timeoutSeconds: 1
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1

# Liveness probe
livenessProbe:
  initialDelaySeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1

I installed the chart with:

helm install --name mongo-test -f values.yaml stable/mongodb-replicaset

The initial helm install goes through without any problem - there are no error messages during installation:

NAME:   mongo-test
LAST DEPLOYED: Wed Aug 29 16:40:43 2018
NAMESPACE: b2c
STATUS: DEPLOYED

RESOURCES:
==> v1/Secret
NAME                                   TYPE               DATA  AGE
mongo-test-mongodb-replicaset-admin    Opaque             2     0s
mongo-test-mongodb-replicaset-ca       kubernetes.io/tls  2     0s
mongo-test-mongodb-replicaset-keyfile  Opaque             1     0s

==> v1/ConfigMap
NAME                                   DATA  AGE
mongo-test-mongodb-replicaset-init     1     0s
mongo-test-mongodb-replicaset-mongodb  1     0s
mongo-test-mongodb-replicaset-tests    1     0s

==> v1/Service
NAME                           TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)    AGE
mongo-test-mongodb-replicaset  ClusterIP  None        <none>       27018/TCP  0s

==> v1beta2/StatefulSet
NAME                           DESIRED  CURRENT  AGE
mongo-test-mongodb-replicaset  3        1        0s

==> v1/Pod(related)
NAME                             READY  STATUS   RESTARTS  AGE
mongo-test-mongodb-replicaset-0  0/1    Pending  0         0s

The 1st pod then starts up correctly without any problem. The second pod gets stuck in "Init:2/3":

NAME                                          READY     STATUS     RESTARTS   AGE
po/mongo-test-mongodb-replicaset-0            1/1       Running    0          5m
po/mongo-test-mongodb-replicaset-1            0/1       Init:2/3   0          5m

When I connect to mongo-test-mongodb-replicaset-1 -c bootstrap, I can find the following in /work-dir/log.txt:

mongodb@mongo-test-mongodb-replicaset-1:/work-dir$ more log.txt
[2018-08-29T14:41:51,684293796+00:00] [on-start.sh] Bootstrapping MongoDB replica set member: mongo-test-mongodb-replicaset-1
[2018-08-29T14:41:51,687394595+00:00] [on-start.sh] Reading standard input...
[2018-08-29T14:41:51,688594499+00:00] [on-start.sh] Generating certificate
[2018-08-29T14:41:51,951181683+00:00] [on-start.sh] Peers: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local
[2018-08-29T14:41:51,952080311+00:00] [on-start.sh] Starting a MongoDB instance...
[2018-08-29T14:41:51,953075555+00:00] [on-start.sh] Waiting for MongoDB to be ready...
2018-08-29T14:41:52.020+0000 I CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] MongoDB starting : pid=30 port=27017 dbpath=/data/db 64-bit host=mongo-test-mongodb-replicaset-1
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] db version v4.0.1
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] git version: 54f1582fc6eb01de4d4c42f26fc133e623f065fb
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] modules: none
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] build environment:
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] options: { config: "/data/configdb/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017, ssl: { CAFile: "/data/configdb/tls.crt", PEMKeyFile: "/work-dir/mongo.pem", mode: "requireSSL" } }, replication: { replSet: "rs0" }, security: { authorization: "enabled", clusterAuthMode: "x509", keyFile: "/data/configdb/key.txt" }, storage: { dbPath: "/data/db" } }
2018-08-29T14:41:52.047+0000 I STORAGE  [initandlisten]
2018-08-29T14:41:52.047+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-29T14:41:52.047+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-29T14:41:52.048+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1337M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
[2018-08-29T14:41:52,083645436+00:00] [on-start.sh] Retrying...
2018-08-29T14:41:52.789+0000 I STORAGE  [initandlisten] WiredTiger message [1535553712:789699][30:0x7fd33c091a00], txn-recover: Set global recovery timestamp: 0
2018-08-29T14:41:52.800+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2018-08-29T14:41:52.819+0000 I STORAGE  [initandlisten] createCollection: local.startup_log with generated UUID: 2a15f25a-5f7b-47d3-b1a3-2338677428d0
2018-08-29T14:41:52.832+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2018-08-29T14:41:52.833+0000 I STORAGE  [initandlisten] createCollection: local.replset.oplogTruncateAfterPoint with generated UUID: ae87a21f-d2dc-4474-b4de-d70d95b7a2a8
2018-08-29T14:41:52.847+0000 I STORAGE  [initandlisten] createCollection: local.replset.minvalid with generated UUID: ffaf5c08-356c-4ed0-b4e0-1b3d9cdeea92
2018-08-29T14:41:52.866+0000 I REPL     [initandlisten] Did not find local voted for document at startup.
2018-08-29T14:41:52.866+0000 I REPL     [initandlisten] Did not find local Rollback ID document at startup. Creating one.
2018-08-29T14:41:52.866+0000 I STORAGE  [initandlisten] createCollection: local.system.rollback.id with generated UUID: 6e3e4fc7-b821-4df6-9c32-7db2af4a3bc4
2018-08-29T14:41:52.880+0000 I REPL     [initandlisten] Initialized the rollback ID to 1
2018-08-29T14:41:52.880+0000 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2018-08-29T14:41:52.881+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2018-08-29T14:41:52.881+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Replication has not yet been configured
2018-08-29T14:41:52.881+0000 I NETWORK  [initandlisten] waiting for connections on port 27017 ssl
2018-08-29T14:41:54.148+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50022 #1 (1 connection now open)
2018-08-29T14:41:54.154+0000 I ACCESS   [conn1] note: no users configured in admin.system.users, allowing localhost access
2018-08-29T14:41:54.154+0000 I NETWORK  [conn1] received client metadata from 127.0.0.1:50022 conn1: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.157+0000 I ACCESS   [conn1] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("95760764-878a-4dd0-8ce6-470182c48a3a") } ], $db: "admin" }
2018-08-29T14:41:54.158+0000 I NETWORK  [conn1] end connection 127.0.0.1:50022 (0 connections now open)
[2018-08-29T14:41:54,162283712+00:00] [on-start.sh] Initialized.
[2018-08-29T14:41:54,267062979+00:00] [on-start.sh] Found master: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local
[2018-08-29T14:41:54,268004950+00:00] [on-start.sh] Adding myself (mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local) to replica set...
2018-08-29T14:41:54.368+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:50550 #2 (1 connection now open)
2018-08-29T14:41:54.371+0000 I ACCESS   [conn2]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:54.371+0000 I ACCESS   [conn2] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:50550 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:54.372+0000 I NETWORK  [conn2] end connection 10.43.67.35:50550 (0 connections now open)
2018-08-29T14:41:54.375+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:50552 #3 (1 connection now open)
2018-08-29T14:41:54.378+0000 I NETWORK  [conn3] received client metadata from 10.43.67.35:50552 conn3: { driver: { name: "NetworkInterfaceTL", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.379+0000 I ACCESS   [conn3]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:54.379+0000 I ACCESS   [conn3] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:50552 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
[2018-08-29T14:41:57,388632734+00:00] [on-start.sh] Waiting for replica to reach SECONDARY state...
2018-08-29T14:41:57.441+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50032 #4 (2 connections now open)
2018-08-29T14:41:57.446+0000 I NETWORK  [conn4] received client metadata from 127.0.0.1:50032 conn4: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:57.448+0000 I ACCESS   [conn4] Supported SASL mechanisms requested for unknown user 'admin@admin'
2018-08-29T14:41:57.448+0000 I ACCESS   [conn4] SASL SCRAM-SHA-1 authentication failed for admin on admin from client 127.0.0.1:50032 ; UserNotFound: Could not find user admin@admin
2018-08-29T14:41:57.450+0000 I ACCESS   [conn4] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("38098891-2b34-46eb-aef2-53a69416671f") } ], $db: "admin" }
2018-08-29T14:41:57.451+0000 I NETWORK  [conn4] end connection 127.0.0.1:50032 (1 connection now open)
2018-08-29T14:41:58.504+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50034 #5 (2 connections now open)
2018-08-29T14:41:58.508+0000 I NETWORK  [conn5] received client metadata from 127.0.0.1:50034 conn5: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:58.509+0000 I ACCESS   [conn5] Supported SASL mechanisms requested for unknown user 'admin@admin'
2018-08-29T14:41:58.510+0000 I ACCESS   [conn5] SASL SCRAM-SHA-1 authentication failed for admin on admin from client 127.0.0.1:50034 ; UserNotFound: Could not find user admin@admin
2018-08-29T14:41:58.511+0000 I ACCESS   [conn5] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("e1089881-4c47-450c-9721-6b291c6f0e50") } ], $db: "admin" }
2018-08-29T14:41:58.512+0000 I NETWORK  [conn5] end connection 127.0.0.1:50034 (1 connection now open)
2018-08-29T14:41:59.574+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50036 #6 (2 connections now open)
2018-08-29T14:41:59.582+0000 I NETWORK  [conn6] received client metadata from 127.0.0.1:50036 conn6: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16

...to be continued...

...second part of the OP:

Meanwhile the other pod, mongo-test-mongodb-replicaset-0, which started successfully, has these entries in its log file:

2018-08-29T14:41:22.973+0000 I CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2018-08-29T14:41:22.980+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo-test-mongodb-replicaset-0
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] db version v4.0.1
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] git version: 54f1582fc6eb01de4d4c42f26fc133e623f065fb
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] modules: none
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] build environment:
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-08-29T14:41:22.981+0000 I CONTROL  [initandlisten] options: { config: "/data/configdb/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017, ssl: { CAFile: "/data/configdb/tls.crt", PEMKeyFile: "/work-dir/mongo.pem", mode: "requireSSL" } }, replication: { replSet: "rs0" }, security: { authorization: "enabled", clusterAuthMode: "x509", keyFile: "/data/configdb/key.txt" }, storage: { dbPath: "/data/db" } }
2018-08-29T14:41:22.982+0000 I STORAGE  [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-08-29T14:41:22.982+0000 I STORAGE  [initandlisten]
2018-08-29T14:41:22.982+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-29T14:41:22.982+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-29T14:41:22.982+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1337M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-08-29T14:41:23.949+0000 I STORAGE  [initandlisten] WiredTiger message [1535553683:949243][1:0x7fee8d75da00], txn-recover: Main recovery loop: starting at 1/49280
2018-08-29T14:41:23.949+0000 I STORAGE  [initandlisten] WiredTiger message [1535553683:949915][1:0x7fee8d75da00], txn-recover: Recovering log 1 through 2
2018-08-29T14:41:24.043+0000 I STORAGE  [initandlisten] WiredTiger message [1535553684:43680][1:0x7fee8d75da00], txn-recover: Recovering log 2 through 2
2018-08-29T14:41:24.119+0000 I STORAGE  [initandlisten] WiredTiger message [1535553684:119960][1:0x7fee8d75da00], txn-recover: Set global recovery timestamp: 5b86b08600000004
2018-08-29T14:41:24.153+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1535553670, 4)
2018-08-29T14:41:24.153+0000 I STORAGE  [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1535553670, 4) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1535553670, 4)
2018-08-29T14:41:24.165+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2018-08-29T14:41:24.166+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 10 records totaling to 2325 bytes
2018-08-29T14:41:24.166+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2018-08-29T14:41:24.209+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2018-08-29T14:41:24.211+0000 I REPL     [initandlisten] Rollback ID is 1
2018-08-29T14:41:24.211+0000 I REPL     [initandlisten] Recovering from stable timestamp: Timestamp(1535553670, 4) (top of oplog: { ts: Timestamp(1535553670, 4), t: 1 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2018-08-29T14:41:24.211+0000 I REPL     [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1535553670, 4)
2018-08-29T14:41:24.212+0000 I REPL     [initandlisten] No oplog entries to apply for recovery. Start point is at the top of the oplog.
2018-08-29T14:41:24.212+0000 I REPL     [replexec-0] New replica set config in use: { _id: "rs0", version: 1, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5b86b0825d695f321fd3c7e6') } }
2018-08-29T14:41:24.213+0000 I REPL     [replexec-0] This node is mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 in the config
2018-08-29T14:41:24.213+0000 I REPL     [replexec-0] transition to STARTUP2 from STARTUP
2018-08-29T14:41:24.213+0000 I NETWORK  [initandlisten] waiting for connections on port 27017 ssl
2018-08-29T14:41:24.213+0000 I REPL     [replexec-0] Starting replication storage threads
2018-08-29T14:41:24.214+0000 I REPL     [replexec-0] transition to RECOVERING from STARTUP2
2018-08-29T14:41:24.214+0000 I REPL     [replexec-0] Starting replication fetcher thread
2018-08-29T14:41:24.214+0000 I REPL     [replexec-0] Starting replication applier thread
2018-08-29T14:41:24.215+0000 I REPL     [replexec-0] Starting replication reporter thread
2018-08-29T14:41:24.214+0000 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for rs0/mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017
2018-08-29T14:41:24.215+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:37744 #2 (1 connection now open)
2018-08-29T14:41:24.216+0000 I REPL     [rsSync-0] Starting oplog application
2018-08-29T14:41:24.216+0000 I REPL     [rsSync-0] transition to SECONDARY from RECOVERING
2018-08-29T14:41:24.216+0000 I REPL     [rsSync-0] conducting a dry run election to see if we could be elected. current term: 1
2018-08-29T14:41:24.216+0000 I REPL     [replexec-0] dry election run succeeded, running for election in term 2
2018-08-29T14:41:24.218+0000 I REPL     [replexec-0] election succeeded, assuming primary role in term 2
2018-08-29T14:41:24.219+0000 I REPL     [replexec-0] transition to PRIMARY from SECONDARY
2018-08-29T14:41:24.219+0000 I REPL     [replexec-0] Entering primary catch-up mode.
2018-08-29T14:41:24.219+0000 I REPL     [replexec-0] Exited primary catch-up mode.
2018-08-29T14:41:24.219+0000 I REPL     [replexec-0] Stopping replication producer
2018-08-29T14:41:24.222+0000 I NETWORK  [conn2] received client metadata from 10.43.67.35:37744 conn2: { driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:24.222+0000 I NETWORK  [LogicalSessionCacheRefresh] Successfully connected to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 (1 connections now open to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 with a 5 second timeout)
2018-08-29T14:41:24.222+0000 W NETWORK  [LogicalSessionCacheRefresh] Unable to reach primary for set rs0
2018-08-29T14:41:24.723+0000 W NETWORK  [LogicalSessionCacheReap] Unable to reach primary for set rs0
2018-08-29T14:41:25.224+0000 W NETWORK  [LogicalSessionCacheReap] Unable to reach primary for set rs0
2018-08-29T14:41:25.724+0000 W NETWORK  [LogicalSessionCacheReap] Unable to reach primary for set rs0
2018-08-29T14:41:26.217+0000 I REPL     [rsSync-0] transition to primary complete; database writes are now permitted
2018-08-29T14:41:26.225+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:37748 #4 (2 connections now open)
2018-08-29T14:41:26.226+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:37750 #6 (3 connections now open)
2018-08-29T14:41:26.233+0000 I NETWORK  [conn6] received client metadata from 10.43.67.35:37750 conn6: { driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:26.233+0000 I NETWORK  [LogicalSessionCacheRefresh] Successfully connected to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 (1 connections now open to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 with a 0 second timeout)
2018-08-29T14:41:26.233+0000 I NETWORK  [conn4] received client metadata from 10.43.67.35:37748 conn4: { driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:26.234+0000 I NETWORK  [LogicalSessionCacheReap] Successfully connected to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 (2 connections now open to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 with a 0 second timeout)
2018-08-29T14:41:26.234+0000 I ACCESS   [conn6]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:26.235+0000 I ACCESS   [conn6] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:37750 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.235+0000 I NETWORK  [LogicalSessionCacheRefresh] scoped connection to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 not being returned to the pool
2018-08-29T14:41:26.235+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.235+0000 I ACCESS   [conn4]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:26.236+0000 I ACCESS   [conn4] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:37748 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.236+0000 I NETWORK  [LogicalSessionCacheReap] scoped connection to mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 not being returned to the pool
2018-08-29T14:41:26.236+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:26.236+0000 I NETWORK  [conn6] end connection 10.43.67.35:37750 (2 connections now open)
2018-08-29T14:41:26.236+0000 I NETWORK  [conn4] end connection 10.43.67.35:37748 (1 connection now open)
2018-08-29T14:41:30.295+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:52056 #7 (2 connections now open)
2018-08-29T14:41:30.300+0000 I NETWORK  [conn7] received client metadata from 127.0.0.1:52056 conn7: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:30.304+0000 I NETWORK  [conn7] end connection 127.0.0.1:52056 (1 connection now open)
2018-08-29T14:41:40.300+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:52072 #8 (2 connections now open)
2018-08-29T14:41:40.304+0000 I NETWORK  [conn8] received client metadata from 127.0.0.1:52072 conn8: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:40.308+0000 I NETWORK  [conn8] end connection 127.0.0.1:52072 (1 connection now open)
2018-08-29T14:41:50.293+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:52090 #9 (2 connections now open)
2018-08-29T14:41:50.297+0000 I NETWORK  [conn9] received client metadata from 127.0.0.1:52090 conn9: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:50.301+0000 I NETWORK  [conn9] end connection 127.0.0.1:52090 (1 connection now open)
2018-08-29T14:41:54.219+0000 I NETWORK  [listener] connection accepted from 10.43.66.21:49738 #10 (2 connections now open)
2018-08-29T14:41:54.224+0000 I NETWORK  [conn10] received client metadata from 10.43.66.21:49738 conn10: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.256+0000 I ACCESS   [conn10] Successfully authenticated as principal admin on admin
2018-08-29T14:41:54.261+0000 I NETWORK  [conn10] end connection 10.43.66.21:49738 (1 connection now open)
2018-08-29T14:41:54.325+0000 I NETWORK  [listener] connection accepted from 10.43.66.21:49740 #11 (2 connections now open)
2018-08-29T14:41:54.330+0000 I NETWORK  [conn11] received client metadata from 10.43.66.21:49740 conn11: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.362+0000 I ACCESS   [conn11] Successfully authenticated as principal admin on admin
2018-08-29T14:41:54.366+0000 I REPL     [conn11] replSetReconfig admin command received from client; new config: { _id: "rs0", version: 2, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1.0, host: "mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5b86b0825d695f321fd3c7e6') } }
2018-08-29T14:41:54.372+0000 I NETWORK  [conn11] can't authenticate to mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 as internal user, error: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:54.372+0000 I REPL     [conn11] replSetReconfig config object with 2 members parses ok
2018-08-29T14:41:54.372+0000 I ASIO     [Replication] Connecting to mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017
2018-08-29T14:41:54.380+0000 W REPL     [replexec-1] Got error (Unauthorized: command replSetHeartbeat requires authentication) response on heartbeat request to mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017; { ok: 1.0, hbmsg: "" }
2018-08-29T14:41:54.380+0000 E REPL     [conn11] replSetReconfig failed; NodeNotFound: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017; the following nodes did not respond affirmatively: mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local:27017 failed with command replSetHeartbeat requires authentication
2018-08-29T14:41:54.383+0000 I NETWORK  [conn11] end connection 10.43.66.21:49740 (1 connection now open)
2018-08-29T14:41:58.387+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:52106 #14 (2 connections now open)
2018-08-29T14:41:58.391+0000 I NETWORK  [conn14] received client metadata from 127.0.0.1:52106 conn14: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:58.395+0000 I NETWORK  [conn14] end connection 127.0.0.1:52106 (1 connection now open)
2018-08-29T14:42:00.407+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:52118 #15 (2 connections now open)
2018-08-29T14:42:00.416+0000 I NETWORK  [conn15] received client metadata from 127.0.0.1:52118 conn15: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }

As far as I can tell, everything goes well until the second mongo instance tries to become part of the replica set: when the primary instance tries to connect to the new cluster member, this connection fails, so the whole setup process gets stuck.

Just in case anyone else stumbles over the same problem: the root cause is the way the helm chart generates the x509 certificates. The certificate subject line contains too little information to satisfy MongoDB's requirements, so the certificates cannot be used for replica set member authentication, which leads to the observed errors and leaves the whole setup process stuck.

The good news is that there is a workaround, described on GitHub: https://github.com/helm/charts/issues/7417#issuecomment-422293057
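
In essence, the workaround amounts to issuing member certificates whose subject contains more than just the CN: MongoDB requires the O/OU/DC attributes of all member certificates to match, with only the CN differing. The following is only a rough sketch of that idea with openssl, using hypothetical O/OU values and the CA that is already configured in values.yaml; see the linked comment for the actual fix:

# O and OU below are hypothetical; they must be identical on every member, only the CN may differ
HOST="mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local"
openssl genrsa -out mongo-0.key 2048
openssl req -new -key mongo-0.key -out mongo-0.csr -subj "/O=example/OU=mongodb/CN=${HOST}"
openssl x509 -req -in mongo-0.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 3650 -out mongo-0.crt
# mongod expects certificate and key concatenated into a single PEM file (cf. PEMKeyFile above)
cat mongo-0.crt mongo-0.key > mongo-0.pem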

This still needs to be fixed in the helm chart itself.

Configure an initContainer by setting volumePermissions.enabled to true.
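
For reference, a minimal values.yaml fragment for that setting, assuming the chart version in use actually exposes a volumePermissions key:

# Assumption: this key enables an init container that fixes ownership of the data volume before mongod starts
volumePermissions:
  enabled: true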