RabbitMQ - Error while waiting for Mnesia tables
I have installed RabbitMQ on a Kubernetes cluster using a Helm chart. The RabbitMQ pod keeps restarting. When I check the pod logs I see the following error:
2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
When I try to run kubectl describe pod, I get this error:
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-rabbitmq-0
ReadOnly: false
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rabbitmq-config
Optional: false
healthchecks:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rabbitmq-healthchecks
Optional: false
rabbitmq-token-w74kb:
Type: Secret (a volume populated by a Secret)
SecretName: rabbitmq-token-w74kb
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/arch=amd64
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 3m27s (x878 over 7h21m) kubelet, gke-analytics-default-pool-918f5943-w0t0 Readiness probe failed: Timeout: 70 seconds ...
Checking health of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
Status of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"", :_, :_}, [], [:""]}]]}}
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"", :_, :_}, [], [:""]}]]}}
I have set this up on Google Cloud, on a Kubernetes cluster. I am not sure under what specific circumstances it started failing. I had to restart the pod, and it has kept failing ever since.
What is the problem here?
I just deleted the existing Persistent Volume Claim and reinstalled RabbitMQ, and it started working.
So every time after installing RabbitMQ on the Kubernetes cluster, if I scale the pods down to 0 and later scale them back up, I get the same error. I also tried deleting the Persistent Volume Claim without uninstalling the RabbitMQ Helm chart, but I still get the same error.
It seems that every time I scale the cluster down to 0, I have to uninstall the RabbitMQ Helm chart, delete the corresponding Persistent Volume Claims, and install the Helm chart again for it to work. Roughly, that workaround looks like the sketch below.
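A sketch of that reinstall workaround; the release name, namespace, chart and PVC label are assumptions and need to be adjusted to your own install:

# remove the release and its data, then install it again
helm uninstall rabbitmq -n default
kubectl delete pvc -n default -l app.kubernetes.io/name=rabbitmq
helm install rabbitmq bitnami/rabbitmq -n default

Note that this loses everything stored on disk; the force-boot approaches in the answers below avoid that.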
I also ran into a similar error, shown below.
2020-06-05 03:45:37.153 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-06-05 03:46:07.154 [warning] <0.234.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-05 03:46:07.154 [info] <0.234.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
In my case, the slave (follower) node of the RabbitMQ cluster was down. Once I started that node, the master started without errors. A quick way to check for this is sketched below.
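A couple of commands that may help confirm this situation; the pod name and label are assumptions:

kubectl get pods -l app=rabbitmq                         # check whether all replicas are Running and Ready
kubectl exec rabbitmq-0 -- rabbitmqctl cluster_status    # compare the configured cluster nodes with the running ones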
Try this deployment:
kind: Service
apiVersion: v1
metadata:
namespace: rabbitmq-namespace
name: rabbitmq
labels:
app: rabbitmq
type: LoadBalancer
spec:
type: NodePort
ports:
- name: http
protocol: TCP
port: 15672
targetPort: 15672
nodePort: 31672
- name: amqp
protocol: TCP
port: 5672
targetPort: 5672
nodePort: 30672
- name: stomp
protocol: TCP
port: 61613
targetPort: 61613
selector:
app: rabbitmq
---
kind: Service
apiVersion: v1
metadata:
namespace: rabbitmq-namespace
name: rabbitmq-lb
labels:
app: rabbitmq
spec:
# Headless service to give the StatefulSet a DNS which is known in the cluster (hostname-#.app.namespace.svc.cluster.local, )
# in our case - rabbitmq-#.rabbitmq.rabbitmq-namespace.svc.cluster.local
clusterIP: None
ports:
- name: http
protocol: TCP
port: 15672
targetPort: 15672
- name: amqp
protocol: TCP
port: 5672
targetPort: 5672
- name: stomp
port: 61613
selector:
app: rabbitmq
---
apiVersion: v1
kind: ConfigMap
metadata:
name: rabbitmq-config
namespace: rabbitmq-namespace
data:
enabled_plugins: |
[rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_stomp].
rabbitmq.conf: |
## Cluster formation. See http://www.rabbitmq.com/cluster-formation.html to learn more.
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
## Should RabbitMQ node name be computed from the pod's hostname or IP address?
## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
## Set to "hostname" to use pod hostnames.
## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
## environment variable.
cluster_formation.k8s.address_type = hostname
## Important - this is the suffix of the hostname, as each node gets "rabbitmq-#", we need to tell what's the suffix
## it will give each new node that enters the way to contact the other peer node and join the cluster (if using hostname)
cluster_formation.k8s.hostname_suffix = .rabbitmq.rabbitmq-namespace.svc.cluster.local
## How often should node cleanup checks run?
cluster_formation.node_cleanup.interval = 30
## Set to false if automatic removal of unknown/absent nodes
## is desired. This can be dangerous, see
## * http://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
## * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = autoheal
## See http://www.rabbitmq.com/ha.html#master-migration-data-locality
queue_master_locator=min-masters
## See http://www.rabbitmq.com/access-control.html#loopback-users
loopback_users.guest = false
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: rabbitmq
namespace: rabbitmq-namespace
spec:
serviceName: rabbitmq
replicas: 3
selector:
matchLabels:
name: rabbitmq
template:
metadata:
labels:
app: rabbitmq
name: rabbitmq
state: rabbitmq
annotations:
pod.alpha.kubernetes.io/initialized: "true"
spec:
serviceAccountName: rabbitmq
terminationGracePeriodSeconds: 10
containers:
- name: rabbitmq-k8s
image: rabbitmq:3.8.3
volumeMounts:
- name: config-volume
mountPath: /etc/rabbitmq
- name: data
mountPath: /var/lib/rabbitmq/mnesia
ports:
- name: http
protocol: TCP
containerPort: 15672
- name: amqp
protocol: TCP
containerPort: 5672
livenessProbe:
exec:
command: ["rabbitmqctl", "status"]
initialDelaySeconds: 60
periodSeconds: 60
timeoutSeconds: 10
resources:
requests:
memory: "0"
cpu: "0"
limits:
memory: "2048Mi"
cpu: "1000m"
readinessProbe:
exec:
command: ["rabbitmqctl", "status"]
initialDelaySeconds: 20
periodSeconds: 60
timeoutSeconds: 10
imagePullPolicy: Always
env:
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: HOSTNAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: RABBITMQ_USE_LONGNAME
value: "true"
# See a note on cluster_formation.k8s.address_type in the config file section
- name: RABBITMQ_NODENAME
value: "rabbit@$(HOSTNAME).rabbitmq.$(NAMESPACE).svc.cluster.local"
- name: K8S_SERVICE_NAME
value: "rabbitmq"
- name: RABBITMQ_ERLANG_COOKIE
value: "mycookie"
volumes:
- name: config-volume
configMap:
name: rabbitmq-config
items:
- key: rabbitmq.conf
path: rabbitmq.conf
- key: enabled_plugins
path: enabled_plugins
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: "default"
resources:
requests:
storage: 3Gi
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: rabbitmq
namespace: rabbitmq-namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: endpoint-reader
namespace: rabbitmq-namespace
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: endpoint-reader
namespace: rabbitmq-namespace
subjects:
- kind: ServiceAccount
name: rabbitmq
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: endpoint-reader
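To try the manifest above, something like this should work (the file name is an assumption):

kubectl create namespace rabbitmq-namespace
kubectl apply -f rabbitmq.yaml
kubectl -n rabbitmq-namespace get pods -w    # wait until rabbitmq-0, -1 and -2 are Ready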
TLDR
helm upgrade rabbitmq --set clustering.forceBoot=true
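With Helm 3 the chart also has to be passed explicitly; assuming the Bitnami chart and a release called rabbitmq, the full command would look roughly like:

helm upgrade rabbitmq bitnami/rabbitmq --reuse-values --set clustering.forceBoot=true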
Problem
The problem happens for the following reasons:
- All RMQ pods are terminated at the same time for some reason (maybe because you explicitly set the StatefulSet replicas to 0, or for some other reason).
- One of them is the last one to stop (maybe just a little after the others). It stores this condition ("I'm standalone now") in its filesystem, which in k8s is the PersistentVolume(Claim). Let's say this pod is rabbitmq-1.
- When you spin the StatefulSet back up, the pod rabbitmq-0 is always the first one to start (that is how StatefulSets roll out pods by default).
- During startup, pod rabbitmq-0 first checks whether it is supposed to run standalone. But as far as it can see on its own filesystem, it is part of a cluster. So it looks for its peers and doesn't find any. This results in a startup failure by default.
- rabbitmq-0 therefore never becomes ready.
- rabbitmq-1 never starts, because that is how StatefulSets are deployed - one after another. If it did start, it would come up successfully, because it would see that it can run standalone as well.
So in the end, it's a bit of a mismatch between how RabbitMQ and StatefulSets work. RMQ says: "if everything goes down, just start everything up at the same time - one node will be able to start, and once it is up, the others can rejoin the cluster." k8s StatefulSets say: "starting everything at once is not possible; we will start with pod 0".
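For context, this ordered, one-by-one startup is only the StatefulSet default; it is controlled by the podManagementPolicy field. A sketch of what that would look like - not something this answer relies on, which sticks with force_boot instead:

spec:
  podManagementPolicy: "Parallel"   # launch and terminate pods in parallel instead of one at a time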
Solution
To solve this, there is a force_boot command for rabbitmqctl which basically tells an instance to start standalone if it doesn't find any peers. How you can use this from Kubernetes depends on the Helm chart and container you're using. In the Bitnami chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot = true, which translates to the environment variable RABBITMQ_FORCE_BOOT = yes in the container, which in turn issues the above command for you. A sketch of how to set the value is shown below.
But looking at the problem, you can also see why deleting the PVCs works: the pods all "forget" that they were part of an RMQ cluster last time, and happily start. I prefer the solution above, though, because no data is lost.
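Assuming the Bitnami chart, the value can also be set persistently through a values file instead of --set:

# values.yaml (excerpt)
clustering:
  forceBoot: true

helm upgrade rabbitmq bitnami/rabbitmq -f values.yaml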
In my case the solution was simple:
Step 1: Scale down the statefulset. This does not delete the PVC.
kubectl scale statefulsets rabbitmq-1-rabbitmq --namespace teps-rabbitmq --replicas=1
Step 2: Get into the RabbitMQ pod.
kubectl exec -it rabbitmq-1-rabbitmq-0 -n teps-rabbitmq -- bash
Step 3: Reset the cluster.
rabbitmqctl stop_app
rabbitmqctl force_boot
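Depending on the image, the rabbit application may also need to be started again on that node afterwards:

rabbitmqctl start_app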
Step 4: Scale the statefulset back up.
kubectl scale statefulsets rabbitmq-1-rabbitmq --namespace teps-rabbitmq --replicas=4
If you are in the same scenario as me and you don't know who deployed the Helm chart or how it was deployed... you can edit the statefulset directly to avoid messing with more things.
I was able to make it work without deleting the Helm chart:
kubectl -n rabbitmq edit statefulsets.apps rabbitmq
Under the spec section I added the environment variable RABBITMQ_FORCE_BOOT = yes, as follows:
spec:
containers:
- env:
- name: RABBITMQ_FORCE_BOOT # New Line 1 Added
value: "yes" # New Line 2 Added
This should also fix the problem... but please try first to do it the proper way, as Ulli described above.