RabbitMQ fails to start after restarting the Kubernetes cluster
I am running RabbitMQ on Kubernetes. This is my StatefulSet YAML file:
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-management
  labels:
    app: rabbitmq
spec:
  ports:
  - port: 15672
    name: http
  selector:
    app: rabbitmq
  type: NodePort
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  ports:
  - port: 5672
    name: amqp
  - port: 4369
    name: epmd
  - port: 25672
    name: rabbitmq-dist
  - port: 61613
    name: stomp
  clusterIP: None
  selector:
    app: rabbitmq
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: "rabbitmq"
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
      - name: rabbitmq
        image: rabbitmq:management-alpine
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - >
                rabbitmq-plugins enable rabbitmq_stomp;
                if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
                  sed "s/^search \([^ ]\+\)/search rabbitmq. /" /etc/resolv.conf > /etc/resolv.conf.new;
                  cat /etc/resolv.conf.new > /etc/resolv.conf;
                  rm /etc/resolv.conf.new;
                fi;
                until rabbitmqctl node_health_check; do sleep 1; done;
                if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
                  rabbitmqctl stop_app;
                  rabbitmqctl join_cluster rabbit@rabbitmq-0;
                  rabbitmqctl start_app;
                fi;
                rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
        env:
        - name: RABBITMQ_ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              name: rabbitmq-config
              key: erlang-cookie
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 61613
          name: stomp
        volumeMounts:
        - name: rabbitmq
          mountPath: /var/lib/rabbitmq
  volumeClaimTemplates:
  - metadata:
      name: rabbitmq
      annotations:
        volume.alpha.kubernetes.io/storage-class: do-block-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
I created the cookie with this command:
kubectl create secret generic rabbitmq-config --from-literal=erlang-cookie=c-is-for-cookie-thats-good-enough-for-me
All of my Kubernetes cluster nodes are Ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master Ready master 14d v1.17.3
kubernetes-slave-1 Ready <none> 14d v1.17.3
kubernetes-slave-2 Ready <none> 14d v1.17.3
But after restarting the cluster, RabbitMQ does not start. I tried scaling the StatefulSet down and back up, but the problem persists. Output of kubectl describe pod rabbitmq-0:
kubectl describe pod rabbitmq-0
Name: rabbitmq-0
Namespace: default
Priority: 0
Node: kubernetes-slave-1/192.168.0.179
Start Time: Tue, 24 Mar 2020 22:31:04 +0000
Labels: app=rabbitmq
controller-revision-hash=rabbitmq-6748869f4b
statefulset.kubernetes.io/pod-name=rabbitmq-0
Annotations: <none>
Status: Running
IP: 10.244.1.163
IPs:
IP: 10.244.1.163
Controlled By: StatefulSet/rabbitmq
Containers:
rabbitmq:
Container ID: docker://d5108f818525030b4fdb548eb40f0dc000dd2cec473ebf8cead315116e3efbd3
Image: rabbitmq:management-alpine
Image ID: docker-pullable://rabbitmq@sha256:6f7c8d01d55147713379f5ca26e3f20eca63eb3618c263b12440b31c697ee5a5
Ports: 5672/TCP, 61613/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: PostStartHookError: command '/bin/sh -c rabbitmq-plugins enable rabbitmq_stomp; if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq. /" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit@rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
' exited with 137: Error: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-0
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-575-rabbit@rabbitmq-0'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.
Arguments given:
node_health_check
Usage
rabbitmqctl [--node <node>] [--longnames] [--quiet] node_health_check [--timeout <timeout>]
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"", :_, :_}, [], [:""]}]]}}
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-10397-rabbit@rabbitmq-0'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 24 Mar 2020 22:46:09 +0000
Finished: Tue, 24 Mar 2020 22:58:28 +0000
Ready: False
Restart Count: 1
Environment:
RABBITMQ_ERLANG_COOKIE: <set to the key 'erlang-cookie' in secret 'rabbitmq-config'> Optional: false
Mounts:
/var/lib/rabbitmq from rabbitmq (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bbl9c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
rabbitmq:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: rabbitmq-rabbitmq-0
ReadOnly: false
default-token-bbl9c:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bbl9c
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31m default-scheduler Successfully assigned default/rabbitmq-0 to kubernetes-slave-1
Normal Pulled 31m kubelet, kubernetes-slave-1 Container image "rabbitmq:management-alpine" already present on machine
Normal Created 31m kubelet, kubernetes-slave-1 Created container rabbitmq
Normal Started 31m kubelet, kubernetes-slave-1 Started container rabbitmq
Normal SandboxChanged 16m (x9 over 17m) kubelet, kubernetes-slave-1 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 3m58s (x2 over 16m) kubelet, kubernetes-slave-1 Container image "rabbitmq:management-alpine" already present on machine
Warning FailedPostStartHook 3m58s kubelet, kubernetes-slave-1 Exec lifecycle hook ([/bin/sh -c rabbitmq-plugins enable rabbitmq_stomp; if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq. /" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit@rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
]) for Container "rabbitmq" in Pod "rabbitmq-0_default(2e561153-a830-4d30-ab1e-71c80d10c9e9)" failed - error: command '/bin/sh -c rabbitmq-plugins enable rabbitmq_stomp; if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq. /" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit@rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
' exited with 137: Error: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-0
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
other nodes on rabbitmq-0: [rabbitmqprelaunch1]
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-433-rabbit@rabbitmq-0'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-0
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-575-rabbit@rabbitmq-0'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.
Arguments given:
node_health_check
, message: "Enabling plugins on node rabbit@rabbitmq-0:\nrabbitmq_stomp\nThe following plugins have been configured:\n rabbitmq_management\n rabbitmq_management_agent\n rabbitmq_stomp\n rabbitmq_web_dispatch\nApplying plugin configuration to rabbit@rabbitmq-0...\nThe following plugins have been enabled:\n rabbitmq_stomp\n\nset 4 plugins.\nOffline change; changes will take effect at broker restart.\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\
Error:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. 
due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node rabbit@rabbitmq-0\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq-0']\n\nrabbit@rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\nCurrent node details:\n * node name: 'rabbitmqcli-10397-rabbit@rabbitmq-0'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\n"
Normal Killing 3m58s kubelet, kubernetes-slave-1 FailedPostStartHook
Normal Created 3m57s (x2 over 16m) kubelet, kubernetes-slave-1 Created container rabbitmq
Normal Started 3m57s (x2 over 16m) kubelet, kubernetes-slave-1 Started container rabbitmq
Output of kubectl get sts:
kubectl get sts
NAME READY AGE
consul 3/3 15d
hazelcast 2/3 15d
kafka 2/3 15d
rabbitmq 0/3 13d
zk 3/3 15d
These are the pod logs I copied from the Kubernetes dashboard:
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: list of feature flags found:
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] drop_unroutable_metric
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] empty_basic_get_metric
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] implicit_default_bindings
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] quorum_queue
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] virtual_host_metadata
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-03-24 22:58:43.979 [info] <0.319.0> ra: meta data store initialised. 0 record(s) recovered
2020-03-24 22:58:43.980 [info] <0.324.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0/quorum/rabbit@rabbitmq-0/00000262.wal"]
2020-03-24 22:58:43.982 [info] <0.328.0>
Starting RabbitMQ 3.8.2 on Erlang 22.2.8
Copyright (c) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL 1.1. Website: https://rabbitmq.com
## ## RabbitMQ 3.8.2
## ##
########## Copyright (c) 2007-2019 Pivotal Software, Inc.
###### ##
########## Licensed under the MPL 1.1. Website: https://rabbitmq.com
Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html
Logs: <stdout>
Config file(s): /etc/rabbitmq/rabbitmq.conf
Starting broker...2020-03-24 22:58:43.983 [info] <0.328.0>
node : rabbit@rabbitmq-0
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : P1XNOe5pN3Ug2FCRFzH7Xg==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0
2020-03-24 22:58:43.997 [info] <0.328.0> Running boot step pre_boot defined by app rabbit
2020-03-24 22:58:43.997 [info] <0.328.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-03-24 22:58:43.998 [info] <0.328.0> Running boot step rabbit_alarm defined by app rabbit
2020-03-24 22:58:44.002 [info] <0.334.0> Memory high watermark set to 1200 MiB (1258889216 bytes) of 3001 MiB (3147223040 bytes) total
2020-03-24 22:58:44.014 [info] <0.336.0> Enabling free disk space monitoring
2020-03-24 22:58:44.014 [info] <0.336.0> Disk free limit set to 50MB
2020-03-24 22:58:44.018 [info] <0.328.0> Running boot step code_server_cache defined by app rabbit
2020-03-24 22:58:44.018 [info] <0.328.0> Running boot step file_handle_cache defined by app rabbit
2020-03-24 22:58:44.019 [info] <0.339.0> Limiting to approx 1048479 file handles (943629 sockets)
2020-03-24 22:58:44.019 [info] <0.340.0> FHC read buffering: OFF
2020-03-24 22:58:44.019 [info] <0.340.0> FHC write buffering: ON
2020-03-24 22:58:44.020 [info] <0.328.0> Running boot step worker_pool defined by app rabbit
2020-03-24 22:58:44.021 [info] <0.329.0> Will use 2 processes for default worker pool
2020-03-24 22:58:44.021 [info] <0.329.0> Starting worker pool 'worker_pool' with 2 processes in it
2020-03-24 22:58:44.021 [info] <0.328.0> Running boot step database defined by app rabbit
2020-03-24 22:58:44.041 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-03-24 22:59:14.042 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 22:59:14.042 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2020-03-24 22:59:44.043 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 22:59:44.043 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2020-03-24 23:00:14.044 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:00:14.044 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2020-03-24 23:00:44.045 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:00:44.045 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2020-03-24 23:01:14.046 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:01:14.046 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2020-03-24 23:01:44.047 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:01:44.047 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2020-03-24 23:02:14.048 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:02:14.048 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2020-03-24 23:02:44.049 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:02:44.049 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2020-03-24 23:03:14.050 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:03:14.050 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-03-24 23:03:44.051 [error] <0.328.0> Feature flag `quorum_queue`: migration function crashed: {error,{timeout_waiting_for_tables,[rabbit_durable_queue]}}
[{rabbit_table,wait,3,[{file,"src/rabbit_table.erl"},{line,117}]},{rabbit_core_ff,quorum_queue_migration,3,[{file,"src/rabbit_core_ff.erl"},{line,60}]},{rabbit_feature_flags,run_migration_fun,3,[{file,"src/rabbit_feature_flags.erl"},{line,1486}]},{rabbit_feature_flags,'-verify_which_feature_flags_are_actually_enabled/0-fun-2-',3,[{file,"src/rabbit_feature_flags.erl"},{line,2128}]},{maps,fold_1,3,[{file,"maps.erl"},{line,232}]},{rabbit_feature_flags,verify_which_feature_flags_are_actually_enabled,0,[{file,"src/rabbit_feature_flags.erl"},{line,2126}]},{rabbit_feature_flags,sync_feature_flags_with_cluster,3,[{file,"src/rabbit_feature_flags.erl"},{line,1947}]},{rabbit_mnesia,ensure_feature_flags_are_in_sync,2,[{file,"src/rabbit_mnesia.erl"},{line,631}]}]
2020-03-24 23:03:44.051 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-03-24 23:04:14.052 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:04:14.052 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2020-03-24 23:04:44.053 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:04:44.053 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2020-03-24 23:05:14.055 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:05:14.055 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2020-03-24 23:05:44.056 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:05:44.056 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2020-03-24 23:06:14.057 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:06:14.057 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2020-03-24 23:06:44.058 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:06:44.058 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2020-03-24 23:07:14.059 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:07:14.059 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2020-03-24 23:07:44.060 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:07:44.060 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2020-03-24 23:08:14.061 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:08:14.061 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-03-24 23:08:44.062 [error] <0.327.0> CRASH REPORT Process <0.327.0> with 0 neighbours exited with reason: {{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}} in application_master:init/4 line 138
2020-03-24 23:08:44.063 [info] <0.43.0> Application rabbit exited with reason: {{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_r
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Have a look at:
https://www.rabbitmq.com/clustering.html#restarting
You should be able to stop the app and then force boot:
rabbitmqctl stop_app
rabbitmqctl force_boot
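If rabbitmqctl cannot reach the stuck node at all, the same effect can be achieved by hand: force_boot simply writes a force_load marker file into the node's Mnesia data directory, which the next boot reads as permission to start without waiting for the other cluster members' tables. A sketch against a scratch directory (inside the pod the real path would be /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0, on the PVC):

```shell
# Simulate the node's data directory; in the pod this is
# /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0 (mounted from the PVC)
MNESIA_DIR="$(mktemp -d)/rabbit@rabbitmq-0"
mkdir -p "$MNESIA_DIR"

# Equivalent of `rabbitmqctl force_boot` on a stopped node: drop the
# force_load marker so the next start skips waiting for peers' Mnesia tables
touch "$MNESIA_DIR/force_load"

ls "$MNESIA_DIR"   # prints: force_load
```

Creating the marker on rabbitmq-0's volume and letting the pod restart should let the first node boot alone; the others can then rejoin it.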
To complement @Vincent Gerris's answer, I strongly recommend using the Bitnami RabbitMQ Docker image. Its entrypoint supports an environment variable named RABBITMQ_FORCE_BOOT:
if is_boolean_yes "$RABBITMQ_FORCE_BOOT" && ! is_dir_empty "${RABBITMQ_DATA_DIR}/${RABBITMQ_NODE_NAME}"; then
# ref: https://www.rabbitmq.com/rabbitmqctl.8.html#force_boot
warn "Forcing node to start..."
debug_execute "${RABBITMQ_BIN_DIR}/rabbitmqctl" force_boot
fi
This will force boot the node at the entrypoint.
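Applied to the StatefulSet above, switching to the Bitnami image and enabling the flag would look roughly like this (a minimal sketch; the image tag is illustrative, and the rest of the container spec stays as before):

```yaml
containers:
- name: rabbitmq
  image: bitnami/rabbitmq:3.8
  env:
  - name: RABBITMQ_FORCE_BOOT
    value: "yes"
```

Note that the flag only takes effect when the data directory is non-empty, i.e. on restarts of a node that already has state.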
' exited with 137: Error: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-0
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
other nodes on rabbitmq-0: [rabbitmqprelaunch1]
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-433-rabbit@rabbitmq-0'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-0
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-575-rabbit@rabbitmq-0'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.
Arguments given:
node_health_check
, message: "Enabling plugins on node rabbit@rabbitmq-0:\nrabbitmq_stomp\nThe following plugins have been configured:\n rabbitmq_management\n rabbitmq_management_agent\n rabbitmq_stomp\n rabbitmq_web_dispatch\nApplying plugin configuration to rabbit@rabbitmq-0...\nThe following plugins have been enabled:\n rabbitmq_stomp\n\nset 4 plugins.\nOffline change; changes will take effect at broker restart.\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\nChecking health of node rabbit@rabbitmq-0 ...\nTimeout: 70 seconds ...\
Error:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError:\n{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :\"\", :_, :_}, [], [:\"\"]}]]}}\nError: unable to perform an operation on node 'rabbit@rabbitmq-0'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. 
due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node rabbit@rabbitmq-0\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq-0']\n\nrabbit@rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\nCurrent node details:\n * node name: 'rabbitmqcli-10397-rabbit@rabbitmq-0'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\n"
Normal Killing 3m58s kubelet, kubernetes-slave-1 FailedPostStartHook
Normal Created 3m57s (x2 over 16m) kubelet, kubernetes-slave-1 Created container rabbitmq
Normal Started 3m57s (x2 over 16m) kubelet, kubernetes-slave-1 Started container rabbitmq
The output of kubectl get sts:
kubectl get sts
NAME READY AGE
consul 3/3 15d
hazelcast 2/3 15d
kafka 2/3 15d
rabbitmq 0/3 13d
zk 3/3 15d
Here are the pod logs, copied from the Kubernetes dashboard:
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: list of feature flags found:
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] drop_unroutable_metric
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] empty_basic_get_metric
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] implicit_default_bindings
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] quorum_queue
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: [x] virtual_host_metadata
2020-03-24 22:58:41.402 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-03-24 22:58:43.979 [info] <0.319.0> ra: meta data store initialised. 0 record(s) recovered
2020-03-24 22:58:43.980 [info] <0.324.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0/quorum/rabbit@rabbitmq-0/00000262.wal"]
2020-03-24 22:58:43.982 [info] <0.328.0>
Starting RabbitMQ 3.8.2 on Erlang 22.2.8
Copyright (c) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL 1.1. Website: https://rabbitmq.com
## ## RabbitMQ 3.8.2
## ##
########## Copyright (c) 2007-2019 Pivotal Software, Inc.
###### ##
########## Licensed under the MPL 1.1. Website: https://rabbitmq.com
Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html
Logs: <stdout>
Config file(s): /etc/rabbitmq/rabbitmq.conf
Starting broker...2020-03-24 22:58:43.983 [info] <0.328.0>
node : rabbit@rabbitmq-0
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : P1XNOe5pN3Ug2FCRFzH7Xg==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0
2020-03-24 22:58:43.997 [info] <0.328.0> Running boot step pre_boot defined by app rabbit
2020-03-24 22:58:43.997 [info] <0.328.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-03-24 22:58:43.998 [info] <0.328.0> Running boot step rabbit_alarm defined by app rabbit
2020-03-24 22:58:44.002 [info] <0.334.0> Memory high watermark set to 1200 MiB (1258889216 bytes) of 3001 MiB (3147223040 bytes) total
2020-03-24 22:58:44.014 [info] <0.336.0> Enabling free disk space monitoring
2020-03-24 22:58:44.014 [info] <0.336.0> Disk free limit set to 50MB
2020-03-24 22:58:44.018 [info] <0.328.0> Running boot step code_server_cache defined by app rabbit
2020-03-24 22:58:44.018 [info] <0.328.0> Running boot step file_handle_cache defined by app rabbit
2020-03-24 22:58:44.019 [info] <0.339.0> Limiting to approx 1048479 file handles (943629 sockets)
2020-03-24 22:58:44.019 [info] <0.340.0> FHC read buffering: OFF
2020-03-24 22:58:44.019 [info] <0.340.0> FHC write buffering: ON
2020-03-24 22:58:44.020 [info] <0.328.0> Running boot step worker_pool defined by app rabbit
2020-03-24 22:58:44.021 [info] <0.329.0> Will use 2 processes for default worker pool
2020-03-24 22:58:44.021 [info] <0.329.0> Starting worker pool 'worker_pool' with 2 processes in it
2020-03-24 22:58:44.021 [info] <0.328.0> Running boot step database defined by app rabbit
2020-03-24 22:58:44.041 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-03-24 22:59:14.042 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 22:59:14.042 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2020-03-24 22:59:44.043 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 22:59:44.043 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2020-03-24 23:00:14.044 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:00:14.044 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2020-03-24 23:00:44.045 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:00:44.045 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2020-03-24 23:01:14.046 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:01:14.046 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2020-03-24 23:01:44.047 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:01:44.047 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2020-03-24 23:02:14.048 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:02:14.048 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2020-03-24 23:02:44.049 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:02:44.049 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2020-03-24 23:03:14.050 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-03-24 23:03:14.050 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-03-24 23:03:44.051 [error] <0.328.0> Feature flag `quorum_queue`: migration function crashed: {error,{timeout_waiting_for_tables,[rabbit_durable_queue]}}
[{rabbit_table,wait,3,[{file,"src/rabbit_table.erl"},{line,117}]},{rabbit_core_ff,quorum_queue_migration,3,[{file,"src/rabbit_core_ff.erl"},{line,60}]},{rabbit_feature_flags,run_migration_fun,3,[{file,"src/rabbit_feature_flags.erl"},{line,1486}]},{rabbit_feature_flags,'-verify_which_feature_flags_are_actually_enabled/0-fun-2-',3,[{file,"src/rabbit_feature_flags.erl"},{line,2128}]},{maps,fold_1,3,[{file,"maps.erl"},{line,232}]},{rabbit_feature_flags,verify_which_feature_flags_are_actually_enabled,0,[{file,"src/rabbit_feature_flags.erl"},{line,2126}]},{rabbit_feature_flags,sync_feature_flags_with_cluster,3,[{file,"src/rabbit_feature_flags.erl"},{line,1947}]},{rabbit_mnesia,ensure_feature_flags_are_in_sync,2,[{file,"src/rabbit_mnesia.erl"},{line,631}]}]
2020-03-24 23:03:44.051 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-03-24 23:04:14.052 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:04:14.052 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2020-03-24 23:04:44.053 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:04:44.053 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2020-03-24 23:05:14.055 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:05:14.055 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2020-03-24 23:05:44.056 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:05:44.056 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2020-03-24 23:06:14.057 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:06:14.057 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2020-03-24 23:06:44.058 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:06:44.058 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2020-03-24 23:07:14.059 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:07:14.059 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2020-03-24 23:07:44.060 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:07:44.060 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2020-03-24 23:08:14.061 [warning] <0.328.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-03-24 23:08:14.061 [info] <0.328.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-03-24 23:08:44.062 [error] <0.327.0> CRASH REPORT Process <0.327.0> with 0 neighbours exited with reason: {{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}} in application_master:init/4 line 138
2020-03-24 23:08:44.063 [info] <0.43.0> Application rabbit exited with reason: {{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_r
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Take a look at: https://www.rabbitmq.com/clustering.html#restarting
You should be able to stop the app and then force a boot:
rabbitmqctl stop_app
rabbitmqctl force_boot
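In a StatefulSet you can run this from outside the pod with kubectl exec. This is only a sketch: it assumes the pod names rabbitmq-0, rabbitmq-1, rabbitmq-2 from your manifest and the default namespace — adjust both if yours differ.

```shell
# Force-boot the first node so it stops waiting for Mnesia tables
# held by peers that are not up yet (pod name is an assumption
# based on the StatefulSet above).
kubectl exec rabbitmq-0 -- rabbitmqctl stop_app
kubectl exec rabbitmq-0 -- rabbitmqctl force_boot
kubectl exec rabbitmq-0 -- rabbitmqctl start_app

# Once rabbitmq-0 is healthy, restart the remaining replicas so
# they can rejoin it as the seed node:
kubectl delete pod rabbitmq-1 rabbitmq-2
```

Note that force_boot tells the node to start without waiting for the other cluster members, so only run it on one node; the others should rejoin normally afterwards.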
To complete @Vincent Gerris's answer, I strongly recommend using the Bitnami RabbitMQ Docker image.
Its entrypoint includes an environment variable called RABBITMQ_FORCE_BOOT:
if is_boolean_yes "$RABBITMQ_FORCE_BOOT" && ! is_dir_empty "${RABBITMQ_DATA_DIR}/${RABBITMQ_NODE_NAME}"; then
# ref: https://www.rabbitmq.com/rabbitmqctl.8.html#force_boot
warn "Forcing node to start..."
debug_execute "${RABBITMQ_BIN_DIR}/rabbitmqctl" force_boot
fi
This will force boot the node at the entrypoint.
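Wiring that into the StatefulSet would look roughly like the fragment below. This is a sketch under assumptions: switching the image to bitnami/rabbitmq changes paths and defaults compared to the official rabbitmq:management-alpine image, so treat it as a starting point rather than a drop-in replacement.

```yaml
# Container spec fragment (assumption: rest of the StatefulSet as above).
# RABBITMQ_FORCE_BOOT is honored by the Bitnami entrypoint shown earlier.
containers:
  - name: rabbitmq
    image: bitnami/rabbitmq
    env:
      - name: RABBITMQ_FORCE_BOOT
        value: "yes"
```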