IBM Cloud Private 2.1.0.1 EE fails with a timeout error while installing Monitoring
I have been trying to set up ICP EE on a single node, but the installation always fails once it reaches the task that deploys the monitoring service.
That particular task runs for about 30 minutes and then fails. The error log I get is below.
Is there anything I need to do differently?
I followed the basic installation steps from the Knowledge Center.
TASK [monitoring : Deploying monitoring service]
*******************************
fatal: [localhost]: FAILED! => {
"changed":true,
"cmd":"kubectl apply --force --overwrite=true -f /installer/playbook/..//cluster/cfc-components/monitoring/",
"delta":"0:30:37.425771",
"end":"2018-02-26 17:19:04.780643",
"failed":true,
"rc":1,
"start":"2018-02-26 16:48:27.354872",
"stderr":"Error from server: error when creating \"/installer/cluster/cfc-components/monitoring/grafana-router-config.yaml\": timeout\nError from server (Timeout): error when creating \"/installer/cluster/cfc-components/monitoring/kube-state-metrics-deployment.yaml\": the server was unable to return a response in the time allotted, but may still be processing the request (post deployments.extensions)",
"stderr_lines":[
"Error from server: error when creating \"/installer/cluster/cfc-components/monitoring/grafana-router-config.yaml\": timeout",
"Error from server (Timeout): error when creating \"/installer/cluster/cfc-components/monitoring/kube-state-metrics-deployment.yaml\": the server was unable to return a response in the time allotted, but may still be processing the request (post deployments.extensions)"
],
"stdout":"configmap \"alert-rules\" created\nconfigmap \"monitoring-prometheus-alertmanager\" created\ndeployment \"monitoring-prometheus-alertmanager\" created\nconfigmap \"alertmanager-router-nginx-config\" created\nservice \"monitoring-prometheus-alertmanager\" created\ndeployment \"monitoring-exporter\" created\nservice \"monitoring-exporter\" created\nconfigmap \"monitoring-grafana-config\" created\ndeployment \"monitoring-grafana\" created\nconfigmap \"grafana-entry-config\" created\nservice \"monitoring-grafana\" created\njob \"monitoring-grafana-ds\" created\nconfigmap \"grafana-ds-entry-config\" created\nservice \"monitoring-prometheus-kubestatemetrics\" created\ndaemonset \"monitoring-prometheus-nodeexporter-amd64\" created\ndaemonset \"monitoring-prometheus-nodeexporter-ppc64le\" created\ndaemonset \"monitoring-prometheus-nodeexporter-s390x\" created\nservice \"monitoring-prometheus-nodeexporter\" created\nconfigmap \"monitoring-prometheus\" created\ndeployment \"monitoring-prometheus\" created\nconfigmap \"prometheus-router-nginx-config\" created\nservice \"monitoring-prometheus\" created\nconfigmap \"monitoring-router-entry-config\" created",
"stdout_lines":[
"configmap \"alert-rules\" created",
"configmap \"monitoring-prometheus-alertmanager\" created",
"deployment \"monitoring-prometheus-alertmanager\" created",
"configmap \"alertmanager-router-nginx-config\" created",
"service \"monitoring-prometheus-alertmanager\" created",
"deployment \"monitoring-exporter\" created",
"service \"monitoring-exporter\" created",
"configmap \"monitoring-grafana-config\" created",
"deployment \"monitoring-grafana\" created",
"configmap \"grafana-entry-config\" created",
"service \"monitoring-grafana\" created",
"job \"monitoring-grafana-ds\" created",
"configmap \"grafana-ds-entry-config\" created",
"service \"monitoring-prometheus-kubestatemetrics\" created",
"daemonset \"monitoring-prometheus-nodeexporter-amd64\" created",
"daemonset \"monitoring-prometheus-nodeexporter-ppc64le\" created",
"daemonset \"monitoring-prometheus-nodeexporter-s390x\" created",
"service \"monitoring-prometheus-nodeexporter\" created",
"configmap \"monitoring-prometheus\" created",
"deployment \"monitoring-prometheus\" created",
"configmap \"prometheus-router-nginx-config\" created",
"service \"monitoring-prometheus\" created",
"configmap \"monitoring-router-entry-config\" created"
]
}
Does this node have at least 16 GB of memory (32 GB is even better)? It may be that the host is being overwhelmed by the initial load as all of the monitoring pods come online.
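As a quick sanity check (assuming a Linux host; the exact output format varies by distribution), you can look at memory headroom and the busiest processes while the installer is running:
free -h
top -b -n 1 | head -n 20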
The second thing to test is what happens when you apply this directory yourself.
You can re-run the same apply from the command line:
cd cluster/
kubectl apply --force --overwrite=true -f cfc-components/monitoring/
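Because the stderr says the server "may still be processing the request", it is also worth checking whether the objects from the two failing files eventually appeared. A rough check (the grep patterns are assumptions based on the file and service names in the log, not exact resource names):
kubectl -n kube-system get deployments | grep -i kubestate
kubectl -n kube-system get configmaps | grep -i grafana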
Then you can dig into what is going on behind the scenes:
kubectl -n kube-system get pod -o wide
- Are any pods stuck in a non-Running state? (A quick filter for these is sketched just below.)
- Are any containers within the pods failing to start (for example, showing 0/2 or 1/3 ready)?
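One way to surface only the problem pods from that output is plain shell filtering, which also catches states such as Pending or CrashLoopBackOff:
kubectl -n kube-system get pod -o wide | grep -vE 'Running|Completed'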
journalctl -ru kubelet -o cat | head -n 500 > kubelet-logs.txt
- Is kubelet complaining that it cannot start containers? (See the grep example below.)
- Is kubelet complaining that Docker is unhealthy?
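Once the kubelet output is captured in kubelet-logs.txt, a simple filter can pull out the interesting lines (the patterns are just a starting point, adjust as needed):
grep -iE 'error|fail|unhealthy' kubelet-logs.txt | head -n 50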
If a pod reports that it is unhealthy (from #1/#2 above), describe it and check whether any events indicate why it is failing:
kubectl -n kube-system describe pod [failing-pod-name]
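If the describe output is inconclusive, the recent events in the namespace can also point at the culprit. A sketch, assuming the --sort-by JSONPath is supported by the kubectl version shipped with your ICP release:
kubectl -n kube-system get events --sort-by='.lastTimestamp' | tail -n 20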
If you have not yet configured kubectl on the host to interact with the cluster, or the auth-idp pod has not been deployed yet, you can configure kubectl with the following steps:
- Copy the kubectl binary onto the host and use the local kubelet configuration. You can set KUBECONFIG in your shell profile (for example, .bash_profile) so that it applies to every terminal session.
docker run -e LICENSE=accept -v /usr/local/bin:/data \
ibmcom/icp-inception:[YOUR_VERSION] \
cp /usr/local/bin/kubectl /data
export KUBECONFIG=/var/lib/kubelet/kubelet-config
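To make the setting persist across sessions, as suggested above, you can append the export to your shell profile and verify that the copied binary can reach the cluster (the profile path is an assumption; use whatever file your shell actually reads):
echo 'export KUBECONFIG=/var/lib/kubelet/kubelet-config' >> ~/.bash_profile
kubectl get nodes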