Kube-Prometheus-Stack Helm Chart v14.4.0: Node-exporter and scrape targets unhealthy in Docker For Mac Kubernetes Cluster on macOS Catalina 10.15.7
I have installed kube-prometheus-stack as a dependency in my local Helm chart on a Docker for Mac Kubernetes cluster v1.19.7.
The myrelease-name-prometheus-node-exporter service fails, with errors received from the node-exporter daemonset after the kube-prometheus-stack helm chart is installed. This is installed in a Docker Desktop for Mac Kubernetes cluster environment.
release-name-prometheus-node-exporter daemonset error log:
MountVolume.SetUp failed for volume "flaskapi-prometheus-node-exporter-token-zft28" : failed to sync secret cache: timed out waiting for the condition
Error: failed to start container "node-exporter": Error response from daemon: path / is mounted on / but it is not a shared or slave mount
Back-off restarting failed container
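For reference, the events and logs above come from the failing daemonset; a minimal sketch of the commands used to pull them (resource names match the cluster output below, flags are illustrative):

kubectl describe daemonset flaskapi-prometheus-node-exporter
kubectl logs daemonset/flaskapi-prometheus-node-exporter --tail=50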
The scrape targets for kube-scheduler: http://192.168.65.4:10251/metrics, kube-proxy: http://192.168.65.4:10249/metrics, kube-etcd: http://192.168.65.4:2379/metrics, kube-controller-manager: http://192.168.65.4:10252/metrics and node-exporter: http://192.168.65.4:9100/metrics are marked as unhealthy. All show connection refused, except kube-etcd, which shows connection reset by peer.
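To check that these failures come from the endpoints themselves rather than from Prometheus, a target can be probed from inside the cluster; a minimal sketch (pod name and image are illustrative):

kubectl run -it --rm probe --image=curlimages/curl --restart=Never -- curl -sv http://192.168.65.4:10251/metrics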
Chart.yaml
apiVersion: v2
appVersion: "0.0.1"
description: A Helm chart for flaskapi deployment
name: flaskapi
version: 0.0.1
dependencies:
  - name: kube-prometheus-stack
    version: "14.4.0"
    repository: "https://prometheus-community.github.io/helm-charts"
  - name: ingress-nginx
    version: "3.25.0"
    repository: "https://kubernetes.github.io/ingress-nginx"
  - name: redis
    version: "12.9.0"
    repository: "https://charts.bitnami.com/bitnami"
Values.yaml
hostname: flaskapi-service
redis_host: flaskapi-redis-master.default.svc.cluster.local
redis_port: "6379"
Environment:
macOS Catalina 10.15.7
Docker Desktop For Mac 3.2.2 (61853) with docker engine v20.10.5
Local Kubernetes 1.19.7 cluster provided by Docker Desktop For Mac
Prometheus Operator version:
kube-prometheus-stack 14.4.0
Kubernetes version information:
kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/alertmanager-flaskapi-kube-prometheus-s-alertmanager-0 2/2 Running 0 16m
pod/flask-deployment-775fcf8ff-2hp9s 1/1 Running 0 16m
pod/flask-deployment-775fcf8ff-4qdjn 1/1 Running 0 16m
pod/flask-deployment-775fcf8ff-6bvmv 1/1 Running 0 16m
pod/flaskapi-grafana-6cb58f6656-77rqk 2/2 Running 0 16m
pod/flaskapi-ingress-nginx-controller-ccfc7b6df-qvl7d 1/1 Running 0 16m
pod/flaskapi-kube-prometheus-s-operator-69f4bcf865-tq4q2 1/1 Running 0 16m
pod/flaskapi-kube-state-metrics-67c7f5f854-hbr27 1/1 Running 0 16m
pod/flaskapi-prometheus-node-exporter-7hgnm 0/1 CrashLoopBackOff 8 16m
pod/flaskapi-redis-master-0 1/1 Running 0 16m
pod/flaskapi-redis-slave-0 1/1 Running 0 16m
pod/flaskapi-redis-slave-1 1/1 Running 0 15m
pod/prometheus-flaskapi-kube-prometheus-s-prometheus-0 2/2 Running 0 16m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 16m
service/flask-api-service ClusterIP 10.108.242.86 <none> 4444/TCP 16m
service/flaskapi-grafana ClusterIP 10.98.186.112 <none> 80/TCP 16m
service/flaskapi-ingress-nginx-controller LoadBalancer 10.102.217.51 localhost 80:30347/TCP,443:31422/TCP 16m
service/flaskapi-ingress-nginx-controller-admission ClusterIP 10.99.21.136 <none> 443/TCP 16m
service/flaskapi-kube-prometheus-s-alertmanager ClusterIP 10.107.215.73 <none> 9093/TCP 16m
service/flaskapi-kube-prometheus-s-operator ClusterIP 10.107.162.227 <none> 443/TCP 16m
service/flaskapi-kube-prometheus-s-prometheus ClusterIP 10.96.168.75 <none> 9090/TCP 16m
service/flaskapi-kube-state-metrics ClusterIP 10.100.118.21 <none> 8080/TCP 16m
service/flaskapi-prometheus-node-exporter ClusterIP 10.97.61.162 <none> 9100/TCP 16m
service/flaskapi-redis-headless ClusterIP None <none> 6379/TCP 16m
service/flaskapi-redis-master ClusterIP 10.96.192.160 <none> 6379/TCP 16m
service/flaskapi-redis-slave ClusterIP 10.107.119.108 <none> 6379/TCP 16m
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d1h
service/prometheus-operated ClusterIP None <none> 9090/TCP 16m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/flaskapi-prometheus-node-exporter 1 1 0 1 0 <none> 16m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/flask-deployment 3/3 3 3 16m
deployment.apps/flaskapi-grafana 1/1 1 1 16m
deployment.apps/flaskapi-ingress-nginx-controller 1/1 1 1 16m
deployment.apps/flaskapi-kube-prometheus-s-operator 1/1 1 1 16m
deployment.apps/flaskapi-kube-state-metrics 1/1 1 1 16m
NAME DESIRED CURRENT READY AGE
replicaset.apps/flask-deployment-775fcf8ff 3 3 3 16m
replicaset.apps/flaskapi-grafana-6cb58f6656 1 1 1 16m
replicaset.apps/flaskapi-ingress-nginx-controller-ccfc7b6df 1 1 1 16m
replicaset.apps/flaskapi-kube-prometheus-s-operator-69f4bcf865 1 1 1 16m
replicaset.apps/flaskapi-kube-state-metrics-67c7f5f854 1 1 1 16m
NAME READY AGE
statefulset.apps/alertmanager-flaskapi-kube-prometheus-s-alertmanager 1/1 16m
statefulset.apps/flaskapi-redis-master 1/1 16m
statefulset.apps/flaskapi-redis-slave 2/2 16m
statefulset.apps/prometheus-flaskapi-kube-prometheus-s-prometheus 1/1 16m
kubectl get svc -n kube-system
flaskapi-kube-prometheus-s-coredns ClusterIP None <none> 9153/TCP 29s
flaskapi-kube-prometheus-s-kube-controller-manager ClusterIP None <none> 10252/TCP 29s
flaskapi-kube-prometheus-s-kube-etcd ClusterIP None <none> 2379/TCP 29s
flaskapi-kube-prometheus-s-kube-proxy ClusterIP None <none> 10249/TCP 29s
flaskapi-kube-prometheus-s-kube-scheduler ClusterIP None <none> 10251/TCP 29s
flaskapi-kube-prometheus-s-kubelet ClusterIP None <none> 10250/TCP,10255/TCP,4194/TCP 2d18h
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 5d18h
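The 192.168.65.4 addresses in the unhealthy targets should come from the Endpoints behind these headless services, which the chart's ServiceMonitors scrape; they can be inspected with something like:

kubectl get endpoints -n kube-system flaskapi-kube-prometheus-s-kube-scheduler -o yaml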
I tried updating values.yaml to include the following:
Updated values.yaml:
prometheus-node-exporter:
  hostRootFsMount: false
And also this:
prometheus:
  prometheus-node-exporter:
    hostRootFsMount: false
However, the issue as described persists, except that the logs for the node-exporter daemonset now give:
failed to try resolving symlinks in path "/var/log/pods/default_flaskapi-prometheus-node-exporter-p5cc8_54c20fc6-c914-4cc6-b441-07b68cda140e/node-exporter/3.log": lstat /var/log/pods/default_flaskapi-prometheus-node-exporter-p5cc8_54c20fc6-c914-4cc6-b441-07b68cda140e/node-exporter/3.log: no such file or directory
Updated info based on comment suggestions
kubectl get pod flaskapi-prometheus-node-exporter-p5cc8
No Args are available since node-exporter is crashing...
NAME READY STATUS RESTARTS AGE
flaskapi-prometheus-node-exporter-p5cc8 0/1 CrashLoopBackOff 7 14m
kubectl describe pod flaskapi-prometheus-node-exporter-p5cc8
The Args in the yaml output of the above command give:
Args:
--path.procfs=/host/proc
--path.sysfs=/host/sys
--path.rootfs=/host/root
--web.listen-address=$(HOST_IP):9100
--collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
--collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
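For context, the --path.rootfs=/host/root flag corresponds to a hostPath mount of / that the prometheus-node-exporter subchart renders into the daemonset when hostRootFsMount is true; a simplified sketch of the relevant fragment (not the full manifest):

volumeMounts:
  - name: root
    mountPath: /host/root
    mountPropagation: HostToContainer   # this propagation mode is what Docker Desktop's VM rejects
    readOnly: true
volumes:
  - name: root
    hostPath:
      path: /

Setting hostRootFsMount: false drops this mount, which is why it prevents the "not a shared or slave mount" crash.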
Updating values.yaml so the override is nested under the root kube-prometheus-stack key, as suggested in the answer comments, allows the prometheus-node-exporter daemonset to start successfully; since kube-prometheus-stack is a chart dependency, its values must be nested under the dependency's name in the parent chart's values.yaml. However, the scrape targets mentioned above are still unavailable....
kube-prometheus-stack:
  prometheus-node-exporter:
    hostRootFsMount: false
How do I get node-exporter working and the associated scrape targets healthy?
Is the node-exporter of the kube-prometheus-stack helm chart incompatible with Docker Desktop for Mac Kubernetes clusters?
I have raised this as an issue on kube-prometheus-stack, including the log output for the scrape targets on both docker-desktop and minikube clusters.
Conclusion
It looks like the unavailable scrape targets are a problem/bug with kube-prometheus-stack. I searched their GitHub page and found similar issues: 713 and 718. I also tried this on a minikube cluster with the hyperkit vm-driver. On minikube, node-exporter works out of the box, but the scrape target issue persists. Not sure what a safe solution is? A sketch of one possible workaround follows below.
I may investigate alternative helm chart dependencies for prometheus and grafana...
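One workaround I have seen suggested for local clusters that do not expose the control-plane components (unverified here, so treat this as a sketch) is to disable those scrape targets via the toggles kube-prometheus-stack provides, nested under the dependency key as before:

kube-prometheus-stack:
  kubeScheduler:
    enabled: false
  kubeControllerManager:
    enabled: false
  kubeEtcd:
    enabled: false
  kubeProxy:
    enabled: false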
This issue was recently solved. There is more information here: https://github.com/prometheus-community/helm-charts/issues/467 and here: https://github.com/prometheus-community/helm-charts/pull/757
Here is the solution (https://github.com/prometheus-community/helm-charts/issues/467#issuecomment-802642666):
[you need to] opt-out the rootfs host mount (preventing the crash). In order to do that you need to specify the following value in the values.yaml file:
prometheus-node-exporter:
  hostRootFsMount: false
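Note that when kube-prometheus-stack is a chart dependency, as in the Chart.yaml above, the same override has to be nested under the dependency's name in the parent chart's values.yaml:

kube-prometheus-stack:
  prometheus-node-exporter:
    hostRootFsMount: false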