Kubernetes Prometheus CrashLoopBackOff / OOMKilled Puzzle
Periodically I see the container
Status: terminated - OOMKilled (exit code: 137)
but it is scheduled to a node with plenty of memory.
$ k get statefulset -n metrics
NAME READY AGE
prometheus 0/1 232d
$ k get po -n metrics
prometheus-0 1/2 CrashLoopBackOff 147 12h
$ k get events -n metrics
LAST SEEN TYPE REASON OBJECT MESSAGE
10m Normal Pulled pod/prometheus-0 Container image "prom/prometheus:v2.11.1" already present on machine
51s Warning BackOff pod/prometheus-0 Back-off restarting failed container
$ k logs -f prometheus-0 -n metrics --all-containers=true
level=warn ts=2020-08-22T20:48:02.302Z caller=main.go:282 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-08-22T20:48:02.302Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2020-08-22T20:48:02.302Z caller=main.go:330 build_context="(go=go1.12.7, user=root@d94406f2bb6f, date=20190710-13:51:17)"
level=info ts=2020-08-22T20:48:02.302Z caller=main.go:331 host_details="(Linux 4.14.186-146.268.amzn2.x86_64 #1 SMP Tue Jul 14 18:16:52 UTC 2020 x86_64 prometheus-0 (none))"
level=info ts=2020-08-22T20:48:02.302Z caller=main.go:332 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-08-22T20:48:02.303Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-08-22T20:48:02.307Z caller=main.go:652 msg="Starting TSDB ..."
level=info ts=2020-08-22T20:48:02.307Z caller=web.go:448 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-08-22T20:48:02.311Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1597968000000 maxt=1597975200000 ulid=01EG7FAW5PE9ARVHJNKW1SJXRK
level=info ts=2020-08-22T20:48:02.312Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1597975200000 maxt=1597982400000 ulid=01EG7P6KDPXPFVPSMBXBDF48FQ
level=info ts=2020-08-22T20:48:02.313Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1597982400000 maxt=1597989600000 ulid=01EG7X2ANPN30M8ET2S8EPGKEA
level=info ts=2020-08-22T20:48:02.314Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1597989600000 maxt=1597996800000 ulid=01EG83Y1XPXRWRRR2VQRNFB37F
level=info ts=2020-08-22T20:48:02.314Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1597996800000 maxt=1598004000000 ulid=01EG8ASS5P9J1TBZW2P4B2GV7P
level=info ts=2020-08-22T20:48:02.315Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598004000000 maxt=1598011200000 ulid=01EG8HNGDXMYRH0CGWNHKECCPR
level=info ts=2020-08-22T20:48:02.316Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598011200000 maxt=1598018400000 ulid=01EG8RH7NPHSC5PAGXCMN8K9HE
level=info ts=2020-08-22T20:48:02.317Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598018400000 maxt=1598025600000 ulid=01EG8ZCYXNABK8FD3ZGFSQ9NGQ
level=info ts=2020-08-22T20:48:02.317Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598025600000 maxt=1598032800000 ulid=01EG968P5T7SJTVDCZGN6D5YW2
level=info ts=2020-08-22T20:48:02.317Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598032800000 maxt=1598040000000 ulid=01EG9D4DDPR9SE62C0XNE0Z64C
level=info ts=2020-08-22T20:48:02.318Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598040000000 maxt=1598047200000 ulid=01EG9M04NYMAFACVCMDD2RF11W
level=info ts=2020-08-22T20:48:02.319Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598047200000 maxt=1598054400000 ulid=01EG9TVVXNJ7VCDXQNNK2BTZAE
level=info ts=2020-08-22T20:48:02.320Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1598054400000 maxt=1598061600000 ulid=01EGA1QK5PHHZ6P6TNPHDWSD81
$ k describe statefulset prometheus -n metrics
Name: prometheus
Namespace: metrics
CreationTimestamp: Fri, 03 Jan 2020 04:33:58 -0800
Selector: app=prometheus
Labels: <none>
Annotations: <none>
Replicas: 1 desired | 1 total
Update Strategy: RollingUpdate
Partition: 824644121032
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=prometheus
Annotations: checksum/config: 6982e2d83da89ab6fa57e1c2c8a217bb5c1f5abe13052a171cd8d5e238a40646
Service Account: prometheus
Containers:
prometheus-configmap-reloader:
Image: jimmidyson/configmap-reload:v0.1
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/prometheus
--webhook-url=http://localhost:9090/-/reload
Environment: <none>
Mounts:
/etc/prometheus from prometheus (ro)
prometheus:
Image: prom/prometheus:v2.11.1
Port: 9090/TCP
Host Port: 0/TCP
Args:
--config.file=/etc/prometheus/prometheus.yml
--web.enable-lifecycle
--web.enable-admin-api
--storage.tsdb.path=/prometheus/data
--storage.tsdb.retention=1d
Limits:
memory: 1Gi
Liveness: http-get http://:9090/-/healthy delay=180s timeout=1s period=120s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/prometheus from prometheus (rw)
/etc/prometheus-alert-rules from prometheus-alert-rules (rw)
/prometheus/data from prometheus-data-storage (rw)
Volumes:
prometheus:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus
Optional: false
prometheus-alert-rules:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-alert-rules
Optional: false
Volume Claims:
Name: prometheus-data-storage
StorageClass: prometheus
Labels: <none>
Annotations: <none>
Capacity: 20Gi
Access Modes: [ReadWriteOnce]
Events: <none>
What could be the reason?
As you have probably seen, it is clear that you are using more memory than the configured 1Gi limit. The answer lies in how you are using Prometheus and how you end up hitting that 1Gi limit. Some things to look at:
- Number of time series
- Average number of labels per time series
- Number of unique label pairs
- Scrape interval (seconds)
- Bytes per sample
A memory calculator that takes the numbers above as inputs can be found here.
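If you want to read those numbers off the running instance, the sketch below queries a few of Prometheus' own metrics over the HTTP API (assuming you can port-forward to the pod; the metric names are standard Prometheus self-metrics):

# in one terminal: forward the Prometheus port
$ kubectl -n metrics port-forward prometheus-0 9090:9090

# in another terminal: number of active time series in the TSDB head
$ curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=prometheus_tsdb_head_series'

# ingestion rate in samples per second over the last 5 minutes
$ curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=rate(prometheus_tsdb_head_samples_appended_total[5m])'

# resident memory actually used by the Prometheus process
$ curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=process_resident_memory_bytes'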
✌️
A 1Gi memory limit is very low for a Prometheus pod used for Kubernetes monitoring, where millions of metrics are collected from thousands of targets (pods, nodes, endpoints, etc.).
It is recommended to raise the memory limit for the Prometheus pod until it stops crashing with out-of-memory errors.
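For example, a minimal sketch using a strategic merge patch on the statefulset from the question (the 4Gi value is only an illustration; pick a limit based on the usage you actually observe):

$ kubectl -n metrics patch statefulset prometheus --patch '
spec:
  template:
    spec:
      containers:
      - name: prometheus
        resources:
          requests:
            memory: 4Gi
          limits:
            memory: 4Gi
'
# note: changing the pod template triggers a rolling restart of prometheus-0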
It is also recommended to set up monitoring for Prometheus itself - it exports its own metrics at the http://prometheus-host:9090/metrics URL - see http://demo.robustperception.io:9090/metrics for an example.
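If it is not already scraping itself, a minimal self-scrape job for the prometheus.yml held in your ConfigMap would look roughly like this (the job name is just the usual convention):

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']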
Prometheus memory usage can be reduced in the following ways (a configuration sketch follows this list):
- Increase scrape_interval in the Prometheus config in order to scrape targets less frequently. This reduces Prometheus memory usage, since Prometheus keeps recently scraped metrics in memory for up to 2 hours.
- Filter out unneeded scrape targets via relabel_configs. See https://www.robustperception.io/life-of-a-label.
- Filter out unneeded metrics via metric_relabel_configs. See https://www.robustperception.io/life-of-a-label.
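A rough sketch of what these options look like in prometheus.yml (the interval, job name, annotation convention and dropped-metric pattern are all examples, not taken from your config):

global:
  scrape_interval: 60s                  # scrape less frequently (example value)

scrape_configs:
  - job_name: kubernetes-pods           # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods that opt in via a prometheus.io/scrape=true annotation (example convention)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: 'true'
        action: keep
    metric_relabel_configs:
      # drop a family of metrics you never query (example pattern)
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop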
P.S. There are other Prometheus-like solutions that may use less memory when scraping the same set of targets. For example, see vmagent and VictoriaMetrics.