Unable to add a K8s service as a Prometheus target
I want my Prometheus server to scrape metrics from a pod.
I followed these steps:
- Created a pod using a Deployment:
kubectl apply -f sample-app.deploy.yaml
- Exposed it with a Service:
kubectl apply -f sample-app.service.yaml
- Deployed the Prometheus server:
helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
- Created a ServiceMonitor to add the target to Prometheus:
kubectl apply -f service-monitor.yaml
All pods are running, but when I open the Prometheus dashboard, I don't see the sample-app service as a Prometheus target under Status > Targets in the dashboard UI.
I have verified the following:
- Running
kubectl get servicemonitors
shows sample-app.
- The sample app exposes metrics in Prometheus format under
/metrics
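At this point I debugged further. I opened a shell in the Prometheus pod using
kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh
and saw that the Prometheus configuration (/etc/config/prometheus.yml) did not list sample-app as one of its jobs, so I edited the ConfigMap using
kubectl edit cm prometheus-server -o yaml
and added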
- job_name: sample-app
  static_configs:
  - targets:
    - sample-app:8080
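assuming that all other fields, such as the scrape interval and scrape_timeout, keep their defaults.
I can see the change reflected in /etc/config/prometheus.yml, but the Prometheus dashboard still does not show sample-app as a target under Status > Targets.
Below are the YAML manifests for prometheus-server and the ServiceMonitor.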
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
YAML for the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus
You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus Operator, for Prometheus's configuration to be updated automatically based on ServiceMonitor resources.
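As a rough illustration of switching charts, the sketch below writes a small values file for kube-prometheus-stack. The file name and the release and namespace placeholders are hypothetical. To the best of my knowledge, the chart by default only selects ServiceMonitors labelled with its own Helm release name, and the serviceMonitorSelectorNilUsesHelmValues value relaxes that so ServiceMonitors with any labels are picked up. Treat this as an assumption to check against the chart's values.yaml for your version.

```shell
# Hypothetical values file for the kube-prometheus-stack chart.
# Setting this value to false is meant to let the operator-managed
# Prometheus select ServiceMonitors regardless of their "release" label.
cat > kube-prometheus-stack.values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
EOF

# Then, against a real cluster (placeholders are not from the question):
#   helm upgrade -i <release> prometheus-community/kube-prometheus-stack \
#     -n <namespace> -f kube-prometheus-stack.values.yaml
```

If the stack is instead installed under the release name prometheus, the existing release: prometheus label on the ServiceMonitor should already match the default selector.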
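The prometheus-community/prometheus chart you used does not include the Prometheus Operator, which is the component that watches ServiceMonitor resources in the Kubernetes API and updates the Prometheus server's configuration accordingly.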
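If you stay on the prometheus-community/prometheus chart, one option is to declare the scrape job in Helm values rather than editing the rendered ConfigMap by hand, since a later helm upgrade re-renders that ConfigMap. This is a minimal sketch that assumes the chart's extraScrapeConfigs value and uses a hypothetical file name. The target sample-app:8080 is copied from the question and assumes the Service name resolves from the namespace where Prometheus runs.

```shell
# Hypothetical values file; extraScrapeConfigs is a multi-line string that
# the chart adds to the scrape_configs section of prometheus.yml.
cat > sample-app-scrape.values.yaml <<'EOF'
extraScrapeConfigs: |
  - job_name: sample-app
    static_configs:
      - targets:
          - sample-app:8080
EOF

# Then, against a real cluster, pass it alongside the existing values file:
#   helm upgrade -i prometheus prometheus-community/prometheus \
#     -f prometheus-values.yaml -f sample-app-scrape.values.yaml
```

If the Service lives in a different namespace from Prometheus, the target would need a namespaced DNS name such as sample-app.<namespace>.svc:8080.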
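It seems the necessary CustomResourceDefinitions (CRDs) are installed in your cluster, otherwise you would not have been able to create a ServiceMonitor resource. They are not included in the prometheus-community/prometheus chart, so they were probably added to your cluster earlier.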
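To confirm that the CRD is present, something like the following could be used. The function name is made up for illustration; the CRD name servicemonitors.monitoring.coreos.com follows from the apiVersion and kind of the ServiceMonitor shown in the question.

```shell
# Succeeds if the ServiceMonitor CRD is registered in the current cluster.
has_servicemonitor_crd() {
  kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}

if has_servicemonitor_crd; then
  echo "ServiceMonitor CRD is installed"
else
  echo "ServiceMonitor CRD not found (or kubectl cannot reach the cluster)"
fi
```

The presence of the CRD only means the API server accepts ServiceMonitor objects. Without a running Prometheus Operator, nothing acts on them.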