Istio Prometheus pod in CrashLoopBackOff State

I am trying to set up Istio (1.5.4) for the bookinfo example provided on the website, using the demo configuration profile. However, verifying the Istio installation fails because the Prometheus pod enters the CrashLoopBackOff state.

   NAME                                   READY   STATUS             RESTARTS   AGE
grafana-5f6f8cbf75-psk78               1/1     Running            0          21m
istio-egressgateway-7f9f45c966-g7k9j   1/1     Running            0          21m
istio-ingressgateway-968d69c8b-bhxk5   1/1     Running            0          21m
istio-tracing-9dd6c4f7c-7fm79          1/1     Running            0          21m
istiod-86884c8c45-sw96x                1/1     Running            0          21m
kiali-869c6894c5-wqgjb                 1/1     Running            0          21m
prometheus-589c44dbfc-xkwmj            1/2     CrashLoopBackOff   8          21m

Logs of the prometheus pod:

level=warn ts=2020-05-15T09:07:53.113Z caller=main.go:283 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.1, branch=HEAD, revision=8744510c6391d3ef46d8294a7e1f46e57407ab13)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:331 build_context="(go=go1.13.5, user=root@4b1e33c71b9d, date=20191225-01:04:15)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:332 host_details="(Linux 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 prometheus-589c44dbfc-xkwmj (none))"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:107 component=activeQueryTracker msg="Failed to create directory for logging active queries"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: no such file or directory"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x24dda5b, 0x5, 0x14, 0x2c62100, 0xc0005f63c0, 0x2c62100)
    /app/promql/query_logger.go:115 +0x48c
main.main()
    /app/cmd/prometheus/main.go:362 +0x5229

Output of describe pod:

Name:         prometheus-589c44dbfc-xkwmj
Namespace:    istio-system
Priority:     0
Node:         inspiron-7577/192.168.0.9
Start Time:   Fri, 15 May 2020 14:21:14 +0530
Labels:       app=prometheus
              pod-template-hash=589c44dbfc
              release=istio
Annotations:  sidecar.istio.io/inject: false
Status:       Running
IP:           172.17.0.11
IPs:
  IP:           172.17.0.11
Controlled By:  ReplicaSet/prometheus-589c44dbfc
Containers:
  prometheus:
    Container ID:  docker://b6820a000ab67a5ce31d3a38f6f0d510bd150794b2792147fc17ef8f730c03bb
    Image:         docker.io/prom/prometheus:v2.15.1
    Image ID:      docker-pullable://prom/prometheus@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --storage.tsdb.retention=6h
      --config.file=/etc/prometheus/prometheus.yml
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 15 May 2020 14:37:50 +0530
      Finished:     Fri, 15 May 2020 14:37:53 +0530
    Ready:          False
    Restart Count:  8
    Requests:
      cpu:        10m
    Liveness:     http-get http://:9090/-/healthy delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/istio-certs from istio-certs (rw)
      /etc/prometheus from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
  istio-proxy:
    Container ID:  docker://fa756c93510b6f402d7d88c31a5f5f066d4c254590eab70886e7835e7d3871be
    Image:         docker.io/istio/proxyv2:1.5.4
    Image ID:      docker-pullable://istio/proxyv2@sha256:e16e2801b7fd93154e8fcb5f4e2fb1240d73349d425b8be90691d48e8b9bb944
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --configPath
      /etc/istio/proxy
      --binaryPath
      /usr/local/bin/envoy
      --serviceCluster
      istio-proxy-prometheus
      --drainDuration
      45s
      --parentShutdownDuration
      1m0s
      --discoveryAddress
      istio-pilot.istio-system.svc:15012
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --connectTimeout
      10s
      --proxyAdminPort
      15000
      --controlPlaneAuthPolicy
      NONE
      --dnsRefreshRate
      300s
      --statusPort
      15020
      --trust-domain=cluster.local
      --controlPlaneBootstrap=false
    State:          Running
      Started:      Fri, 15 May 2020 14:21:31 +0530
    Ready:          True
    Restart Count:  0
    Readiness:      http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
    Environment:
      OUTPUT_CERTS:                 /etc/istio-certs
      JWT_POLICY:                   first-party-jwt
      PILOT_CERT_PROVIDER:          istiod
      CA_ADDR:                      istio-pilot.istio-system.svc:15012
      POD_NAME:                     prometheus-589c44dbfc-xkwmj (v1:metadata.name)
      POD_NAMESPACE:                istio-system (v1:metadata.namespace)
      INSTANCE_IP:                   (v1:status.podIP)
      SERVICE_ACCOUNT:               (v1:spec.serviceAccountName)
      HOST_IP:                       (v1:status.hostIP)
      ISTIO_META_POD_NAME:          prometheus-589c44dbfc-xkwmj (v1:metadata.name)
      ISTIO_META_CONFIG_NAMESPACE:  istio-system (v1:metadata.namespace)
      ISTIO_META_MESH_ID:           cluster.local
      ISTIO_META_CLUSTER_ID:        Kubernetes
    Mounts:
      /etc/istio-certs/ from istio-certs (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus
    Optional:  false
  istio-certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  prometheus-token-cgqbc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-cgqbc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                   From                    Message
  ----     ------       ----                  ----                    -------
  Normal   Scheduled    <unknown>             default-scheduler       Successfully assigned istio-system/prometheus-589c44dbfc-xkwmj to inspiron-7577
  Warning  FailedMount  17m                   kubelet, inspiron-7577  MountVolume.SetUp failed for volume "prometheus-token-cgqbc" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount  17m                   kubelet, inspiron-7577  MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulled       17m                   kubelet, inspiron-7577  Container image "docker.io/istio/proxyv2:1.5.4" already present on machine
  Normal   Created      17m                   kubelet, inspiron-7577  Created container istio-proxy
  Normal   Started      17m                   kubelet, inspiron-7577  Started container istio-proxy
  Warning  Unhealthy    17m                   kubelet, inspiron-7577  Readiness probe failed: HTTP probe failed with statuscode: 503
  Normal   Pulled       16m (x4 over 17m)     kubelet, inspiron-7577  Container image "docker.io/prom/prometheus:v2.15.1" already present on machine
  Normal   Created      16m (x4 over 17m)     kubelet, inspiron-7577  Created container prometheus
  Normal   Started      16m (x4 over 17m)     kubelet, inspiron-7577  Started container prometheus
  Warning  BackOff      2m24s (x72 over 17m)  kubelet, inspiron-7577  Back-off restarting failed container

It is unable to create the directory for logging. Any ideas would be appreciated.

Since Istio 1.5.4 has just been released, there are some issues with Prometheus on minikube when installed with istioctl manifest apply.

I checked on GCP and everything works fine there.


As a workaround you can use the Istio Operator; the OP and I tested it and, as he mentioned in the comments, it is working:

Thanks a lot @jt97! It did work.


Steps to install the Istio Operator

To install the Istio demo configuration profile using the operator, first deploy the operator controller with istioctl operator init, then create the namespace and apply the IstioOperator resource:

kubectl create ns istio-system
kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo
EOF

Could you tell me why the normal installation failed?

As I mentioned in the comments, I don't know yet. If I find the reason, I will update this answer.
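For reference, the panic itself comes from Prometheus trying to create `data/queries.active` relative to its working directory, which is not writable in this setup. An alternative, untested workaround would be to patch the Prometheus deployment so the TSDB path points at a writable emptyDir; the volume name `data-volume` below is illustrative and not part of the Istio chart:

```yaml
# Hypothetical patch for the prometheus deployment (volume name is illustrative):
# give Prometheus a writable directory and point --storage.tsdb.path at it.
spec:
  template:
    spec:
      containers:
        - name: prometheus
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.retention.time=6h   # replaces the deprecated --storage.tsdb.retention
            - --storage.tsdb.path=/data          # writable location for the TSDB and query log
          volumeMounts:
            - name: data-volume
              mountPath: /data
      volumes:
        - name: data-volume
          emptyDir: {}
```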