What's the simplest way of monitoring whether a pod is running in a k8s cluster?

Question

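My scenario is as follows:

I have a k8s cluster running.
In this k8s cluster I have defined a StatefulSet, and in that StatefulSet I have one pod running; let's call it podName-0.

What I want to achieve is this: whenever podName-0 is not in the Running state, send an email to someone, who will then fix the problem.

I tried to do this with Prometheus, but it seems a bit heavyweight (e.g. ClusterRole/ClusterRoleBinding/etc.).

Is there a simple way to do this? Thanks!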

Answer 1

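To detect your podName-0 changing from running to unexpectedly terminated, you can use a preStop hook to make a callout. If you need more than 30s (the default), you can set terminationGracePeriodSeconds to a longer value.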

...
spec:
  ...
  template:
  ...
    spec:
      containers:
      - name: busybox
        ...
        lifecycle:
          preStop:
            exec:
              command: ["<callout>"]
      ...
      terminationGracePeriodSeconds: 60
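As one possible way to fill in the `<callout>` placeholder, here is a minimal sketch that assumes a hypothetical HTTP notification service at `notifier.example.internal/notify` (not part of the original answer) which turns the request into an email. busybox ships `sh`, `wget` and `hostname`, so the hook can run them through a shell. Keep in mind that a preStop hook only runs when Kubernetes terminates the container (for example on pod deletion, eviction or a failed liveness probe); it does not fire if the process crashes on its own or the node disappears.

```yaml
spec:
  template:
    spec:
      containers:
      - name: busybox
        # image, command, etc. as in your existing spec
        lifecycle:
          preStop:
            exec:
              # Hypothetical notifier URL; replace with whatever service sends your email.
              # In a StatefulSet the pod's hostname is its name, e.g. podName-0.
              # "|| true" makes the hook exit 0 even if the notifier is unreachable.
              command:
              - /bin/sh
              - -c
              - 'wget -q -O /dev/null "http://notifier.example.internal/notify?pod=$(hostname)" || true'
      # The hook (including the wget call) must finish within this grace period.
      terminationGracePeriodSeconds: 60
```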

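This is probably the simplest approach. Since you mentioned Prometheus, you could check out the available Alertmanager rules for more comprehensive checks and alerting. Apart from the standard Prometheus installation requirements, this approach does not need any special RBAC.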
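For the Prometheus route, an alerting rule might look roughly like the sketch below. It assumes kube-state-metrics is installed and scraped (it exports `kube_pod_status_phase`, which is 1 for the pod's current phase) and that the pod lives in a hypothetical `default` namespace. Wrapping the selector in `absent(... == 1)` makes the alert fire both when the pod is in another phase and when its series disappears entirely, for example after the pod is deleted. Note that `Running` is the pod phase, so a pod whose container is crash-looping can still report it.

```yaml
groups:
  - name: pod-availability
    rules:
      - alert: PodNameNotRunning
        # Fires when podName-0 has no "Running" series with value 1:
        # either it is Pending/Succeeded/Failed/Unknown, or it no longer exists.
        expr: absent(kube_pod_status_phase{namespace="default", pod="podName-0", phase="Running"} == 1)
        for: 5m   # condition must hold for 5 minutes before the alert fires
        labels:
          severity: critical
        annotations:
          summary: "Pod default/podName-0 has not been Running for 5 minutes"
```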

Answer 2

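Your question is opinion-based and can't be answered definitively. I'll try to list a few ways to solve your problem, but I can't say which one is "easier"; each has its pros and cons. To get to the point, first have a look at this question: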

Coderanger wrote:

The somewhat convoluted standard answer to this is Kubernetes -> kube-state-metrics -> Prometheus -> alertmanager -> webhook. This might sound like a lot for a simple task, but Prometheus and its related tools are used much more broadly for metrics and alerting. If you wanted a more narrow answer, you could check out Brigade perhaps? But probably just use kube-prometheus (which is Prom with a bunch of related components all setup for you).

This explains well why you might use Prometheus, and the other advantages that come with it.
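Since the goal here is an email rather than a webhook, the last hop of that chain can be an Alertmanager email receiver. A minimal `alertmanager.yml` sketch follows; the SMTP relay, addresses and credentials are all placeholders you would replace with your own.

```yaml
global:
  # Placeholder SMTP relay and sender account.
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'change-me'

route:
  # Route every alert to the on-call mailbox, batching alerts with the same name.
  receiver: oncall-email
  group_by: ['alertname']

receivers:
  - name: oncall-email
    email_configs:
      - to: 'oncall@example.com'
        send_resolved: true   # also email when the alert resolves
```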

Going a step further, Patrick W mentions:

You can add a preStop hook to your pod spec. The hook can either run a script or make an HTTP call before the pod shuts down. You can configure the hook to call an API which triggers a notification.

A similar solution is proposed in the second answer to this question. If you decide to use a preStop hook, read this doc.
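The HTTP variant Patrick W describes can be written with an `httpGet` handler instead of `exec`. The sketch below assumes a hypothetical notification service at `notifier.example.internal:8080` that turns the request into an email. Because the kubelet on the node sends this request (not the container), the host typically has to be resolvable and reachable from the node itself; names that only resolve through cluster DNS may not work here.

```yaml
spec:
  template:
    spec:
      containers:
      - name: busybox
        # image, command, etc. as in your existing spec
        lifecycle:
          preStop:
            httpGet:
              # Hypothetical notifier; if host is omitted it defaults to the pod's own IP.
              host: notifier.example.internal
              port: 8080
              path: /notify/podName-0
              scheme: HTTP
      # The HTTP call must complete within this grace period.
      terminationGracePeriodSeconds: 60
```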

Another approach is to use an external tool such as Atomist. On its blog you can find this article about Kubernetes health alerts.
