Measuring 40x and 50x errors on a K8s service endpoint with Prometheus?

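I'm not sure how to approach this, and I can't find much clear information on Google about measuring errors (40x and 50x) on my service endpoint. My service is up, and when I delete pods just to test, I can see in the Blackbox metrics that Prometheus picks up the failures, but they are not broken down by type, such as 40x or 50x.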

Edit 1:

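Prometheus is deployed as a Helm stack, and so is the Blackbox monitor. Everything is deployed in the default namespace, because at this stage I'm only testing how to achieve this.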

Based on this topic:

Services in Kubernetes are kind of like load balancers: they just route requests to the underlying pods. The pods themselves contain the application that does the work and returns the status code. You don't monitor Kubernetes services per se for 4xx or 5xx errors; you need to monitor the underlying application itself.

So you need to set up an architecture for monitoring your application. Prometheus only collects metrics and lets you graph them; it does not produce them itself. The metrics have to be exposed by the application. Here you can find the topic "Kubernetes monitoring with Prometheus, the ultimate guide". It is very comprehensive and explains well how to monitor an application. For you, the most interesting part should be "How to monitor a Kubernetes service with Prometheus". There you can also find a Prometheus Operator tutorial, which can help you automate the deployment of Prometheus, Alertmanager and Grafana.

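Once everything is installed, you can start collecting metrics. Using labels is good practice: it lets you easily distinguish between your application's different response codes.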

For example, rather than http_responses_500_total and http_responses_403_total, create a single metric called http_responses_total with a code label for the HTTP response code. You can then process the entire metric as one in rules and graphs.
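
As a rough illustration of the single-metric-with-a-label idea, here is a minimal, dependency-free Python sketch. It only imitates what a Prometheus client library would do for you: it keeps one counter keyed by the `code` label and renders it in the Prometheus text exposition format. The helper names `record_response`, `render_metrics` and `count_by_class` are hypothetical and chosen for this example; a real application would more likely use an official client library and serve the output on a `/metrics` endpoint.

```python
from collections import Counter

# Hypothetical in-process store: one logical metric, keyed by the "code" label.
http_responses_total = Counter()


def record_response(code: int) -> None:
    """Count one HTTP response under its status code label."""
    http_responses_total[str(code)] += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_responses_total Total HTTP responses by status code.",
        "# TYPE http_responses_total counter",
    ]
    # One series per distinct code label, sorted for stable output.
    for code, count in sorted(http_responses_total.items()):
        lines.append(f'http_responses_total{{code="{code}"}} {count}')
    return "\n".join(lines) + "\n"


def count_by_class(prefix: str) -> int:
    """Sum all series whose code starts with the prefix, e.g. "4" or "5"."""
    return sum(n for code, n in http_responses_total.items()
               if code.startswith(prefix))


if __name__ == "__main__":
    for status in (200, 200, 403, 404, 500):
        record_response(status)
    print(render_metrics(), end="")
```

Because every status code lives in the same metric, the 4xx and 5xx classes can be selected with a label matcher rather than by listing separate metrics. On the Prometheus side, a query along the lines of `sum(rate(http_responses_total{code=~"5.."}[5m]))` should give the per-second rate of 5xx responses, assuming the application exposes the metric under that name.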