Measuring 40x and 50x errors on a K8s service endpoint with Prometheus?

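I'm not sure how to approach this, and I can't find much clear information on Google about measuring errors (40x and 50x) on my service endpoints. My services are up, and when I delete pods just to test, I can see in the Blackbox metrics that Prometheus registers the failures, but they are not broken down by type such as 40x or 50x.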

Edit 1:

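Yes, I have set up my cluster. It is experimental at this stage and runs on VirtualBox + Vagrant + K3s. I created two simple services, a frontend and a backend, and configured Prometheus jobs to discover the services and probe their uptime through the Blackbox exporter. My goal is to get metrics onto a Grafana dashboard that measure the number of 40x or 50x errors across all requests to these services over a period of time. My current idea is to count the 2xx responses and report everything that is not 2xx, but that would include more errors/statuses than just 40x and 50x.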

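Prometheus is deployed as a Helm stack, and so is the Blackbox exporter. Everything is deployed in the default namespace, since at this stage the point is only to test how to achieve this goal.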

Based on this topic:

Services in Kubernetes are kind of like load-balancers - they just route requests to underlying pods. The pods themselves actually contain the application that does the work and returns the status code. You don't monitor kubernetes services per-se for 4xx or 5xx errors, you need to monitor the underlying application itself.

So you need to create an architecture for monitoring your application. Prometheus only collects metrics and lets you build graphs from them; it does not process anything by itself. The metrics have to be exposed by the application. Here you can find the topic Kubernetes monitoring with Prometheus, the ultimate guide. It is very comprehensive and explains well how to monitor an application. For you, the most interesting part should be How to monitor a Kubernetes service with Prometheus. You can also find a Prometheus Operator Tutorial there, which can help you automate the deployment of Prometheus, Alertmanager and Grafana.
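To make "the metrics have to be exposed by the application" concrete, here is a minimal sketch that uses only the Python standard library. It is not the official Prometheus client; a real service would more likely use a client library for its framework. The port, the `/` and `/metrics` routes, and the names `RESPONSES`, `render_metrics` and `Handler` are assumptions made for illustration. The handler counts every response it sends by status code and serves the totals in the Prometheus text format.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Response totals keyed by HTTP status code (illustrative in-memory store).
RESPONSES = Counter()
LOCK = threading.Lock()


def render_metrics():
    """Render the totals as one counter metric with a `code` label."""
    lines = ["# TYPE http_responses_total counter"]
    with LOCK:
        for code, count in sorted(RESPONSES.items()):
            lines.append(f'http_responses_total{{code="{code}"}} {count}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # Scrapes themselves are not counted as application traffic.
            self._reply(200, render_metrics(), count=False)
        elif self.path == "/":
            self._reply(200, "ok\n")
        else:
            self._reply(404, "not found\n")

    def _reply(self, status, body, count=True):
        if count:
            with LOCK:
                RESPONSES[str(status)] += 1
        data = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the example quiet


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    ThreadingHTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

With something like this running in each pod, a Prometheus scrape job pointed at `/metrics` would see the application's own status codes rather than only the up/down result of a probe.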

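Once everything is installed, you can start collecting metrics. Using labels is good practice: it lets you easily distinguish between the different response codes of your application.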

For example, rather than http_responses_500_total and http_responses_403_total, create a single metric called http_responses_total with a code label for the HTTP response code. You can then process the entire metric as one in rules and graphs.
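As a rough illustration of why a single labeled metric is easier to work with, the sketch below aggregates one `code`-labeled counter into status classes and an error ratio. The sample values are invented, and the helper names `totals_by_class` and `error_ratio` are hypothetical. In Prometheus itself the same grouping would normally be done in a query, for example with a regex matcher such as `sum(rate(http_responses_total{code=~"5.."}[5m]))`, assuming the application exports a metric with that name and label.

```python
from collections import Counter

# Invented samples of one labeled metric: http_responses_total{code="..."}
http_responses_total = Counter(
    {"200": 950, "301": 12, "403": 20, "404": 8, "500": 7, "503": 3}
)


def totals_by_class(samples):
    """Sum a code-labeled counter into status classes such as '4xx' and '5xx'."""
    classes = Counter()
    for code, count in samples.items():
        classes[code[0] + "xx"] += count
    return classes


def error_ratio(samples):
    """Fraction of all responses whose status code is 4xx or 5xx."""
    total = sum(samples.values())
    errors = sum(n for code, n in samples.items() if code[0] in "45")
    return errors / total if total else 0.0


if __name__ == "__main__":
    classes = totals_by_class(http_responses_total)
    print("4xx:", classes["4xx"], "5xx:", classes["5xx"])
    print("error ratio:", round(error_ratio(http_responses_total), 4))
```

With separate metrics per status code, every new code would need its own rule; with one label, the 4xx and 5xx groups fall out of a single pass over the same series.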
