Prometheus: find max RPS

Question

Say I have the following counters in Prometheus:

OK:

nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status="200"}

Failure:

nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}

Total:

nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}

My question is how to write a PromQL query that finds the RPS at which failures occur.

I'm expecting a response like the following:

That is, once a pod receives more than 400 RPS, the Failure metric starts to show up.

Complete query (after getting the answer):

sum((sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status))
  and
  (sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status !="200"}[$__rate_interval])) without (status) > 0))

Answer 1

You need the following query:

rps_total and (rps_failure > 0)

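The and binary operation matches right-hand time series to left-hand series with the same set of labels, and returns only the left-hand series that have such a match, with their left-hand values. See the PromQL documentation on binary operators for details on the matching rules.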
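
To make the matching rule concrete, here is a minimal Python sketch of how and behaves on two instant vectors. It is a simplified model, not Prometheus code: it ignores the on/ignoring modifiers, and the pod label and all values are made up for illustration.

```python
from typing import Dict, FrozenSet, Tuple

# A series is identified by its label set; an instant vector maps label sets to values.
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def and_op(left: Vector, right: Vector) -> Vector:
    """Model of `left and right`: keep left-hand samples whose label set also exists on the right."""
    return {ls: value for ls, value in left.items() if ls in right}


# Hypothetical per-pod request rates (label names and numbers are assumptions).
rps_total = {labels(pod="a"): 450.0, labels(pod="b"): 120.0}
rps_failure = {labels(pod="a"): 3.0, labels(pod="b"): 0.0}

# Model of `rps_failure > 0`: series whose value is not above zero are dropped.
failing = {ls: value for ls, value in rps_failure.items() if value > 0}

# Model of `rps_total and (rps_failure > 0)`.
result = and_op(rps_total, failing)
print(result)
```

Only pod "a" survives, and it carries the total RPS from the left-hand side rather than the failure rate, which is what the question asks for.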

Let's substitute rps_total and rps_failure with the actual time series, given the matching rules above.

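rps_total is replaced with sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status). The metric is a counter, so rate(...) is needed to turn it into requests per second ($__rate_interval is the Grafana variable already used in the question). sum(...) without (status) then sums the series across all status values, grouped by the remaining labels.
rps_failure is replaced with sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status). Dropping the status label on both sides leaves them with identical label sets, which is what and needs in order to match them.

Then the final PromQL query looks like this:

(
  sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status)
  and
  (sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status) > 0)
)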
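
The role of sum(...) without (status) in the query above can be sketched in Python as well. This is only an illustrative model under stated assumptions: the input stands for per-second rates that have already been computed, and the pod label and all numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def sum_without(vector: Vector, *drop: str) -> Vector:
    """Model of `sum(v) without (drop...)`: remove the given labels, then add up series that collide."""
    out: Dict[Labels, float] = defaultdict(float)
    for ls, value in vector.items():
        kept = frozenset((k, v) for k, v in ls if k not in drop)
        out[kept] += value
    return dict(out)


# Hypothetical per-second request rates per pod and status (made-up values).
rates = {
    labels(pod="a", status="200"): 440.0,
    labels(pod="a", status="500"): 10.0,
    labels(pod="b", status="200"): 120.0,
}

# Total RPS per pod: all statuses summed, status label removed.
rps_total = sum_without(rates, "status")

# Failure RPS per pod: only status != "200", status label removed.
non_200 = {ls: v for ls, v in rates.items() if dict(ls)["status"] != "200"}
rps_failure = sum_without(non_200, "status")

# Model of `rps_total and (rps_failure > 0)`: both sides now share the label set {pod}.
result = {ls: v for ls, v in rps_total.items() if rps_failure.get(ls, 0.0) > 0}
print(result)
```

Without removing status on both sides, the failure series would still carry status="500" while the total would not, so the two label sets would never be equal and and would return nothing.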


monitoring

grafana

prometheus