Prometheus: find max RPS

Let's say I have two metrics in Prometheus, both counters:

OK:

nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status="200"}

Failure:

nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}

Total:

nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}

My question is how to write a PromQL query that finds the RPS at which failures start.

I'm expecting a result like this:

400

Meaning that once the pod receives more than 400 RPS, failures start to occur.


Full query (after getting the answer):

sum((sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status))
  and
  (sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status !="200"}[$__rate_interval])) without (status) > 0))

You need the following query:

rps_total and (rps_failure > 0)

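The and binary operator returns the left-hand time series for which a right-hand series with exactly the same set of labels exists. See these docs for details on the matching rules.

Let's substitute rps_total and rps_failure with the actual time series, keeping the matching rules above in mind.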
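To make the matching rule concrete, here is a minimal Python sketch of how and behaves at a single evaluation step. It is a toy model, not Prometheus code: the helper names promql_and and labels, the pod label, and the sample values are all made up for illustration, and it assumes default matching on all labels (no on/ignoring modifiers).

```python
# Toy model of PromQL's `and` operator at one evaluation timestamp.
# A series is identified by its label set (a frozenset of label/value pairs).

def promql_and(left, right):
    """Keep left-hand samples whose label set also appears on the right.

    Both arguments map a label set to a sample value.
    The returned values always come from the left-hand side.
    """
    return {lset: value for lset, value in left.items() if lset in right}


def labels(**kv):
    """Hypothetical helper: build a hashable label set."""
    return frozenset(kv.items())


# Made-up per-pod request rates with the status label already aggregated away.
rps_total = {labels(pod="a"): 350.0, labels(pod="b"): 420.0}
rps_failure = {labels(pod="a"): 0.0, labels(pod="b"): 3.5}

# `rps_failure > 0` drops every series whose value is not above zero.
failing = {lset: v for lset, v in rps_failure.items() if v > 0}

result = promql_and(rps_total, failing)
print(result)
```

Only pod "b" survives, and it carries its total RPS rather than its failure rate, which is the behaviour the query relies on.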

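  • rps_total is replaced with sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status). Because nginx_ingress_controller_requests is a counter, rate() is needed to turn it into requests per second. sum(...) without (status) then adds up the series across all values of the status label, grouped by the remaining labels.

  • rps_failure is replaced with sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status). Dropping status on both sides gives the two operands identical label sets, which is what and needs in order to match them.

The final PromQL query then looks like this:

(
  sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status)
  and
  (sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status) > 0)
)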
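The reason both operands are wrapped in sum(...) without (status) can be illustrated with a small Python sketch. It is only a model of the aggregation, not how Prometheus implements it; sum_without, labels, the pod label, and the numbers are hypothetical.

```python
from collections import defaultdict


def sum_without(series, dropped_label):
    """Sum samples that share the same labels once `dropped_label` is removed."""
    totals = defaultdict(float)
    for lset, value in series.items():
        remaining = frozenset((k, v) for k, v in lset if k != dropped_label)
        totals[remaining] += value
    return dict(totals)


def labels(**kv):
    """Hypothetical helper: build a hashable label set."""
    return frozenset(kv.items())


# Made-up per-second rates, one series per (pod, status) pair.
per_status = {
    labels(pod="a", status="200"): 340.0,
    labels(pod="a", status="500"): 10.0,
    labels(pod="b", status="200"): 120.0,
}
# Equivalent of the status!="200" selector.
failures_only = {
    lset: v for lset, v in per_status.items() if ("status", "200") not in lset
}

rps_total = sum_without(per_status, "status")
rps_failure = sum_without(failures_only, "status")
print(rps_total)
print(rps_failure)
```

After the aggregation neither side carries a status label, so the failure series for pod "a" has exactly the same label set as the total series for pod "a" and the two can be matched. Without it, a series with status="500" would never match a series with status="200".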