Prometheus: find max RPS
Say I have two metrics in Prometheus, both counters, plus their total:
Ok:
nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status="200"}
Failure:
nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}
Total:
nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}
My question is how to find, as a PromQL query, the RPS at which failures start to occur.
I'm expecting a response like:
400
meaning that once the pod receives more than 400 RPS, the Failure metric starts to show up.
Full query (after getting the answer):
sum((sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status))
and
(sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status !="200"}[$__rate_interval])) without (status) > 0))
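You need the following query:
rps_total and (rps_failure > 0)
The and binary operator keeps a left-hand time series only if the right-hand side contains a series with the same set of labels. See the Prometheus documentation on vector matching for the details of the matching rules.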
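As a minimal illustration of that filtering behavior, the two queries below use the built-in vector() function in place of real metrics. The constants 5, 1 and 0 are made-up stand-ins for a total rate and a failure rate, not values from the question.

```promql
# Right-hand side is non-empty (1 > 0 holds), so the left-hand sample is kept.
# Result: a single series with no labels and value 5.
vector(5) and (vector(1) > 0)

# Right-hand side is empty (0 > 0 filters the sample out), so nothing matches.
# Result: empty.
vector(5) and (vector(0) > 0)
```

The value returned always comes from the left-hand side; the right-hand side only decides whether a series is kept.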
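Let's substitute the actual time series for rps_total and rps_failure, given the matching rules above.
rps_total becomes
sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status)
rate(...) turns the cumulative counter into a per-second rate, and sum(...) without (status) is needed to sum the series across all values of the status label, grouped by the remaining labels. Without it, the status label would prevent the two sides of and from matching.
rps_failure becomes
sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status)
The final PromQL query then looks like this:
(
sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status)
and
(sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status) > 0)
)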
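The query above returns the total RPS at every moment when failures are present, not a single number such as 400. One possible way to collapse it into one value per label set is to wrap it in a subquery and take the minimum over a lookback window. This is only a sketch: the 1d window and 1m resolution are assumptions chosen for illustration, not values from the question.

```promql
# Lowest total RPS observed during the last day while any non-200 responses occurred.
# [1d:1m] is a subquery: evaluate the inner expression every 1m over the past 1d.
min_over_time(
  (
    sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status)
    and
    (sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}[$__rate_interval])) without (status) > 0)
  )[1d:1m]
)
```

Treat the result with some care: a single stray non-200 response at low traffic (a 404, for instance) pulls the minimum down, so it may help to narrow the failure selector (for example to 5xx statuses) or raise the > 0 threshold before reading the value as a capacity limit.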