Sum of numeric values in all events in given time period

Question

I log the following events at regular intervals (once per minute):

14:58 index=prod_service service.error error.count="3"
14:59 index=prod_service service.error error.count="4"
15:00 index=prod_service service.error error.count="0"
15:01 index=prod_service service.error error.count="10"

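I have already set up an alert that notifies me when 10 events within an hour have an error.count greater than "0". I would like to change it so that it alerts me when the sum of error.count across all events in an hour is greater than 10. So how do I sum error.count over all events (i.e. 17 for the sample above)?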

My current query only counts the number of events whose error count is greater than 0:

index=prod-service service.count | where sum('error.count') > 0

Answer 1

Use the stats command to add up all the counts, then filter the result with where.

index=prod-service service.count earliest=-60m
| stats sum('error.count') as total_errors
| where total_errors > 10
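
To check the arithmetic outside Splunk, here is a minimal Python sketch of what the stats-then-where pipeline computes on the four sample events from the question. It is only an illustration, not part of the Splunk query: the names SAMPLE_EVENTS, error_counts and THRESHOLD are made up for this example, and the regex assumes the error.count="N" layout shown in the question.

```python
import re

# The four sample events from the question, copied verbatim.
SAMPLE_EVENTS = [
    '14:58 index=prod_service service.error error.count="3"',
    '14:59 index=prod_service service.error error.count="4"',
    '15:00 index=prod_service service.error error.count="0"',
    '15:01 index=prod_service service.error error.count="10"',
]

# Assumption: every event carries the value as error.count="<digits>".
ERROR_COUNT = re.compile(r'error\.count="(\d+)"')

THRESHOLD = 10  # hypothetical name for the alert threshold


def error_counts(events):
    """Extract the numeric error.count value from each event line."""
    counts = []
    for line in events:
        match = ERROR_COUNT.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


counts = error_counts(SAMPLE_EVENTS)

# What the original alert looked at: how many events have error.count > 0.
events_with_errors = sum(1 for c in counts if c > 0)

# What "stats sum(...) as total_errors" computes: the sum over all events.
total_errors = sum(counts)

# What "where total_errors > 10" decides: keep the row only above the threshold.
should_alert = total_errors > THRESHOLD

print(events_with_errors, total_errors, should_alert)  # prints: 3 17 True
```

The two numbers differ because the first counts events while the second sums the field: only 3 events have a non-zero count, but the counts add up to 17, which is above the threshold of 10.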

Answer 2

This worked for me:

index=prod-service "service.error" | timechart sum(error.count) AS "Count" | stats sum("Count") as "Total"

Then, in the alert settings, I had to select custom as the trigger condition instead of Number of Results and enter:

search Total > 10

splunk

splunk-query