Why does a gauge-typed metric keep increasing in Prometheus node-exporter?

Here is a snippet of the node-exporter metrics.

# HELP node_network_receive_bytes Network device statistic receive_bytes.
# TYPE node_network_receive_bytes gauge
node_network_receive_bytes{device="br-074eb8733fdc"} 4.5000969e+07
node_network_receive_bytes{device="br-d24ce0793158"} 9.8563483e+07
node_network_receive_bytes{device="docker0"} 1.81893686701e+11
node_network_receive_bytes{device="eno1"} 1.30390371207e+11
node_network_receive_bytes{device="eno2"} 2.7347435325e+10
node_network_receive_bytes{device="lo"} 9.80764398145e+11
node_network_receive_bytes{device="veth9eee40a"} 9.5458576e+07
node_network_receive_bytes{device="vethb89d9df"} 1.2443436876e+11
node_network_receive_bytes{device="vethd5ca4a4"} 648

This indicates that the type of node_network_receive_bytes is gauge (frankly, I'm not sure this is the right way to check a metric's type, but intuitively it looks right).
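As a rough sketch of that check, the declared type can be read from the `# TYPE` comment lines of the text exposition output. The helper name `metric_types` and the shortened `SNIPPET` are illustrative assumptions, not part of node-exporter or any Prometheus client library. Note that the declared type is only metadata: as far as I know, PromQL functions such as rate() do not verify it.

```python
def metric_types(exposition_text):
    """Map metric name -> declared type, based on '# TYPE <name> <type>' lines."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE line has exactly four tokens: '#', 'TYPE', name, type.
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


# Shortened copy of the scrape output shown above (hypothetical variable name).
SNIPPET = """\
# HELP node_network_receive_bytes Network device statistic receive_bytes.
# TYPE node_network_receive_bytes gauge
node_network_receive_bytes{device="eno1"} 1.30390371207e+11
"""

print(metric_types(SNIPPET))  # {'node_network_receive_bytes': 'gauge'}
```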

However, when I inspect node_network_receive_bytes with a range vector, the values keep increasing, just like a counter.

node_network_receive_bytes{device="eno1"}[3m]

    130393948462 @1516931391.405
    130394168285 @1516931401.405
    130394376002 @1516931411.405
    130394579742 @1516931421.405
    130394755152 @1516931431.405
    130394955813 @1516931441.405
    130395174828 @1516931451.405
    130395475287 @1516931461.405
    130395734293 @1516931471.405
    130395935167 @1516931481.405
    130396110667 @1516931491.405
    130396314762 @1516931501.405
    130396490334 @1516931511.405
    130396675817 @1516931521.405
    130396825764 @1516931531.405
    130397011068 @1516931541.405
    130397158242 @1516931551.405
    130397367815 @1516931561.405
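To make the counter-like behaviour concrete, here is a minimal sketch that computes an average per-second increase from the samples above. It is a simplification and not how Prometheus implements rate(): the real function also extrapolates to the window boundaries. The function name `average_rate` is an assumption for illustration, and treating any drop in value as a reset to zero is a simplified stand-in for counter-reset handling.

```python
# (value, unix_timestamp) pairs copied from the eno1 range vector above.
SAMPLES = [
    (130393948462, 1516931391.405),
    (130394168285, 1516931401.405),
    (130394376002, 1516931411.405),
    (130394579742, 1516931421.405),
    (130394755152, 1516931431.405),
    (130394955813, 1516931441.405),
    (130395174828, 1516931451.405),
    (130395475287, 1516931461.405),
    (130395734293, 1516931471.405),
    (130395935167, 1516931481.405),
    (130396110667, 1516931491.405),
    (130396314762, 1516931501.405),
    (130396490334, 1516931511.405),
    (130396675817, 1516931521.405),
    (130396825764, 1516931531.405),
    (130397011068, 1516931541.405),
    (130397158242, 1516931551.405),
    (130397367815, 1516931561.405),
]


def average_rate(samples):
    """Average per-second increase between the first and last sample."""
    increase = 0.0
    for (prev, _), (cur, _) in zip(samples, samples[1:]):
        # If the value dropped, assume the counter restarted from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][1] - samples[0][1]
    return increase / elapsed


print(f"{average_rate(SAMPLES):.1f} bytes/s")  # about 20113.8 bytes/s
```

The values never decrease across the 170-second window, which is what one would expect from a counter of received bytes.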

In addition, the Grafana dashboard I downloaded for node-exporter queries this metric with irate and increase, which, as far as I know, are meant for counter-type metrics.

// Query in grafana dashboard for node-exporter
sum(irate(node_network_receive_bytes{device=~"$device",instance=~"$node"}[3m]))
sum(increase(node_network_receive_bytes{device=~"$device",instance=~"$node"}[1m]))

increase()

increase should only be used with counters. It is syntactic sugar for rate(v) multiplied by the number of seconds under the specified time range window, and should be used primarily for human readability. Use rate in recording rules so that increases are tracked consistently on a per-second basis.

irate()

irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and slow-moving counters, as brief changes in the rate can reset the FOR clause and graphs consisting entirely of rare spikes are hard to read.

I'm confused. What am I missing? (node_network_transmit_bytes shows the same symptom.)

These are in fact counters, and they will have the correct type and metric names in version 0.16.0.

The node exporter is one of the oldest exporters, and it accumulated some cruft before all the guidelines were worked out.

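As an aside, it is perfectly valid for a gauge to look monotonically increasing. The difference is that with a gauge you care about its absolute value, whereas with a counter you only care about how fast it is increasing.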