RRD acceptance criteria

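I am using RRD to monitor a data source. In many cases RRD stores NaN even though we know the data was received, because we also append every received value to a file for testing. When we compare the two, we see the following: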

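I tried to paste the data as two columns, but it did not format correctly; what you see below is effectively two columns of a spreadsheet. The left column is the rrd dump, and the right column is the actual data that arrived at that time (timestamp:value).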

"           <!-- 2017-09-28 06:00:00 UTC / 1506578400 --> <row><v>1.1999200000e+06</v></row>"   1506578412:1202000
"           <!-- 2017-09-28 06:05:00 UTC / 1506578700 --> <row><v>1.2538400000e+06</v></row>"   1506578712:1256000
"           <!-- 2017-09-28 06:10:00 UTC / 1506579000 --> <row><v>1.2310400000e+06</v></row>"   1506579012:1230000
"           <!-- 2017-09-28 06:15:00 UTC / 1506579300 --> <row><v>1.2415200000e+06</v></row>"   1506579312:1242000
"           <!-- 2017-09-28 06:20:00 UTC / 1506579600 --> <row><v>1.2304800000e+06</v></row>"   1506579612:1230000
"           <!-- 2017-09-28 06:25:00 UTC / 1506579900 --> <row><v>1.2357600000e+06</v></row>"   1506579912:1236000
"           <!-- 2017-09-28 06:30:00 UTC / 1506580200 --> <row><v>1.1284800000e+06</v></row>"   1506580212:1124000
"           <!-- 2017-09-28 06:35:00 UTC / 1506580500 --> <row><v>1.2238400000e+06</v></row>"   1506580512:1228000
"           <!-- 2017-09-28 06:40:00 UTC / 1506580800 --> <row><v>NaN</v></row>"    1506580813:1222000
"           <!-- 2017-09-28 06:45:00 UTC / 1506581100 --> <row><v>1.2400000000e+06</v></row>"   1506581112:1240000
"           <!-- 2017-09-28 06:50:00 UTC / 1506581400 --> <row><v>1.2284800000e+06</v></row>"   1506581412:1228000
"           <!-- 2017-09-28 06:55:00 UTC / 1506581700 --> <row><v>8.9392000000e+05</v></row>"   1506581712:880000
"           <!-- 2017-09-28 07:00:00 UTC / 1506582000 --> <row><v>NaN</v></row>"    1506582014:1000000
"           <!-- 2017-09-28 07:05:00 UTC / 1506582300 --> <row><v>NaN</v></row>"    1506582315:738000
"           <!-- 2017-09-28 07:10:00 UTC / 1506582600 --> <row><v>1.1760000000e+06</v></row>"   1506582613:1176000
"           <!-- 2017-09-28 07:15:00 UTC / 1506582900 --> <row><v>1.1874800000e+06</v></row>"   1506582912:1188000
"           <!-- 2017-09-28 07:20:00 UTC / 1506583200 --> <row><v>1.2033600000e+06</v></row>"   1506583212:1204000
"           <!-- 2017-09-28 07:25:00 UTC / 1506583500 --> <row><v>1.2097600000e+06</v></row>"   1506583512:1210000
"           <!-- 2017-09-28 07:30:00 UTC / 1506583800 --> <row><v>1.0717600000e+06</v></row>"   1506583811:1066000
"           <!-- 2017-09-28 07:35:00 UTC / 1506584100 --> <row><v>NaN</v></row>"    1506584112:1222000
"           <!-- 2017-09-28 07:40:00 UTC / 1506584400 --> <row><v>1.1760000000e+06</v></row>"   1506584412:1176000
"           <!-- 2017-09-28 07:45:00 UTC / 1506584700 --> <row><v>1.2048000000e+06</v></row>"   1506584712:1206000
"           <!-- 2017-09-28 07:50:00 UTC / 1506585000 --> <row><v>1.0255200000e+06</v></row>"   1506585012:1018000
"           <!-- 2017-09-28 07:55:00 UTC / 1506585300 --> <row><v>1.2004000000e+06</v></row>"   1506585312:1208000
"           <!-- 2017-09-28 08:00:00 UTC / 1506585600 --> <row><v>1.1676800000e+06</v></row>"   1506585612:1166000
"           <!-- 2017-09-28 08:05:00 UTC / 1506585900 --> <row><v>1.2024800000e+06</v></row>"   1506585912:1204000
"           <!-- 2017-09-28 08:10:00 UTC / 1506586200 --> <row><v>1.2116800000e+06</v></row>"   1506586212:1212000
"           <!-- 2017-09-28 08:15:00 UTC / 1506586500 --> <row><v>NaN</v></row>"    1506586513:886000
"           <!-- 2017-09-28 08:20:00 UTC / 1506586800 --> <row><v>1.1940000000e+06</v></row>"   1506586812:1194000
"           <!-- 2017-09-28 08:25:00 UTC / 1506587100 --> <row><v>1.1959200000e+06</v></row>"   1506587112:1196000
"           <!-- 2017-09-28 08:30:00 UTC / 1506587400 --> <row><v>NaN</v></row>"    1506587413:1206000
"           <!-- 2017-09-28 08:35:00 UTC / 1506587700 --> <row><v>1.1440000000e+06</v></row>"   1506587712:1144000
"           <!-- 2017-09-28 08:40:00 UTC / 1506588000 --> <row><v>NaN</v></row>"    1506588013:668000
"           <!-- 2017-09-28 08:45:00 UTC / 1506588300 --> <row><v>1.2080000000e+06</v></row>"   1506588312:1208000
"           <!-- 2017-09-28 08:50:00 UTC / 1506588600 --> <row><v>NaN</v></row>"    1506588613:1156000
"           <!-- 2017-09-28 08:55:00 UTC / 1506588900 --> <row><v>1.2080000000e+06</v></row>"   1506588912:1208000
"           <!-- 2017-09-28 09:00:00 UTC / 1506589200 --> <row><v>1.1945600000e+06</v></row>"   1506589212:1194000
"           <!-- 2017-09-28 09:05:00 UTC / 1506589500 --> <row><v>1.1786400000e+06</v></row>"   1506589512:1178000
"           <!-- 2017-09-28 09:10:00 UTC / 1506589800 --> <row><v>1.1396000000e+06</v></row>"   1506589811:1138000
"           <!-- 2017-09-28 09:15:00 UTC / 1506590100 --> <row><v>NaN</v></row>"    1506590113:1006000
"           <!-- 2017-09-28 09:20:00 UTC / 1506590400 --> <row><v>1.1780000000e+06</v></row>"   1506590412:1178000
"           <!-- 2017-09-28 09:25:00 UTC / 1506590700 --> <row><v>1.1799200000e+06</v></row>"   1506590712:1180000
"           <!-- 2017-09-28 09:30:00 UTC / 1506591000 --> <row><v>1.1953600000e+06</v></row>"   1506591012:1196000
"           <!-- 2017-09-28 09:35:00 UTC / 1506591300 --> <row><v>1.1806400000e+06</v></row>"   1506591312:1180000
"           <!-- 2017-09-28 09:40:00 UTC / 1506591600 --> <row><v>1.1588800000e+06</v></row>"   1506591612:1158000
"           <!-- 2017-09-28 09:45:00 UTC / 1506591900 --> <row><v>1.2002400000e+06</v></row>"   1506591912:1202000
"           <!-- 2017-09-28 09:50:00 UTC / 1506592200 --> <row><v>1.0656800000e+06</v></row>"   1506592212:1060000
"           <!-- 2017-09-28 09:55:00 UTC / 1506592500 --> <row><v>1.2078400000e+06</v></row>"   1506592512:1214000
"           <!-- 2017-09-28 10:00:00 UTC / 1506592800 --> <row><v>1.1640800000e+06</v></row>"   1506592812:1162000
"           <!-- 2017-09-28 10:05:00 UTC / 1506593100 --> <row><v>1.1754400000e+06</v></row>"   1506593112:1176000

The data points that are not accepted are almost always the ones whose arrival time is slightly off the usual trend.

How do we go about widening the acceptance criteria so that all of these data points are accepted?

The rrdtool info output for the RRD in question is shown below:

root@ra:/var/www/genie/public_html# rrdtool info /an/data/SI1.rrd 
filename = "/an/data/SI1.rrd"
rrd_version = "0003"
step = 300
last_update = 1506594312
header_size = 1000
ds[probe1-temp].index = 0
ds[probe1-temp].type = "GAUGE"
ds[probe1-temp].minimal_heartbeat = 300
ds[probe1-temp].min = 0.0000000000e+00
ds[probe1-temp].max = 5.0000000000e+06
ds[probe1-temp].last_ds = "1226000"
ds[probe1-temp].value = NaN
ds[probe1-temp].unknown_sec = 12
rra[0].cf = "MIN"
rra[0].rows = 1440
rra[0].cur_row = 238
rra[0].pdp_per_row = 12
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = 1.1754400000e+06
rra[0].cdp_prep[0].unknown_datapoints = 2
rra[1].cf = "MAX"
rra[1].rows = 1440
rra[1].cur_row = 1220
rra[1].pdp_per_row = 12
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 1.2140000000e+06
rra[1].cdp_prep[0].unknown_datapoints = 2
rra[2].cf = "AVERAGE"
rra[2].rows = 1440
rra[2].cur_row = 1205
rra[2].pdp_per_row = 1
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = NaN
rra[2].cdp_prep[0].unknown_datapoints = 0
root@ra:# 

You have set the DS heartbeat to 300, and the step is also 300.

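This means that if your updates arrive more than 300 seconds apart, the value is stored as NaN, which is what you are seeing. In the data you posted, the NaN rows are the ones where the actual interval since the previous update is 301 or 302 seconds, i.e. >300, so the heartbeat is exceeded and the result is NaN. Updates that arrive exactly 300 seconds apart are still accepted, as the other rows show.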
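To check this against the logged arrivals, a minimal Python sketch can compute the gap between consecutive updates and flag the ones that exceed the heartbeat. The function name `late_updates` is made up for illustration, and the sample timestamps are copied from the right-hand column of the question (06:30 to 07:10 UTC). The strict `>` comparison is an assumption about how rrdtool applies the heartbeat; it matches the posted data, where gaps of exactly 300 seconds were stored.

```python
def late_updates(timestamps, heartbeat):
    """Return (timestamp, gap) pairs whose gap from the previous update exceeds heartbeat."""
    late = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > heartbeat:
            late.append((cur, gap))
    return late


# Arrival times copied from the question (06:30 through 07:10 UTC).
samples = [1506580212, 1506580512, 1506580813, 1506581112, 1506581412,
           1506581712, 1506582014, 1506582315, 1506582613]

print(late_updates(samples, 300))  # the three updates behind the 06:40, 07:00 and 07:05 NaN rows
print(late_updates(samples, 600))  # nothing is flagged with a 600-second heartbeat
```

With a heartbeat of 300, the flagged timestamps line up with the NaN rows in the dump; with 600, none of the sampled updates would be rejected.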

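You should normally set the heartbeat to twice the expected data interval, i.e. twice the step, so that slightly late updates like these are still accepted.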

Try setting the heartbeat to 600; that should fix the problem.
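The heartbeat of an existing RRD can be changed in place with `rrdtool tune`, so the file does not need to be recreated. The sketch below builds that command in Python; the helper names `tune_heartbeat_cmd` and `apply_heartbeat` are hypothetical, while the file path and DS name are taken from the `rrdtool info` output in the question. It assumes `rrdtool` is on the PATH and that you have write access to the file.

```python
import shlex
import subprocess


def tune_heartbeat_cmd(rrd_path, ds_name, heartbeat):
    """Build the argument list for: rrdtool tune <file> --heartbeat <ds>:<seconds>"""
    return ["rrdtool", "tune", rrd_path, "--heartbeat", f"{ds_name}:{heartbeat}"]


def apply_heartbeat(rrd_path, ds_name, heartbeat):
    """Run the tune command; this modifies the RRD file in place."""
    subprocess.run(tune_heartbeat_cmd(rrd_path, ds_name, heartbeat), check=True)


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(tune_heartbeat_cmd("/an/data/SI1.rrd", "probe1-temp", 600)))
```

After the change is applied, `rrdtool info` should report `ds[probe1-temp].minimal_heartbeat = 600`. Rows already stored as NaN stay as they are; only later updates benefit from the wider heartbeat.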