How to create a loop that sums an error term over 4 months and divides it by the sum of the true values for the same 4 months, in R?

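I am working on a forecasting problem where the goal is to predict how many items a company should buy for its inventory over the next 9 months. My data is monthly, but I have been asked to create a "moving/rolling error rate" over 4-month periods. I tried: ma(me/outsamp*100, order = 4, center = FALSE). However, I realized that averaging the percentage errors of 4 months is not the same as: sum of the 4 months' predictions / sum of the given 4-month period * 100, i.e. (prediction_1 + p_2 + p_3 + p_4) / (actual_sold_items_1 + a_2 + a_3 + a_4) * 100. The test set contains the first 9 months of 2021. So I would like a solution that computes the error for months 1+2+3+4, then months 2+3+4+5, and so on up to 6+7+8+9 (maybe a for loop).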
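
To make the difference concrete, here is a minimal sketch using only the first four months of the example data below. The names mean_of_pct and pooled_pct are made up for illustration, and it assumes ME = outsamp - predictions, which is how the example values are related.

```r
# First four months of the example data (ME = outsamp - predictions)
me      <- c(268.6, -95.4, 122.3, -184.2)
outsamp <- c(662, 416, 594, 495)

# (a) Average of the four monthly percentage errors:
#     every month gets the same weight, regardless of its sales volume
mean_of_pct <- mean(me / outsamp * 100)

# (b) Pooled error rate: summed error divided by summed actuals
pooled_pct <- sum(me) / sum(outsamp) * 100

c(mean_of_pct = mean_of_pct, pooled_pct = pooled_pct)
```

The pooled value is about 5.14, which matches the first Excel result (0.0513613...) times 100, while the averaged value ends up much closer to zero because months with small actuals count as much as months with large ones.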

Example data (error (4) was calculated in Excel and then imported into R so that it could be uploaded here):

df<-data.frame(predictions = c(393.4, 511.4, 471.7, 679.2, 613.9, 
                           456.2, 603.2, 668.2, 512.4), 
           outsamp = c(662, 416, 594, 495, 442, 480, 263, 464, 507),
           ME = c(268.6, -95.4, 122.3, -184.2, -171.9, 23.8, -340.2, -204.2, -5.4),
           `error (4)` = c(NA, 0.0513613290263037, -0.169080636877247, -0.104425658876181, 0.400297619047619, -0.419951485748939, 
                          -0.306884480746791, NA, NA),
           `error (5)` = c(NA, NA, -0.0232272901494825,-0.125834363411619, -0.241952506596306, -0.408908582089552, -0.323701298701299, NA, NA), 
           `error (6)` = c(NA, NA, -0.0119132405309161, -0.24,-0.275529583637692, -0.332742361373067, NA, NA, NA))

The calculations in Excel are shown below (image: excel calculations).

You can compute the rolling sums in a loop using row indices:

df <- data.frame(
   predictions.2.2 = c(393.4, 511.41,  471.6, 679.1, 613.9, 456.1,  603.1, 668.1, 512.4),
   outsamp = c(662,  416, 594, 495, 442, 480, 263, 464, 507),
   me = c(268.5,  -95.4, 122.3, -184.1, -171.9,  23.8, -340.1, -204.1, -5.4 ),
   mae = c(268.5, 95.4, 122.3,  184.1, 171.9, 23.8, 340.1,  204.1, 5.4))

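# For each starting row i, sum rows i..i+3 (a 4-month window).
# Windows that run past the last row pick up NA, so the last
# three rows end up as NA.
for(i in 1:nrow(df)){
   df[i,"me_rsum"] <- sum(df[i:(i+3),"me"])
   df[i,"outsamp_rsum"] <- sum(df[i:(i+3),"outsamp"])
}
# Pooled error rate per window: summed error / summed actuals, in percent
df$percent_diff <- (df$me_rsum / df$outsamp_rsum) * 100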
df
  predictions.2.2 outsamp     me   mae me_rsum outsamp_rsum percent_diff
1          393.40     662  268.5 268.5   111.3         2167     5.136133
2          511.41     416  -95.4  95.4  -329.1         1947   -16.902928
3          471.60     594  122.3 122.3  -209.9         2011   -10.437593
4          679.10     495 -184.1 184.1  -672.3         1680   -40.017857
5          613.90     442 -171.9 171.9  -692.3         1649   -41.983020
6          456.10     480   23.8  23.8  -525.8         1714   -30.676779
7          603.10     263 -340.1 340.1      NA           NA           NA
8          668.10     464 -204.1 204.1      NA           NA           NA
9          512.40     507   -5.4   5.4      NA           NA           NA

Or, without an explicit loop, using sapply:

sapply(1:nrow(df), function(i) sum(df[i:(i+3),"me"]) / sum(df[i:(i+3),"outsamp"]) * 100)
[1]   5.136133 -16.902928 -10.437593 -40.017857 -41.983020 -30.676779
[7]         NA         NA         NA
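
If the 5- and 6-month versions from the question are needed as well, the window length can be turned into a parameter. The following is only a sketch in base R: rolling_error_rate and its arguments err, actual and k are hypothetical names, not part of any package. It uses differences of cumulative sums instead of re-summing each window, and returns only the complete windows rather than padding with NA.

```r
# Data from the answer above
me      <- c(268.5, -95.4, 122.3, -184.1, -171.9, 23.8, -340.1, -204.1, -5.4)
outsamp <- c(662, 416, 594, 495, 442, 480, 263, 464, 507)

# Hypothetical helper: pooled error rate (in percent) over every
# complete window of k consecutive months
rolling_error_rate <- function(err, actual, k = 4) {
  n <- length(err)
  stopifnot(length(actual) == n, k >= 1, k <= n)

  starts  <- seq_len(n - k + 1)     # first row of each complete window
  cum_err <- c(0, cumsum(err))      # cum_err[j + 1] = sum(err[1:j])
  cum_act <- c(0, cumsum(actual))

  data.frame(
    first_month  = starts,
    last_month   = starts + k - 1,
    # window sum = difference of two cumulative sums
    percent_diff = (cum_err[starts + k] - cum_err[starts]) /
                   (cum_act[starts + k] - cum_act[starts]) * 100
  )
}

res4 <- rolling_error_rate(me, outsamp, k = 4)   # months 1-4, 2-5, ..., 6-9
res5 <- rolling_error_rate(me, outsamp, k = 5)   # months 1-5, ..., 5-9
res4
```

res4$percent_diff should reproduce the six non-NA values printed above; first_month and last_month are simply row positions 1 to 9, not calendar dates.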