R: How to fill out values in a DF which are dependent on previous rows

I have a data frame, and I want to do some calculations that depend on previous rows (like dragging a formula down in Excel). My DF looks like this:

set.seed(1234)
df <- data.frame(DA = sample(1:3, 6, rep = TRUE) ,HB = sample(0:600, 6, rep = TRUE), D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE), GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0  )
df$GL[1] = 646
df$R[1] = 60
df$DA[5] = 2

df
#   DA  HB D AD GM  GL  R RM
# 1  2 399 4 13 30 646 60  0
# 2  2  97 4 10 31  NA NA  0
# 3  1 102 5  5 31  NA NA  0
# 4  3 325 4  2 31  NA NA  0
# 5  2  78 3 14 30  NA NA  0
# 6  1 269 4  8 30  NA NA  0

I want to fill in the missing values in my GL, R and RM columns, which depend on each other. For example:

attach(df)

#calc GL and R for the 2nd row

df$GL[2] <- GL[1]+HB[2]+RM[1]

df$R[2] <- df$GL[2]*D[2]/GM[2]*AD[2]

#calc GL and R for the 3rd row

df$GL[3] <- df$GL[2]+HB[3]+df$RM[2]
df$R[3] <-df$GL[3]*D[3]/GM[3]*AD[3]

#and so on..

Is there a way to do all the calculations at once instead of row by row?

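In addition, each time the 'DA' column equals 1, the previous values of 'R' should be summed into 'RM' on that same row, but only the values since the previous occurrence of DA = 1. Like so: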

attach(df)

df$RM[3] <-R[1]+R[2]+R[3]

#and RM for the 6th row is calculated by

#df$RM[6] <-R[4]+R[5]+R[6]

Thanks in advance!

You can use a for loop to calculate the GL values; once you have them, you can compute the R column directly.

for(i in 2:nrow(df)) {
  df$GL[i] <- with(df, GL[i-1]+HB[i]+RM[i-1])
}
# Fill R from row 2 onward so the given R[1] = 60 is not overwritten
df$R[-1] <- with(df, (GL * D) / (GM * AD))[-1]
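
Note that the loop above reads RM[i-1] while RM is still all zeros. If RM is meant to feed back into GL, as the formula in the question suggests, then GL, R and RM have to be filled in together, one row at a time. Here is a minimal sketch of that. It assumes R = GL * D / (GM * AD), that RM stays 0 except on rows where DA == 1, and that the running total of R restarts after each such row. The data frame is typed in from the printed table so the result does not depend on the random number generator, and `acc` is a helper name introduced here.

```r
# Sample data typed in from the printed table in the question
df <- data.frame(
  DA = c(2, 2, 1, 3, 2, 1),
  HB = c(399, 97, 102, 325, 78, 269),
  D  = c(4, 4, 5, 4, 3, 4),
  AD = c(13, 10, 5, 2, 14, 8),
  GM = c(30, 31, 31, 31, 30, 30),
  GL = NA_real_, R = NA_real_, RM = 0
)
df$GL[1] <- 646
df$R[1]  <- 60

acc <- df$R[1]  # running total of R since the last DA == 1 row (assumed rule)
for (i in 2:nrow(df)) {
  # GL uses the previous row's RM, so RM must already be final for row i - 1
  df$GL[i] <- df$GL[i - 1] + df$HB[i] + df$RM[i - 1]
  df$R[i]  <- df$GL[i] * df$D[i] / (df$GM[i] * df$AD[i])
  acc <- acc + df$R[i]
  if (df$DA[i] == 1) {
    df$RM[i] <- acc  # sum of R over the current block, including this row
    acc <- 0         # the next block starts on the following row
  }
}
df
```

With this ordering, RM[3] equals R[1] + R[2] + R[3] and is then carried into GL[4], which a two-pass approach (all of GL first, then R) cannot do.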

The first two parts can be solved using indexing:

> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
+                  D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
+                  GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0  )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> # View df
> df
  DA  HB D AD GM  GL  R RM
1  2 399 4 13 30 646 60  0
2  2  97 4 10 31  NA NA  0
3  1 102 5  5 31  NA NA  0
4  3 325 4  2 31  NA NA  0
5  2  78 3 14 30  NA NA  0
6  1 269 4  8 30  NA NA  0

> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1] + df$RM[-nrow(df)]))

> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # May be more clear like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or did you mean this for last *?
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]
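
Two quick checks on the snippet above, as a sketch using values typed in from the printed table. First, while RM is all zeros, the cumsum() expression gives the same GL values as a row-by-row loop. Second, the two R formulas differ because * and / have equal precedence in R and are evaluated left to right, so a * b / c * d means ((a * b) / c) * d. The names `gl_loop`, `gl_vec`, `r_left` and `r_denom` are introduced here for illustration.

```r
HB <- c(399, 97, 102, 325, 78, 269)
RM <- rep(0, 6)  # RM as it stands before any RM values are filled in

# Row-by-row recurrence: GL[i] = GL[i-1] + HB[i] + RM[i-1]
gl_loop <- numeric(6)
gl_loop[1] <- 646
for (i in 2:6) gl_loop[i] <- gl_loop[i - 1] + HB[i] + RM[i - 1]

# Same recurrence written as a cumulative sum of the per-row increments
gl_vec <- cumsum(c(646, HB[-1] + RM[-6]))
all.equal(gl_loop, gl_vec)

# Row 2 of the sample data: GL = 743, D = 4, GM = 31, AD = 10
r_left  <- 743 * 4 / 31 * 10    # ((743 * 4) / 31) * 10
r_denom <- 743 * 4 / (31 * 10)  # AD moved into the denominator
c(r_left, r_denom)
```

The cumsum() form only holds while RM is known in advance. Once RM depends on R, and therefore on GL, the increments cannot be computed up front.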

The third part can be solved with a loop:

> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
+   df$RM[i] <- df$R[i] + df$RM[i-1] * (df$DA[i] != 2)
+ }

> df
  DA  HB D AD GM   GL         R         RM
1  2 399 4 13 30  646 60.000000  60.000000
2  2  97 4 10 31  743  9.587097   9.587097
3  1 102 5  5 31  845 27.258065  36.845161
4  3 325 4  2 31 1170 75.483871 112.329032
5  2  78 3 14 30 1248  8.914286   8.914286
6  1 269 4  8 30 1517 25.283333  34.197619

Do these results look correct?

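Update: Assuming RM should equal R except where DA = 1, in which case RM is the sum of the current row's R and the preceding R values back to (but not including) the previous row where DA = 1, try the following loop.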

df$RM[1] <- cs <- df$R[1]
for (i in 2:nrow(df)) {
  df$RM[i] <- df$R[i] + cs * (df$DA[i] == 1)
  # Extend the running sum; reset it to 0 after a DA == 1 row so that row's R is not counted again
  cs <- (cs + df$R[i]) * (df$DA[i] != 1)
}
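
The rule stated in the question (RM[3] = R[1] + R[2] + R[3], RM[6] = R[4] + R[5] + R[6]) can also be written without a loop by grouping rows into blocks that end at each DA == 1 row. This is a sketch, not a drop-in replacement. It takes R as already computed, which is only valid if RM does not feed back into GL, and it assumes RM = R on rows where DA != 1. The R values are copied from the printed output, and `block` and `run` are names introduced here.

```r
DA <- c(2, 2, 1, 3, 2, 1)
R  <- c(60, 9.587097, 27.258065, 75.483871, 8.914286, 25.283333)

# A new block starts on the row after each DA == 1 row
block <- cumsum(c(0, head(DA == 1, -1)))

# Running total of R within each block
run <- ave(R, block, FUN = cumsum)

# Use the block total on DA == 1 rows, plain R elsewhere (assumed rule)
RM <- ifelse(DA == 1, run, R)
data.frame(DA, R, block, RM)
```

Here `block` comes out as 0, 0, 0, 1, 1, 1, so rows 1 to 3 and rows 4 to 6 are summed separately.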