Conditional rolling difference or gradient based on a column in data.table (R)

In a very large data.table, I need to compute the gradient of one column, conditional on the value of another column.

> require(data.table)
> DT = data.table( ID = c(rep('A', 8), rep('B', 6)),
                   Condition = c(0,1,0,0,1,1,0,1,0,0,1,0,0,1),
                   Value = c(4,3,2,1,4,3,2,1,4,3,2,1,4,3))

I want the rolling gradient of the 'Value' column by ID, computed only over the rows where Condition == 1.

> desired_output
    ID Condition Value Gradient
 1:  A         0     4       NA    # condition isn't met so no gradient
 2:  A         1     3        0    # condition is met but there is no predecessor. Gradient set to 0
 3:  A         0     2       NA    # condition isn't met so no gradient
 4:  A         0     1       NA    # condition isn't met so no gradient
 5:  A         1     4        1    # condition is met and gradient is 4-3=1
 6:  A         1     3       -1    # condition is met and gradient is 3-4=-1
 7:  A         0     2       NA    # condition isn't met so no gradient
 8:  A         1     1       -2    # condition is met and gradient is 1-3=-2
 9:  B         0     4       NA
10:  B         0     3       NA
11:  B         1     2        0
12:  B         0     1       NA
13:  B         0     4       NA
14:  B         1     3        1

If possible, I would prefer a native data.table solution.

Note: this can be done by subsetting DT[Condition == 1] and then joining the result back. If possible, I would like to avoid the subset and rejoin.
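For reference, the subset-and-rejoin route mentioned in the note could look roughly like the sketch below. It is only an illustration of what is being avoided: the helper column `row_id` and the intermediate table `sub` are hypothetical names that do not appear in the original data.

```r
library(data.table)

DT <- data.table(ID = c(rep("A", 8), rep("B", 6)),
                 Condition = c(0,1,0,0,1,1,0,1,0,0,1,0,0,1),
                 Value = c(4,3,2,1,4,3,2,1,4,3,2,1,4,3))

# Hypothetical helper column: remember each row's original position.
DT[, row_id := .I]

# Subset the qualifying rows and take the per-ID difference there.
# The first qualifying row of each ID is compared with itself, giving 0.
sub <- DT[Condition == 1]
sub[, Gradient := Value - shift(Value, fill = Value[1L]), by = ID]

# Update join: copy Gradient back onto the matching rows of DT.
# Rows that are not in sub keep NA.
DT[sub, on = "row_id", Gradient := i.Gradient]
DT[, row_id := NULL]
```

This needs an extra copy of the qualifying rows plus a join, which is the overhead the note is hoping to avoid on a very large table.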

library(data.table)
library(magrittr)
dt = data.table( ID = c(rep('A', 8), rep('B', 6)),
                 Condition = c(0,1,0,0,1,1,0,1,0,0,1,0,0,1),
                 Value = c(4,3,2,1,4,3,2,1,4,3,2,1,4,3))

# 1: filter rows in i and assign by reference in j; rows where Condition != 1 keep NA
dt[Condition == 1, Gradient := Value - shift(Value, fill = first(Value)), by = ID][]
#>     ID Condition Value Gradient
#>  1:  A         0     4       NA
#>  2:  A         1     3        0
#>  3:  A         0     2       NA
#>  4:  A         0     1       NA
#>  5:  A         1     4        1
#>  6:  A         1     3       -1
#>  7:  A         0     2       NA
#>  8:  A         1     1       -2
#>  9:  B         0     4       NA
#> 10:  B         0     3       NA
#> 11:  B         1     2        0
#> 12:  B         0     1       NA
#> 13:  B         0     4       NA
#> 14:  B         1     3        1
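Approach 1 works because the `i` filter restricts both the grouped computation and the `:=` assignment to the rows where Condition == 1, so no separate subset or join is needed. As a minimal sketch of the same idea, the difference can also be written with base R's `diff()`, padding the first qualifying row of each ID with 0. The table name `dt2` is used here only to keep the example separate from `dt`.

```r
library(data.table)

dt2 <- data.table(ID = c(rep("A", 8), rep("B", 6)),
                  Condition = c(0,1,0,0,1,1,0,1,0,0,1,0,0,1),
                  Value = c(4,3,2,1,4,3,2,1,4,3,2,1,4,3))

# Within each ID, diff() gives the change between consecutive qualifying rows;
# the leading 0 covers the first qualifying row, which has no predecessor.
dt2[Condition == 1, Gradient := c(0, diff(Value)), by = ID]
```

Both forms should give the same Gradient column on this data; `shift()` is the more data.table-flavoured choice, while `diff()` may read more naturally to base R users.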

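# 2: difference from the most recent earlier Condition == 1 row, computed for every row
# grad holds hand-entered expected values for comparison
dt$grad <- c(NA, NA, -1, -2, 1, -1, -1, -2, NA, NA, NA, -1, 2, 1)

dt[Condition == 1, Value2 := Value] %>%
  # carry the last qualifying Value forward within each ID, then lag by one row
  .[, Value2 := shift(nafill(Value2, "locf")), by = ID] %>%
  # rows with no earlier qualifying Value have Value2 = NA, so Gradient2 stays NA
  .[, Gradient2 := Value - Value2] %>%
  .[, Value2 := NULL] %>%
  .[]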
#>     ID Condition Value Gradient grad Gradient2
#>  1:  A         0     4       NA   NA        NA
#>  2:  A         1     3        0   NA        NA
#>  3:  A         0     2       NA   -1        -1
#>  4:  A         0     1       NA   -2        -2
#>  5:  A         1     4        1    1         1
#>  6:  A         1     3       -1   -1        -1
#>  7:  A         0     2       NA   -1        -1
#>  8:  A         1     1       -2   -2        -2
#>  9:  B         0     4       NA   NA        NA
#> 10:  B         0     3       NA   NA        NA
#> 11:  B         1     2        0   NA        NA
#> 12:  B         0     1       NA   -1        -1
#> 13:  B         0     4       NA    2         2
#> 14:  B         1     3        1    1         1

Created on 2021-06-04 by the reprex package (v2.0.0)