R new column constructed from previous row value and a different column next row value

Question

library(data.table)
counting <- structure(
  list(
    unique = c(1000,1001,1002,1003,1004,1005,1006,1007,1008,1000,1001,1002,1003,1004),
    increment = c(0,0,0,1,0,0,0,1,1,0,1,0,1,0)
  ),
  .Names = c("unique", "increment"),
  class = "data.frame",
  # one row name per row: both columns have 14 values
  row.names = 1:14)
setDT(counting)   # convert to a data.table by reference
class(counting)
counting

which sets up

    unique increment
 1:   1000         0
 2:   1001         0
 3:   1002         0
 4:   1003         1
 5:   1004         0
 6:   1005         0
 7:   1006         0
 8:   1007         1
 9:   1008         1
10:   1000         0
11:   1001         1
12:   1002         0
13:   1003         1
14:   1004         0

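I have been trying to train my brain to leave Excel 'if else' statements behind.

What is the best way to vectorize creating a new column that starts at (for example) 100, increases only according to the 'increment' column, and resets back to 100 every time 'unique' == 1000?

The desired output is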

    unique increment runningTally
 1:   1000         0          100
 2:   1001         0          100
 3:   1002         0          100
 4:   1003         1          101
 5:   1004         0          101
 6:   1005         0          101
 7:   1006         0          101
 8:   1007         1          102
 9:   1008         1          103
10:   1000         0          100
11:   1001         1          101
12:   1002         0          101
13:   1003         1          102
14:   1004         0          102

Thanks for any insight. I believe I should stay away from loops, since this will have millions of rows.

Answer 1

Try

counting[, runningTally:=cumsum(increment)+100, by=cumsum(unique==1000)]
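To see why grouping by cumsum(unique == 1000) gives the reset, it may help to look at the group id on its own. Below is a minimal sketch in base R using the question's data; the names unique_id, grp and tally are only illustrative, and ave() stands in here for data.table's by.

```r
unique_id <- c(1000,1001,1002,1003,1004,1005,1006,1007,1008,1000,1001,1002,1003,1004)
increment <- c(0,0,0,1,0,0,0,1,1,0,1,0,1,0)

# The comparison is TRUE on every row where the id is 1000, so the
# cumulative sum steps up by one there and labels a new block of rows.
grp <- cumsum(unique_id == 1000)
grp
# 1 1 1 1 1 1 1 1 1 2 2 2 2 2

# Restart the cumulative sum of increment inside each block, then add the base of 100.
tally <- ave(increment, grp, FUN = cumsum) + 100
tally
# 100 100 100 101 101 101 101 102 103 100 101 101 102 102
```

Both steps are vectorized, so no explicit loop over rows is needed.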

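Update

For a more general case (for example, when the row where unique == 1000 has a non-zero increment but the tally should still restart at 100), perhaps the following helps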

counting[, runningTally := cumsum(c(0, increment[-1])) + 100,
         by = cumsum(unique == 1000)]
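The two versions only differ when the first row of a group has a non-zero increment. Here is a small sketch with made-up data; the table dt, its values, and the column names simple and general are hypothetical and not from the question.

```r
library(data.table)

# Hypothetical data: both rows with unique == 1000 carry an increment of 1.
dt <- data.table(unique    = c(1000, 1001, 1002, 1000, 1001),
                 increment = c(1, 0, 1, 1, 1))

# First version: the increment on the reset row is counted, so groups start at 101.
dt[, simple := cumsum(increment) + 100, by = cumsum(unique == 1000)]

# Updated version: the first increment of each group is replaced by 0,
# so every group starts at exactly 100.
dt[, general := cumsum(c(0, increment[-1])) + 100, by = cumsum(unique == 1000)]

dt$simple
# 101 101 102 101 102
dt$general
# 100 100 101 100 101
```

With the question's data both versions give the same result, because increment is 0 on every row where unique == 1000.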

Answer 2

In dplyr, similar to akrun's data.table approach, you could do:

library(dplyr)
counting %>% group_by(grp = cumsum(unique == 1000)) %>%
  mutate(n = cumsum(increment) + 100) %>%
  ungroup() %>% select(-grp)  # to remove the grouping column again

Source: local data frame [14 x 3]

   unique increment   n
1    1000         0 100
2    1001         0 100
3    1002         0 100
4    1003         1 101
5    1004         0 101
6    1005         0 101
7    1006         0 101
8    1007         1 102
9    1008         1 103
10   1000         0 100
11   1001         1 101
12   1002         0 101
13   1003         1 102
14   1004         0 102
