R: new column constructed from the previous row's value and the next row's value in a different column
library(data.table)
counting <- structure(
list(
unique = c(1000,1001,1002,1003,1004,1005,1006,1007,1008,1000,1001,1002,1003,1004),
increment = c(0,0,0,1,0,0,0,1,1,0,1,0,1,0)
),
.Names = c("unique", "increment"),
class = "data.frame",
row.names = 1:14)  # 14 row names to match the 14 rows of data
setDT(counting)
class(counting)
counting
This sets up the following table:
unique increment
1: 1000 0
2: 1001 0
3: 1002 0
4: 1003 1
5: 1004 0
6: 1005 0
7: 1006 0
8: 1007 1
9: 1008 1
10: 1000 0
11: 1001 1
12: 1002 0
13: 1003 1
14: 1004 0
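I keep trying to train my brain to leave Excel 'if else' statements behind.
What is the best way to vectorize the creation of a new column that starts at (say) 100, increases only according to the 'increment' column, and resets to 100 every time 'unique' == 1000?
The desired output is: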
unique increment runningTally
1: 1000 0 100
2: 1001 0 100
3: 1002 0 100
4: 1003 1 101
5: 1004 0 101
6: 1005 0 101
7: 1006 0 101
8: 1007 1 102
9: 1008 1 103
10: 1000 0 100
11: 1001 1 101
12: 1002 0 101
13: 1003 1 102
14: 1004 0 102
Thanks for any insight. I believe I should stay away from loops, since this will have millions of rows.
Try:
counting[, runningTally:=cumsum(increment)+100, by=cumsum(unique==1000)]
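To see why this works, it may help to materialise the grouping expression as its own column first. The sketch below rebuilds the question's example data with data.table() and adds a helper column named grp (a name introduced here for illustration only, not part of the original answer):

```r
library(data.table)

# Same values as the question's example data
counting <- data.table(
  unique    = c(1000:1008, 1000:1004),
  increment = c(0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0)
)

# grp (hypothetical helper column) goes up by 1 at every row where unique == 1000,
# so each run that starts at 1000 gets its own group id
counting[, grp := cumsum(unique == 1000)]

# cumsum(increment) is then computed separately inside each group,
# which is what makes the tally restart
counting[, runningTally := cumsum(increment) + 100, by = grp]

print(counting)
```

Rows 1 to 9 end up in group 1 and rows 10 to 14 in group 2, so the running sum starts again from 100 at row 10.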
Update
For a more general case, perhaps the following will help:
counting[,runningTally:=cumsum(c(0,increment[-1]))+100,
by=cumsum(unique==1000)]
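The two versions only differ when a reset row (unique == 1000) itself has increment == 1. A small sketch on made-up data (the table dt and the column names tallySimple and tallyReset are assumptions for illustration) shows the difference:

```r
library(data.table)

# Hypothetical data: both reset rows (unique == 1000) carry increment == 1
dt <- data.table(
  unique    = c(1000, 1001, 1002, 1000, 1001),
  increment = c(1, 0, 1, 1, 1)
)

# First version: the reset row's own increment is counted, so the group starts at 101
dt[, tallySimple := cumsum(increment) + 100, by = cumsum(unique == 1000)]

# Updated version: the first increment of each group is replaced by 0,
# so every group starts at exactly 100
dt[, tallyReset := cumsum(c(0, increment[-1])) + 100, by = cumsum(unique == 1000)]

print(dt)
```

On the question's own data both versions give the same result, because increment is 0 on every row where unique == 1000.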
In dplyr, similar to akrun's data.table approach, you can do:
library(dplyr)
counting %>% group_by(grp = cumsum(unique == 1000)) %>%
mutate(n = cumsum(increment) + 100) %>%
ungroup() %>% select(-grp) # to remove the grouping column again
Source: local data frame [14 x 3]
unique increment n
1 1000 0 100
2 1001 0 100
3 1002 0 100
4 1003 1 101
5 1004 0 101
6 1005 0 101
7 1006 0 101
8 1007 1 102
9 1008 1 103
10 1000 0 100
11 1001 1 101
12 1002 0 101
13 1003 1 102
14 1004 0 102
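For completeness, the same grouping idea also seems to carry over to base R with ave(). This is only a minimal sketch, assuming the question's example data rebuilt as a plain data.frame; grp is a helper name introduced here:

```r
# Same values as the question's example data, as a plain data.frame
counting <- data.frame(
  unique    = c(1000:1008, 1000:1004),
  increment = c(0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0)
)

# Group id that goes up by 1 at every row where unique == 1000
grp <- cumsum(counting$unique == 1000)

# ave() applies cumsum within each group and returns a vector in the original row order
counting$runningTally <- ave(counting$increment, grp, FUN = cumsum) + 100

print(counting)
```

With millions of rows the data.table version is likely the faster choice, but that has not been benchmarked here.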