Cumulative count of blocks of 1s separated by 0s in a binary vector in R
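I have a data frame with a binary vector that I would like to count cumulatively. However, I want to count "groups of 1s" rather than each individual 1, and create a new vector holding this count while keeping the separating 0s as 0.
For example: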
df1 <- data.frame(bin = c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1))
n bin
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 1
10 1
11 1
12 1
13 1
14 0
15 0
16 0
17 1
18 1
19 1
should become
n bin cumul
1 0 0
2 1 1
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 0 0
9 1 2
10 1 2
11 1 2
12 1 2
13 1 2
14 0 0
15 0 0
16 0 0
17 1 3
18 1 3
19 1 3
How can I do this?
A somewhat manual approach:
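df1 <- data.frame(c1 = c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1))
r <- rle(df1$c1)              # run-length encode the column once
v2 <- cumsum(r$values)        # running count of the runs of 1s
v2[duplicated(v2)] <- 0       # a run of 0s repeats the previous count, so reset it to 0
df1$cumul <- rep(v2, times = r$lengths)
df1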
c1 cumul
1 0 0
2 1 1
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 0 0
9 1 2
10 1 2
11 1 2
12 1 2
13 1 2
14 0 0
15 0 0
16 0 0
17 1 3
18 1 3
19 1 3
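The same run-length idea can be wrapped in a small function. This is only a sketch: the name count_one_blocks_rle is made up for illustration, and it assumes the input holds only 0s and 1s with no NA values. Multiplying by the 0/1 indicator replaces the duplicated() step.

```r
# Hypothetical helper (name not from the original answer).
# Assumes x contains only 0 and 1, with no NA values.
count_one_blocks_rle <- function(x) {
  r <- rle(x)                          # run-length encoding, computed once
  is_one <- r$values == 1              # TRUE for runs of 1s, FALSE for runs of 0s
  r$values <- cumsum(is_one) * is_one  # number the runs of 1s; runs of 0s become 0
  inverse.rle(r)                       # expand back to the original length
}

x <- c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1)
res_rle <- count_one_blocks_rle(x)
print(cbind(x, cumul = res_rle))
```

Because the block number is taken from the runs of 1s only, a vector that starts with 1 should be handled the same way as one that starts with 0.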
You can use the rleid function from the data.table package:
df1 <- data.frame(bin = c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1))
library(data.table)
setDT(df1)
df1[, cumul := rleid(bin)]
df1[bin == 0, cumul := 0]
df1[bin == 1, cumul := rleid(cumul)]
# bin cumul
# 1: 0 0
# 2: 1 1
# 3: 1 1
# 4: 1 1
# 5: 1 1
# 6: 0 0
# 7: 0 0
# 8: 0 0
# 9: 1 2
#10: 1 2
#11: 1 2
#12: 1 2
#13: 1 2
#14: 0 0
#15: 0 0
#16: 0 0
#17: 1 3
#18: 1 3
#19: 1 3
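To see what each of the three steps produces, here is a rough walk-through in base R. The helper rleid_base is a hypothetical stand-in written for this illustration; it is assumed to behave like rleid only for a single plain vector without NA values.

```r
# Hypothetical base-R stand-in for data.table::rleid() on one plain vector.
rleid_base <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)  # run 1 gets id 1, run 2 gets id 2, ...
}

bin <- c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1)

# Step 1: every run, of 0s or of 1s, gets its own id (1 to 6 here).
step1 <- rleid_base(bin)

# Step 2: rows where bin is 0 are reset to 0; the runs of 1s keep ids 2, 4, 6.
step2 <- ifelse(bin == 0, 0, step1)

# Step 3: among the rows where bin is 1, renumber the remaining ids as 1, 2, 3.
step3 <- step2
step3[bin == 1] <- rleid_base(step2[bin == 1])

print(cbind(bin, step1, step2, step3))
```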
Another one:
x<-c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1)
d<-cumsum(diff(c(0,x))>0)
d[x==0]<-0
cbind(x,d)
x d
[1,] 0 0
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 0 0
[7,] 0 0
[8,] 0 0
[9,] 1 2
[10,] 1 2
[11,] 1 2
[12,] 1 2
[13,] 1 2
[14,] 0 0
[15,] 0 0
[16,] 0 0
[17,] 1 3
[18,] 1 3
[19,] 1 3
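The diff/cumsum idea can also be written as a reusable function. This is a minimal sketch: the name count_one_blocks is hypothetical, and the input is assumed to contain only 0s and 1s with no NA values.

```r
# Hypothetical helper (name not from the original answer).
# Assumes x contains only 0 and 1, with no NA values.
count_one_blocks <- function(x) {
  # A block starts wherever x steps up from 0 to 1; the prepended 0
  # makes a leading 1 count as a block start too.
  starts <- diff(c(0, x)) > 0
  # Running number of blocks started so far, set to 0 wherever x is 0.
  cumsum(starts) * (x == 1)
}

x <- c(0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1)
res_diff <- count_one_blocks(x)
print(cbind(x, d = res_diff))
```

Multiplying by (x == 1) does the same job as d[x==0]<-0 without modifying a vector in place, which makes the function easy to use inside transform() or a similar call.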