Conditional cumulative sum based on the status of an object
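I'm looking for a conditional cumulative sum based on the status of an object. While Status is "New", I want a running count over the preceding rows, and the cumulative sum should reset when Status changes to "Old". I also want this computed separately for each group ID.
So, in the example below: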
library(data.table)
set.seed(5)
df <- data.table(ID=c(rep("A",10),rep("B",10)),time=c(seq(1,10),seq(1,10)),
                 Status=sample(c("New","Old"),20,replace = TRUE))
df
ID time Status
1: A 1 Old
2: A 2 New
3: A 3 New
4: A 4 New
5: A 5 New
6: A 6 New
7: A 7 New
8: A 8 New
9: A 9 Old
10: A 10 New
11: B 1 New
12: B 2 New
13: B 3 New
14: B 4 Old
15: B 5 Old
16: B 6 New
17: B 7 New
18: B 8 Old
19: B 9 Old
20: B 10 Old
The desired result is:
ID time Status Cond_Sum
1: A 1 Old 0
2: A 2 New 1
3: A 3 New 2
4: A 4 New 3
5: A 5 New 4
6: A 6 New 5
7: A 7 New 6
8: A 8 New 7
9: A 9 Old 0
10: A 10 New 1
11: B 1 New 1
12: B 2 New 2
13: B 3 New 3
14: B 4 Old 0
15: B 5 Old 0
16: B 6 New 1
17: B 7 New 2
18: B 8 Old 0
19: B 9 Old 0
20: B 10 Old 0
A data.table solution is preferred.
Many thanks.
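We can create a grouping column with rleid on 'ID' and 'Status', then specify the condition (Status == "New") in i and assign the row sequence (seq_len(.N)) to 'Cond_Sum', grouped by 'grp' (or use rowid(grp)). Because rleid starts a new run whenever either 'ID' or 'Status' changes, the count restarts both after an "Old" row and at the first row of each ID.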
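library(data.table)
# Run-length id: a new value each time ID or Status changes
df[, grp := rleid(ID, Status)]
# Default to 0, then number the "New" rows within each run and drop the helper column
df[, Cond_Sum := 0][Status == 'New',
   Cond_Sum := seq_len(.N), grp][, grp := NULL][]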
-output
# ID time Status Cond_Sum
# 1: A 1 Old 0
# 2: A 2 New 1
# 3: A 3 New 2
# 4: A 4 New 3
# 5: A 5 New 4
# 6: A 6 New 5
# 7: A 7 New 6
# 8: A 8 New 7
# 9: A 9 Old 0
#10: A 10 New 1
#11: B 1 New 1
#12: B 2 New 2
#13: B 3 New 3
#14: B 4 Old 0
#15: B 5 Old 0
#16: B 6 New 1
#17: B 7 New 2
#18: B 8 Old 0
#19: B 9 Old 0
#20: B 10 Old 0
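The rowid(grp) alternative mentioned above could look roughly like the sketch below. To keep it independent of the random number generator, the Status values are typed in from the printed example rather than sampled. Multiplying by the logical (Status == "New") to zero out the "Old" rows is an assumption of this sketch, not part of the original answer.

```r
library(data.table)

# Status values copied from the printed example (not re-sampled)
status <- c("Old", rep("New", 7), "Old", "New",
            rep("New", 3), "Old", "Old", "New", "New", rep("Old", 3))
df <- data.table(ID = rep(c("A", "B"), each = 10),
                 time = rep(1:10, times = 2),
                 Status = status)

# rleid() gives each consecutive run of identical (ID, Status) pairs its own id
df[, grp := rleid(ID, Status)]

# rowid() numbers the rows within each run; multiplying by the logical
# condition leaves "New" rows as they are and sets "Old" rows to 0
df[, Cond_Sum := rowid(grp) * (Status == "New")][, grp := NULL]
df[]
```

This version fills 'Cond_Sum' in a single assignment instead of initialising it to 0 and then updating the "New" rows, and it stores the result as an integer column.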