统计一个事件在过去1年内发生的次数
Count the number of occurrences of an event in the past 1 year
我正在尝试计算该行日期后 1 年内事件发生的次数。我已经计算出自上次活动以来的天数,但无法弄清楚如何继续,因为我需要回顾 365 天,而不仅仅是从最后一个虚拟人的日期开始。
只有当级别不是 NA 时我才需要计数,但这不是什么大问题。
编辑:
我又添加了 14 行来显示另一个级别实际上不是 NA 的示例。
经过jay.sf的帮助,目前的结果是:
第 33 行 = 1,但希望第 33 行 = 0,因为之前 1 年内没有出现过。
与第 37 行类似。
第 39 行 = 2 但之前只出现过 1 次,不考虑今天的。
因此我认为我需要更改代码,以便我只考虑在下一行计算该行的 dummyflag。
dtIhave2 = data.table(
id = c(rep(1,17),rep(2,13),rep(3,14)),
date = c(as.Date("2014-12-05"),
as.Date("2015-01-23"),
as.Date("2015-03-06"),
as.Date("2015-05-15"),
as.Date("2015-08-06"),
as.Date("2015-10-29"),
as.Date("2016-01-21"),
as.Date("2016-04-06"),
as.Date("2016-07-11"),
as.Date("2016-10-03"),
as.Date("2016-11-11"),
as.Date("2016-12-07"),
as.Date("2017-10-25"),
as.Date("2018-01-09"),
as.Date("2018-02-12"),
as.Date("2018-07-04"),
as.Date("2018-11-30"),
as.Date("2014-05-14"),
as.Date("2014-09-03"),
as.Date("2014-09-04"),
as.Date("2014-10-15"),
as.Date("2014-11-08"),
as.Date("2014-12-05"),
as.Date("2014-12-18"),
as.Date("2014-12-20"),
as.Date("2014-12-23"),
as.Date("2015-05-15"),
as.Date("2015-08-19"),
as.Date("2016-06-23"),
as.Date("2017-04-21"),
as.Date("2015-01-03"),
as.Date("2015-02-13"),
as.Date("2015-06-01"),#
as.Date("2015-09-05"),
as.Date("2015-12-01"),
as.Date("2016-06-10"),
as.Date("2016-10-16"),#
as.Date("2016-12-15"),
as.Date("2017-04-30"),#
as.Date("2017-06-23"),
as.Date("2017-10-01"),
as.Date("2017-12-01"),
as.Date("2018-03-10"),
as.Date("2018-06-02")
),
level = c(rnorm(10,7,1),
NA,
rnorm(9,7,1),
NA,NA,
7,
NA,NA,NA,
rnorm(4,7,1),
rnorm(14,7,1)),
dummyflag = c(rep(0 ,10),
1,
rep(0,9),
1,
1,
0,
1,
1,
1,
rep(0,4),
rep(0,2),
1,
rep(0,3),
1,
rep(0,1),
1,
rep(0,5)),
dayssincedummy = c(rep(NA,11),
26,348,424,458,600,749,
rep(NA,4),
24,27,40,2,3,143,239,548,850,
rep(NA,3),
96,
183,
375,
503,
60,
196,
54,
154,
215,
314,
398
)
)
dtIhave2$within1yr = sapply(seq_len(nrow(dtIhave2)),function(i) dtIhave2[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) & id == id[i] & !is.na(level[i]), sum(dummyflag %in% 1)])
> dtIhave2
id date level dummyflag dayssincedummy within1yr
1: 1 2014-12-05 7.977480 0 NA 0
2: 1 2015-01-23 7.589833 0 NA 0
3: 1 2015-03-06 7.301062 0 NA 0
4: 1 2015-05-15 6.739734 0 NA 0
5: 1 2015-08-06 5.682534 0 NA 0
6: 1 2015-10-29 6.659627 0 NA 0
7: 1 2016-01-21 7.159197 0 NA 0
8: 1 2016-04-06 9.957324 0 NA 0
9: 1 2016-07-11 6.607859 0 NA 0
10: 1 2016-10-03 7.093568 0 NA 0
11: 1 2016-11-11 NA 1 NA 0
12: 1 2016-12-07 5.527618 0 26 1
13: 1 2017-10-25 6.055255 0 348 1
14: 1 2018-01-09 6.031328 0 424 0
15: 1 2018-02-12 5.875067 0 458 0
16: 1 2018-07-04 6.875352 0 600 0
17: 1 2018-11-30 8.439167 0 749 0
18: 2 2014-05-14 7.381595 0 NA 0
19: 2 2014-09-03 7.325306 0 NA 0
20: 2 2014-09-04 8.101320 0 NA 0
21: 2 2014-10-15 NA 1 NA 0
22: 2 2014-11-08 NA 1 24 0
23: 2 2014-12-05 7.000000 0 27 2
24: 2 2014-12-18 NA 1 40 0
25: 2 2014-12-20 NA 1 2 0
26: 2 2014-12-23 NA 1 3 0
27: 2 2015-05-15 7.211657 0 143 5
28: 2 2015-08-19 7.274550 0 239 5
29: 2 2016-06-23 7.216593 0 548 0
30: 2 2017-04-21 6.516086 0 850 0
31: 3 2015-01-03 7.945201 0 NA 0
32: 3 2015-02-13 8.417933 0 NA 0
33: 3 2015-06-01 9.290180 1 NA 1
34: 3 2015-09-05 8.400137 0 96 1
35: 3 2015-12-01 8.115692 0 183 1
36: 3 2016-06-10 7.322929 0 375 0
37: 3 2016-10-16 4.946102 1 503 1
38: 3 2016-12-15 9.435223 0 60 1
39: 3 2017-04-30 6.671779 1 196 2
40: 3 2017-06-23 6.994869 0 54 2
41: 3 2017-10-01 7.540090 0 154 2
42: 3 2017-12-01 7.332589 0 215 1
43: 3 2018-03-10 7.779732 0 314 1
44: 3 2018-06-02 6.068338 0 398 0
id date level dummyflag dayssincedummy within1yr
> dtIwant2
id date level dummyflag dayssincedummy within1yr
1: 1 2014-12-05 7.977480 0 NA 0
2: 1 2015-01-23 7.589833 0 NA 0
3: 1 2015-03-06 7.301062 0 NA 0
4: 1 2015-05-15 6.739734 0 NA 0
5: 1 2015-08-06 5.682534 0 NA 0
6: 1 2015-10-29 6.659627 0 NA 0
7: 1 2016-01-21 7.159197 0 NA 0
8: 1 2016-04-06 9.957324 0 NA 0
9: 1 2016-07-11 6.607859 0 NA 0
10: 1 2016-10-03 7.093568 0 NA 0
11: 1 2016-11-11 NA 1 NA 0
12: 1 2016-12-07 5.527618 0 26 1
13: 1 2017-10-25 6.055255 0 348 1
14: 1 2018-01-09 6.031328 0 424 0
15: 1 2018-02-12 5.875067 0 458 0
16: 1 2018-07-04 6.875352 0 600 0
17: 1 2018-11-30 8.439167 0 749 0
18: 2 2014-05-14 7.381595 0 NA 0
19: 2 2014-09-03 7.325306 0 NA 0
20: 2 2014-09-04 8.101320 0 NA 0
21: 2 2014-10-15 NA 1 NA 0
22: 2 2014-11-08 NA 1 24 0
23: 2 2014-12-05 7.000000 0 27 2
24: 2 2014-12-18 NA 1 40 0
25: 2 2014-12-20 NA 1 2 0
26: 2 2014-12-23 NA 1 3 0
27: 2 2015-05-15 7.211657 0 143 5
28: 2 2015-08-19 7.274550 0 239 5
29: 2 2016-06-23 7.216593 0 548 0
30: 2 2017-04-21 6.516086 0 850 0
31: 3 2015-01-03 7.945201 0 NA 0
32: 3 2015-02-13 8.417933 0 NA 0
33: 3 2015-06-01 9.290180 1 NA 0
34: 3 2015-09-05 8.400137 0 96 1
35: 3 2015-12-01 8.115692 0 183 1
36: 3 2016-06-10 7.322929 0 375 0
37: 3 2016-10-16 4.946102 1 503 0
38: 3 2016-12-15 9.435223 0 60 1
39: 3 2017-04-30 6.671779 1 196 1
40: 3 2017-06-23 6.994869 0 54 2
41: 3 2017-10-01 7.540090 0 154 2
42: 3 2017-12-01 7.332589 0 215 1
43: 3 2018-03-10 7.779732 0 314 1
44: 3 2018-06-02 6.068338 0 398 0
id date level dummyflag dayssincedummy within1yr
输入乱码,因为我的问题中的代码太多:ajksdfcksjadbf jklsdaakjsdhfkjsdhafkajsdfasdf
ASD
加气
fgasdfsadfasdfas dfa dfas dfasd fa
尝试使用 seq.Date
和 '-1 year'
并使用 sapply
.
遍历行
library(data.table)
sapply(seq_len(nrow(dtIhave)), \(i)
dtIhave[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) &
id == id[i] & !is.na(level[i]), sum(dummyflag %in% 1)])
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 2 0 0 0 5 5 0 0
您也可以使用 '-365 days'
,但 '-1 year'
也会考虑闰年。
编辑
对于更新后的案例,将 !is.na(level[i])
替换为 ummyflag[i] == 0
。
sapply(seq_len(nrow(dtIhave2)), \(i)
dtIhave2[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) &
id == id[i] & dummyflag[i] == 0, sum(dummyflag %in% 1)])
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 2 0 0 0 5 5 0 0 0 0 0 1 1 0 0 1 0 2 2 1 1 0
解决方案:
dtIhave[, res:=fifelse(
is.na(level),0,
dtIhave[id==.BY$id & between(date,(.BY$date-365), .BY$date) & dummyflag==1, .N]),
, by=.(id,date)
]
解释:
这使用一个简单的 If-else 语句(使用 data.table 的 fifelse()
),但每个 id
和 date
组都这样做。
- 如果级别为
NA
,则结果列为0。
- 如果级别不是 NA,那么我们只需将
dtIhave
过滤到具有此 id
(id==.BY$id
) 的行,并且日期介于此日期减去 365 ( .BY$date-365
) 和这个日期 (.BY$date
),然后我们使用 .N
. 计算这些行
特殊的 .BY
可以在 j
中使用;它在列表中保存 by
列的值。即。 .BY$id
保存id
的值,.BY$date保存当前组date
的值
输出:
id date level dummyflag dayssincedummy res
1: 1 2014-12-05 6.831267 NA NA 0
2: 1 2015-01-23 7.167449 NA NA 0
3: 1 2015-03-06 6.500918 NA NA 0
4: 1 2015-05-15 7.267101 NA NA 0
5: 1 2015-08-06 6.463343 NA NA 0
6: 1 2015-10-29 7.685856 NA NA 0
7: 1 2016-01-21 6.465524 NA NA 0
8: 1 2016-04-06 7.602419 NA NA 0
9: 1 2016-07-11 7.339648 NA NA 0
10: 1 2016-10-03 6.049635 NA NA 0
11: 1 2016-11-11 NA 1 NA 0
12: 1 2016-12-07 6.639634 NA 26 1
13: 1 2017-10-25 7.951767 NA 348 1
14: 1 2018-01-09 5.444352 NA 424 0
15: 1 2018-02-12 8.972908 NA 458 0
16: 1 2018-07-04 7.084616 NA 600 0
17: 1 2018-11-30 5.602063 NA 749 0
18: 2 2014-05-14 7.120637 NA NA 0
19: 2 2014-09-03 7.260747 NA NA 0
20: 2 2014-09-04 7.676648 NA NA 0
21: 2 2014-10-15 NA 1 NA 0
22: 2 2014-11-08 NA 1 24 0
23: 2 2014-12-05 7.000000 NA 27 2
24: 2 2014-12-18 NA 1 40 0
25: 2 2014-12-20 NA 1 2 0
26: 2 2014-12-23 NA 1 3 0
27: 2 2015-05-15 6.137783 NA 143 5
28: 2 2015-08-19 7.088102 NA 239 5
29: 2 2016-06-23 7.620440 NA 548 0
30: 2 2017-04-21 10.325672 NA 850 0
我正在尝试计算该行日期后 1 年内事件发生的次数。我已经计算出自上次活动以来的天数,但无法弄清楚如何继续,因为我需要回顾 365 天,而不仅仅是从最后一个虚拟人的日期开始。
只有当级别不是 NA 时我才需要计数,但这不是什么大问题。
编辑:
我又添加了 14 行来显示另一个级别实际上不是 NA 的示例。
经过jay.sf的帮助,目前的结果是:
第 33 行 = 1,但希望第 33 行 = 0,因为之前 1 年内没有出现过。
与第 37 行类似。
第 39 行 = 2 但之前只出现过 1 次,不考虑今天的。
因此我认为我需要更改代码,以便我只考虑在下一行计算该行的 dummyflag。
dtIhave2 = data.table(
id = c(rep(1,17),rep(2,13),rep(3,14)),
date = c(as.Date("2014-12-05"),
as.Date("2015-01-23"),
as.Date("2015-03-06"),
as.Date("2015-05-15"),
as.Date("2015-08-06"),
as.Date("2015-10-29"),
as.Date("2016-01-21"),
as.Date("2016-04-06"),
as.Date("2016-07-11"),
as.Date("2016-10-03"),
as.Date("2016-11-11"),
as.Date("2016-12-07"),
as.Date("2017-10-25"),
as.Date("2018-01-09"),
as.Date("2018-02-12"),
as.Date("2018-07-04"),
as.Date("2018-11-30"),
as.Date("2014-05-14"),
as.Date("2014-09-03"),
as.Date("2014-09-04"),
as.Date("2014-10-15"),
as.Date("2014-11-08"),
as.Date("2014-12-05"),
as.Date("2014-12-18"),
as.Date("2014-12-20"),
as.Date("2014-12-23"),
as.Date("2015-05-15"),
as.Date("2015-08-19"),
as.Date("2016-06-23"),
as.Date("2017-04-21"),
as.Date("2015-01-03"),
as.Date("2015-02-13"),
as.Date("2015-06-01"),#
as.Date("2015-09-05"),
as.Date("2015-12-01"),
as.Date("2016-06-10"),
as.Date("2016-10-16"),#
as.Date("2016-12-15"),
as.Date("2017-04-30"),#
as.Date("2017-06-23"),
as.Date("2017-10-01"),
as.Date("2017-12-01"),
as.Date("2018-03-10"),
as.Date("2018-06-02")
),
level = c(rnorm(10,7,1),
NA,
rnorm(9,7,1),
NA,NA,
7,
NA,NA,NA,
rnorm(4,7,1),
rnorm(14,7,1)),
dummyflag = c(rep(0 ,10),
1,
rep(0,9),
1,
1,
0,
1,
1,
1,
rep(0,4),
rep(0,2),
1,
rep(0,3),
1,
rep(0,1),
1,
rep(0,5)),
dayssincedummy = c(rep(NA,11),
26,348,424,458,600,749,
rep(NA,4),
24,27,40,2,3,143,239,548,850,
rep(NA,3),
96,
183,
375,
503,
60,
196,
54,
154,
215,
314,
398
)
)
dtIhave2$within1yr = sapply(seq_len(nrow(dtIhave2)),function(i) dtIhave2[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) & id == id[i] & !is.na(level[i]), sum(dummyflag %in% 1)])
> dtIhave2
id date level dummyflag dayssincedummy within1yr
1: 1 2014-12-05 7.977480 0 NA 0
2: 1 2015-01-23 7.589833 0 NA 0
3: 1 2015-03-06 7.301062 0 NA 0
4: 1 2015-05-15 6.739734 0 NA 0
5: 1 2015-08-06 5.682534 0 NA 0
6: 1 2015-10-29 6.659627 0 NA 0
7: 1 2016-01-21 7.159197 0 NA 0
8: 1 2016-04-06 9.957324 0 NA 0
9: 1 2016-07-11 6.607859 0 NA 0
10: 1 2016-10-03 7.093568 0 NA 0
11: 1 2016-11-11 NA 1 NA 0
12: 1 2016-12-07 5.527618 0 26 1
13: 1 2017-10-25 6.055255 0 348 1
14: 1 2018-01-09 6.031328 0 424 0
15: 1 2018-02-12 5.875067 0 458 0
16: 1 2018-07-04 6.875352 0 600 0
17: 1 2018-11-30 8.439167 0 749 0
18: 2 2014-05-14 7.381595 0 NA 0
19: 2 2014-09-03 7.325306 0 NA 0
20: 2 2014-09-04 8.101320 0 NA 0
21: 2 2014-10-15 NA 1 NA 0
22: 2 2014-11-08 NA 1 24 0
23: 2 2014-12-05 7.000000 0 27 2
24: 2 2014-12-18 NA 1 40 0
25: 2 2014-12-20 NA 1 2 0
26: 2 2014-12-23 NA 1 3 0
27: 2 2015-05-15 7.211657 0 143 5
28: 2 2015-08-19 7.274550 0 239 5
29: 2 2016-06-23 7.216593 0 548 0
30: 2 2017-04-21 6.516086 0 850 0
31: 3 2015-01-03 7.945201 0 NA 0
32: 3 2015-02-13 8.417933 0 NA 0
33: 3 2015-06-01 9.290180 1 NA 1
34: 3 2015-09-05 8.400137 0 96 1
35: 3 2015-12-01 8.115692 0 183 1
36: 3 2016-06-10 7.322929 0 375 0
37: 3 2016-10-16 4.946102 1 503 1
38: 3 2016-12-15 9.435223 0 60 1
39: 3 2017-04-30 6.671779 1 196 2
40: 3 2017-06-23 6.994869 0 54 2
41: 3 2017-10-01 7.540090 0 154 2
42: 3 2017-12-01 7.332589 0 215 1
43: 3 2018-03-10 7.779732 0 314 1
44: 3 2018-06-02 6.068338 0 398 0
id date level dummyflag dayssincedummy within1yr
> dtIwant2
id date level dummyflag dayssincedummy within1yr
1: 1 2014-12-05 7.977480 0 NA 0
2: 1 2015-01-23 7.589833 0 NA 0
3: 1 2015-03-06 7.301062 0 NA 0
4: 1 2015-05-15 6.739734 0 NA 0
5: 1 2015-08-06 5.682534 0 NA 0
6: 1 2015-10-29 6.659627 0 NA 0
7: 1 2016-01-21 7.159197 0 NA 0
8: 1 2016-04-06 9.957324 0 NA 0
9: 1 2016-07-11 6.607859 0 NA 0
10: 1 2016-10-03 7.093568 0 NA 0
11: 1 2016-11-11 NA 1 NA 0
12: 1 2016-12-07 5.527618 0 26 1
13: 1 2017-10-25 6.055255 0 348 1
14: 1 2018-01-09 6.031328 0 424 0
15: 1 2018-02-12 5.875067 0 458 0
16: 1 2018-07-04 6.875352 0 600 0
17: 1 2018-11-30 8.439167 0 749 0
18: 2 2014-05-14 7.381595 0 NA 0
19: 2 2014-09-03 7.325306 0 NA 0
20: 2 2014-09-04 8.101320 0 NA 0
21: 2 2014-10-15 NA 1 NA 0
22: 2 2014-11-08 NA 1 24 0
23: 2 2014-12-05 7.000000 0 27 2
24: 2 2014-12-18 NA 1 40 0
25: 2 2014-12-20 NA 1 2 0
26: 2 2014-12-23 NA 1 3 0
27: 2 2015-05-15 7.211657 0 143 5
28: 2 2015-08-19 7.274550 0 239 5
29: 2 2016-06-23 7.216593 0 548 0
30: 2 2017-04-21 6.516086 0 850 0
31: 3 2015-01-03 7.945201 0 NA 0
32: 3 2015-02-13 8.417933 0 NA 0
33: 3 2015-06-01 9.290180 1 NA 0
34: 3 2015-09-05 8.400137 0 96 1
35: 3 2015-12-01 8.115692 0 183 1
36: 3 2016-06-10 7.322929 0 375 0
37: 3 2016-10-16 4.946102 1 503 0
38: 3 2016-12-15 9.435223 0 60 1
39: 3 2017-04-30 6.671779 1 196 1
40: 3 2017-06-23 6.994869 0 54 2
41: 3 2017-10-01 7.540090 0 154 2
42: 3 2017-12-01 7.332589 0 215 1
43: 3 2018-03-10 7.779732 0 314 1
44: 3 2018-06-02 6.068338 0 398 0
id date level dummyflag dayssincedummy within1yr
输入乱码,因为我的问题中的代码太多:ajksdfcksjadbf jklsdaakjsdhfkjsdhafkajsdfasdf ASD 加气 fgasdfsadfasdfas dfa dfas dfasd fa
尝试使用 seq.Date
和 '-1 year'
并使用 sapply
.
library(data.table)
sapply(seq_len(nrow(dtIhave)), \(i)
dtIhave[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) &
id == id[i] & !is.na(level[i]), sum(dummyflag %in% 1)])
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 2 0 0 0 5 5 0 0
您也可以使用 '-365 days'
,但 '-1 year'
也会考虑闰年。
编辑
对于更新后的案例,将 !is.na(level[i])
替换为 ummyflag[i] == 0
。
sapply(seq_len(nrow(dtIhave2)), \(i)
dtIhave2[date %between% rev(seq.Date(date[i], length.out=2, by='-1 year')) &
id == id[i] & dummyflag[i] == 0, sum(dummyflag %in% 1)])
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 2 0 0 0 5 5 0 0 0 0 0 1 1 0 0 1 0 2 2 1 1 0
解决方案:
dtIhave[, res:=fifelse(
is.na(level),0,
dtIhave[id==.BY$id & between(date,(.BY$date-365), .BY$date) & dummyflag==1, .N]),
, by=.(id,date)
]
解释:
这使用一个简单的 If-else 语句(使用 data.table 的 fifelse()
),但每个 id
和 date
组都这样做。
- 如果级别为
NA
,则结果列为0。 - 如果级别不是 NA,那么我们只需将
dtIhave
过滤到具有此id
(id==.BY$id
) 的行,并且日期介于此日期减去 365 (.BY$date-365
) 和这个日期 (.BY$date
),然后我们使用.N
. 计算这些行
特殊的 .BY
可以在 j
中使用;它在列表中保存 by
列的值。即。 .BY$id
保存id
的值,.BY$date保存当前组date
的值
输出:
id date level dummyflag dayssincedummy res
1: 1 2014-12-05 6.831267 NA NA 0
2: 1 2015-01-23 7.167449 NA NA 0
3: 1 2015-03-06 6.500918 NA NA 0
4: 1 2015-05-15 7.267101 NA NA 0
5: 1 2015-08-06 6.463343 NA NA 0
6: 1 2015-10-29 7.685856 NA NA 0
7: 1 2016-01-21 6.465524 NA NA 0
8: 1 2016-04-06 7.602419 NA NA 0
9: 1 2016-07-11 7.339648 NA NA 0
10: 1 2016-10-03 6.049635 NA NA 0
11: 1 2016-11-11 NA 1 NA 0
12: 1 2016-12-07 6.639634 NA 26 1
13: 1 2017-10-25 7.951767 NA 348 1
14: 1 2018-01-09 5.444352 NA 424 0
15: 1 2018-02-12 8.972908 NA 458 0
16: 1 2018-07-04 7.084616 NA 600 0
17: 1 2018-11-30 5.602063 NA 749 0
18: 2 2014-05-14 7.120637 NA NA 0
19: 2 2014-09-03 7.260747 NA NA 0
20: 2 2014-09-04 7.676648 NA NA 0
21: 2 2014-10-15 NA 1 NA 0
22: 2 2014-11-08 NA 1 24 0
23: 2 2014-12-05 7.000000 NA 27 2
24: 2 2014-12-18 NA 1 40 0
25: 2 2014-12-20 NA 1 2 0
26: 2 2014-12-23 NA 1 3 0
27: 2 2015-05-15 6.137783 NA 143 5
28: 2 2015-08-19 7.088102 NA 239 5
29: 2 2016-06-23 7.620440 NA 548 0
30: 2 2017-04-21 10.325672 NA 850 0