Create a dummy variable indicating whether an event occurred in the past 2 years
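I have a longitudinal dataset in which I would like to create a column indicating whether an event occurred for an individual in the past two years (t-2). I have created toy data that shares the main characteristics of my dataset (see the code below).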
set.seed(123)
df <- data.frame(id = sort(rep(1:10, 5)),
                 time = rep(2011:2015, 10),
                 event = rbinom(50, 1, 0.2))
head(df, 10)
# Output
id time event
1 1 2011 0
2 1 2012 0
3 1 2013 0
4 1 2014 1
5 1 2015 1
6 2 2011 0
7 2 2012 0
8 2 2013 1
9 2 2014 0
10 2 2015 0
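In this data, I want to create a new column occurred that indicates whether the event took place in the past two years. For the first 10 rows, this would lead to data looking like this: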
id time event occured
1 1 2011 0 0
2 1 2012 0 0
3 1 2013 0 0
4 1 2014 1 1
5 1 2015 1 1
6 2 2011 0 0
7 2 2012 0 0
8 2 2013 1 1
9 2 2014 0 1
10 2 2015 0 1
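Ideally, I would like the number of years to look back to be a parameter that can be changed (i.e., making it relatively easy to construct occurred so that it indicates whether the event took place in the past 1 year or the past 4 years).
Thanks!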
A data.table approach.
This assumes there are no missing years in your data, so every id has an event value of 0 or 1 for each year in 2011:2015.
library(data.table)
# make it a data.table
setDT(df)
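# rolling sum over 3 rows per id (current year plus the 2 previous years);
# the flag is 1 when that sum is positive
df[, occured := as.numeric(
  frollsum(event, n = 3, align = "right", fill = event[1]) > 0),
  by = .(id)]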
# id time event occured
# 1: 1 2011 0 0
# 2: 1 2012 0 0
# 3: 1 2013 0 0
# 4: 1 2014 1 1
# 5: 1 2015 1 1
# 6: 2 2011 0 0
# 7: 2 2012 0 0
# 8: 2 2013 1 1
# 9: 2 2014 0 1
#10: 2 2015 0 1
#11: 3 2011 1 1
#12: 3 2012 0 1
#13: 3 2013 0 1
#14: 3 2014 0 0
#15: 3 2015 0 0
#16: 4 2011 1 1
#17: 4 2012 0 1
#18: 4 2013 0 1
#19: 4 2014 0 0
#20: 4 2015 1 1
# ...
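One caveat with fill = event[1]: the first two rows of each id are filled with the first year's value, so an event in the second year would be missed whenever the first year is 0. The sketch below is one possible way around that and also turns the look-back into a parameter. It pads each series with leading zeros before calling frollsum(). The name years_back and the small table dt (the first two ids of the toy data) are introduced here only for illustration.

```r
library(data.table)

# First two ids of the toy data from the question
dt <- data.table(
  id    = rep(1:2, each = 5),
  time  = rep(2011:2015, times = 2),
  event = c(0, 0, 0, 1, 1,  0, 0, 1, 0, 0)
)

years_back <- 2L  # hypothetical parameter: number of earlier years to include

# Pad each id's series with `years_back` leading zeros so the earliest rows
# also get a full-width window, then keep only the last .N values.
dt[, occurred := as.integer(
  tail(frollsum(c(rep(0, years_back), event), n = years_back + 1L), .N) > 0
), by = id]

dt$occurred
```

Changing years_back to 1L or 4L changes the window without touching the rest of the expression. Like the answer above, this still assumes that rows are sorted by time within each id and that no years are missing.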
With zoo::rollapply and FUN = max:
library(dplyr)
library(zoo)
df %>%
  group_by(id) %>%
  mutate(occured = rollapply(event, 3, max, align = "right", fill = event[1]))
# A tibble: 50 x 4
# Groups: id [10]
id time event occured
<int> <int> <int> <int>
1 1 2011 0 0
2 1 2012 0 0
3 1 2013 0 0
4 1 2014 1 1
5 1 2015 1 1
6 2 2011 0 0
7 2 2012 0 0
8 2 2013 1 1
9 2 2014 0 1
10 2 2015 0 1
11 3 2011 1 1
12 3 2012 0 1
13 3 2013 0 1
14 3 2014 0 0
15 3 2015 0 0
16 4 2011 1 1
17 4 2012 0 1
18 4 2013 0 1
19 4 2014 0 0
20 4 2015 1 1
21 5 2011 1 1
22 5 2012 0 1
23 5 2013 0 1
24 5 2014 1 1
25 5 2015 0 1
26 6 2011 0 0
27 6 2012 0 0
28 6 2013 0 0
29 6 2014 0 0
30 6 2015 0 0
# ... with 20 more rows
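A possible variation on the rollapply idea, shown as a sketch rather than as the answer's own method: rollapplyr() with partial = TRUE evaluates the shorter windows at the start of each series instead of filling them, and the width can be derived from a look-back parameter. The names years_back and toy are assumptions made for this example; toy holds the first two ids of the toy data.

```r
library(dplyr)
library(zoo)

# First two ids of the toy data from the question
toy <- data.frame(
  id    = rep(1:2, each = 5),
  time  = rep(2011:2015, times = 2),
  event = c(0, 0, 0, 1, 1,  0, 0, 1, 0, 0)
)

years_back <- 2  # hypothetical parameter: number of earlier years to include

res <- toy %>%
  group_by(id) %>%
  arrange(time, .by_group = TRUE) %>%
  # right-aligned window of width years_back + 1; partial = TRUE lets the
  # first rows of each id use whatever earlier rows exist
  mutate(occurred = rollapplyr(event, width = years_back + 1,
                               FUN = max, partial = TRUE)) %>%
  ungroup()

res$occurred
```

With partial windows there is no fill value to choose, so an event in the second year of a series is picked up even when the first year is 0.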
Assuming you want to do this by group, you can use zoo::rollmean() together with ceiling():
library(dplyr)
# Looks back n periods; n is a parameter which is easy to change.
# The window covers the current year plus the n previous years, so its width is n + 1.
n <- 2
df %>%
  group_by(id) %>%
  arrange(id, time) %>%
  mutate(
    occurred = ceiling(zoo::rollmean(event, k = n + 1, fill = event[1], align = "right"))
  )
#> # A tibble: 50 × 4
#> # Groups: id [10]
#> id time event occurred
#> <int> <int> <int> <dbl>
#> 1 1 2011 0 0
#> 2 1 2012 0 0
#> 3 1 2013 0 0
#> 4 1 2014 1 1
#> 5 1 2015 1 1
#> 6 2 2011 0 0
#> 7 2 2012 0 0
#> 8 2 2013 1 1
#> 9 2 2014 0 1
#> 10 2 2015 0 1
#> # … with 40 more rows
Created on 2022-04-04 by the reprex package (v2.0.1)
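The rolling-window answers above count rows, so they rely on every id having one row per year. If some years may be missing, a window defined on the time column itself is safer. The following base R sketch is one way to do that; the function occurred_within(), the parameter years_back, and the data frame toy (in which id 1 has no rows for 2013 and 2014) are all made up for illustration.

```r
# Flag rows where an event occurred in the current year or in any of the
# `years_back` preceding calendar years, for a single individual.
occurred_within <- function(time, event, years_back = 2) {
  vapply(seq_along(time), function(i) {
    in_window <- time >= time[i] - years_back & time <= time[i]
    as.integer(any(event[in_window] == 1))
  }, integer(1))
}

# Hypothetical data: id 1 is missing 2013 and 2014, id 2 is complete
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 2),
  time  = c(2011, 2012, 2015, 2011, 2012, 2013, 2014, 2015),
  event = c(1, 0, 0, 0, 0, 1, 0, 0)
)

# Apply the helper separately to the rows of each id
toy$occurred <- NA_integer_
for (rows in split(seq_len(nrow(toy)), toy$id)) {
  toy$occurred[rows] <- occurred_within(toy$time[rows], toy$event[rows],
                                        years_back = 2)
}

toy$occurred
```

For id 1 in 2015 the result is 0, because the 2011 event lies outside 2013 to 2015. A three-row window would have reached back to 2011 and returned 1. The helper compares every pair of rows within an id, which is fine for short panels but would be slow for long series.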