Create a dummy variable indicating whether an event occurred in the past 2 years

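I have a longitudinal dataset in which I want to create a column indicating whether an event has occurred for a given person in the past two years (t-2). I created toy data that shares the main features of my dataset (see the code below).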

set.seed(123)

df <- data.frame(id = sort(rep(1:10, 5)),
                 time = rep(2011:2015, 10),
                 event = rbinom(50, 1, 0.2))

head(df, 10)

# Output
   id time event
1   1 2011     0
2   1 2012     0
3   1 2013     0
4   1 2014     1
5   1 2015     1
6   2 2011     0
7   2 2012     0
8   2 2013     1
9   2 2014     0
10  2 2015     0

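In this data, I want to create a new column occurred that indicates whether the event occurred in the past two years, counting the current year (i.e. in any of t-2, t-1 or t). For the first 10 rows, the result would look like this: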

   id time event occurred
1   1 2011     0        0
2   1 2012     0        0
3   1 2013     0        0
4   1 2014     1        1
5   1 2015     1        1
6   2 2011     0        0
7   2 2012     0        0
8   2 2013     1        1
9   2 2014     0        1
10  2 2015     0        1

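Ideally, I would like the number of years to look back to be a parameter that can be changed (i.e. so that it is relatively easy to construct occurred to indicate whether the event occurred in the past 1 year or in the past 4 years).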

Thanks!
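The lookback parameter asked for above can be sketched in base R, assuming the same structure as the toy data (rows sorted by time within id, one row per year, no gaps). The helper flag_past_event() is hypothetical: it pads each id's series with years leading zeros (standing in for "no event before the data starts"), takes a trailing sum over years + 1 rows with stats::filter(), and flags any positive sum.

```r
# Hypothetical helper: 1 if an event occurred in the current row or in any
# of the `years` preceding rows, else 0. Assumes one row per year, sorted by
# time, with no gaps.
flag_past_event <- function(event, years = 2) {
  # Leading zeros: no event is assumed before the first observed year
  padded <- c(rep(0, years), event)
  # Trailing (sides = 1) sum over a window of length years + 1
  sums <- stats::filter(padded, rep(1, years + 1), sides = 1)
  # Drop the padding positions, keeping one value per original row
  as.integer(tail(as.numeric(sums), length(event)) > 0)
}

set.seed(123)
df <- data.frame(id = sort(rep(1:10, 5)),
                 time = rep(2011:2015, 10),
                 event = rbinom(50, 1, 0.2))

# Sort, then apply per id; change `years` to look back 1, 4, ... years
df <- df[order(df$id, df$time), ]
df$occurred <- ave(df$event, df$id,
                   FUN = function(e) flag_past_event(e, years = 2))
head(df, 10)
```

Setting years = 0 returns event unchanged, which is a quick sanity check on the windowing.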

A data.table approach

This assumes there are no missing years in your data, i.e. for every id, each year in 2011:2015 is present with an event value of 0 or 1.

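library(data.table)
# make it a data.table (modifies df in place) and sort within id
setDT(df)
setorder(df, id, time)
# window = current year + the previous years_back years
years_back <- 2L
# adaptive window sizes 1, 2, 3, 3, ... so the first rows of each id use
# the years that are available instead of a constant fill value
df[, occured := as.numeric(
  frollsum(event, n = pmin(seq_len(.N), years_back + 1L), adaptive = TRUE) > 0),
  by = .(id)]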

#    id time event occured
# 1:  1 2011     0       0
# 2:  1 2012     0       0
# 3:  1 2013     0       0
# 4:  1 2014     1       1
# 5:  1 2015     1       1
# 6:  2 2011     0       0
# 7:  2 2012     0       0
# 8:  2 2013     1       1
# 9:  2 2014     0       1
#10:  2 2015     0       1
#11:  3 2011     1       1
#12:  3 2012     0       1
#13:  3 2013     0       1
#14:  3 2014     0       0
#15:  3 2015     0       0
#16:  4 2011     1       1
#17:  4 2012     0       1
#18:  4 2013     0       1
#19:  4 2014     0       0
#20:  4 2015     1       1
#  ...
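The data.table approach above counts rows, so it relies on there being no missing years. If some person-years could be absent, one option is to define the window on the time values themselves. A minimal base R sketch, where flag_by_time() and the gappy toy data are hypothetical, and the all-pairs comparison is quadratic in group size (fine for a few years per id, slow for long series):

```r
# Hypothetical helper: for each row, 1 if any event occurred in a year
# within [time - years, time], else 0. Works with gaps in `time` and does
# not require the rows to be sorted.
flag_by_time <- function(time, event, years = 2) {
  vapply(seq_along(time), function(i) {
    in_window <- time >= time[i] - years & time <= time[i]
    as.integer(any(event[in_window] == 1))
  }, integer(1))
}

# 2012 and 2014 are missing: the 2011 event is more than 2 years before
# 2015, so 2015 is not flagged
flag_by_time(time = c(2011, 2013, 2015), event = c(1, 0, 0), years = 2)
# 1 1 0

# Hypothetical toy data with gaps, processed per id
gappy <- data.frame(id    = c(1, 1, 1, 2, 2),
                    time  = c(2011, 2013, 2015, 2012, 2013),
                    event = c(1, 0, 0, 0, 1))
gappy$occurred <- unsplit(
  lapply(split(gappy, gappy$id),
         function(d) flag_by_time(d$time, d$event, years = 2)),
  gappy$id
)
gappy
```

A purely row-based window would flag id 1 in 2015, because the 2011 row is only two rows earlier.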

Using zoo::rollapply() with FUN = max:

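library(dplyr)
library(zoo)

df %>% 
  group_by(id) %>% 
  # right-aligned window of 3 (current year + 2 previous); partial = TRUE lets
  # the first rows of each id use the shorter window that is available
  mutate(occured = rollapplyr(event, width = 3, FUN = max, partial = TRUE))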

# A tibble: 50 x 4
# Groups:   id [10]
      id  time event occured
   <int> <int> <int>   <int>
 1     1  2011     0       0
 2     1  2012     0       0
 3     1  2013     0       0
 4     1  2014     1       1
 5     1  2015     1       1
 6     2  2011     0       0
 7     2  2012     0       0
 8     2  2013     1       1
 9     2  2014     0       1
10     2  2015     0       1
11     3  2011     1       1
12     3  2012     0       1
13     3  2013     0       1
14     3  2014     0       0
15     3  2015     0       0
16     4  2011     1       1
17     4  2012     0       1
18     4  2013     0       1
19     4  2014     0       0
20     4  2015     1       1
21     5  2011     1       1
22     5  2012     0       1
23     5  2013     0       1
24     5  2014     1       1
25     5  2015     0       1
26     6  2011     0       0
27     6  2012     0       0
28     6  2013     0       0
29     6  2014     0       0
30     6  2015     0       0
# ... with 20 more rows

Assuming you want to do this by group, you can use zoo::rollmean() together with ceiling():

library(dplyr)

# Will calculate for t - n periods, n is a parameter which is easy to change
n <- 2

df %>% 
  arrange(id, time) %>% 
  group_by(id) %>% 
  mutate(
    # window of n + 1 values (t - n, ..., t); n leading zeros stand in for
    # "no event before the first observed year", so no fill value is needed
    occurred = ceiling(zoo::rollmeanr(c(rep(0, n), event), k = n + 1))
  )
#> # A tibble: 50 × 4
#> # Groups:   id [10]
#>       id  time event occurred
#>    <int> <int> <int>    <dbl>
#>  1     1  2011     0        0
#>  2     1  2012     0        0
#>  3     1  2013     0        0
#>  4     1  2014     1        1
#>  5     1  2015     1        1
#>  6     2  2011     0        0
#>  7     2  2012     0        0
#>  8     2  2013     1        1
#>  9     2  2014     0        1
#> 10     2  2015     0        1
#> # … with 40 more rows

Created on 2022-04-04 by the reprex package (v2.0.1)
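All three answers reduce each window to a single 0/1 flag in a different way (sum > 0, max, and ceiling(mean)). A quick base R check, assuming event is strictly 0/1 (ceiling(mean) in particular relies on non-negative values), that these agree with any() on every possible window of three years:

```r
# All 2^3 = 8 possible 0/1 windows of length 3 (current year + 2 previous)
windows <- as.matrix(expand.grid(rep(list(0:1), 3)))

any_flag  <- apply(windows, 1, function(w) as.integer(any(w == 1)))
sum_flag  <- apply(windows, 1, function(w) as.integer(sum(w) > 0))
max_flag  <- apply(windows, 1, max)
ceil_flag <- apply(windows, 1, function(w) ceiling(mean(w)))

# TRUE if every reduction matches any() on every window
all(sum_flag == any_flag, max_flag == any_flag, ceil_flag == any_flag)
```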