Subdividing panel data to apply a function

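I am trying to create a dummy variable column that records whether a treatment was applied to a firm in panel data. If the treatment (grant) is applied in any particular year, the variable should be 1 for every year belonging to that firm. I know that lapply/sapply or dplyr's group_by() would be appropriate, but I am not sure how to apply them. Here is the raw data: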

head(q3data_a)
# A tibble: 6 x 30
   year  fcode employ  sales avgsal scrap rework tothrs union grant   d89   d88 totrain hrsemp lscrap lemploy
  <int>  <dbl>  <int>  <dbl>  <dbl> <dbl>  <dbl>  <int> <int> <int> <int> <int>   <int>  <dbl>  <dbl>   <dbl>
1  1987 410032    100 4.70e7  35000    NA     NA     12     0     0     0     0     100  12        NA    4.61
2  1988 410032    131 4.30e7  37000    NA     NA      8     0     0     0     1      50   3.05     NA    4.88
3  1987 410440     12 1.56e6  10500    NA     NA     12     0     0     0     0      12  12        NA    2.48
4  1988 410440     13 1.97e6  11000    NA     NA     12     0     0     0     1      13  12        NA    2.56
5  1987 410495     20 7.50e5  17680    NA     NA     50     0     0     0     0      15  37.5      NA    3.00
6  1988 410495     25 1.10e5  18720    NA     NA     50     0     0     0     1      10  20        NA    3.22
# ... with 14 more variables: lsales <dbl>, lrework <dbl>, lhrsemp <dbl>, lscrap_1 <dbl>, grant_1 <int>,
#   clscrap <dbl>, cgrant <int>, clemploy <dbl>, clsales <dbl>, lavgsal <dbl>, clavgsal <dbl>,
#   cgrant_1 <int>, chrsemp <dbl>, clhrsemp <dbl>

Below is my stopgap solution. It works, but it does not generalize (for example, it would be hard to implement for more than 2 time periods).

dummy1 = c(rep(0,nrow(q3data_a))) #Encodes the treatment across all time periods 
for (i in 1:nrow(q3data_a)){   #so if a firm receives a treatment in 1988, it receives a 1 in 1987
  if(i%%2 == 0){
    if (q3data_a[i,]$grant == 1){
      dummy1[i-1] = 1
      dummy1[i] = 1
    }
  }
}

Any suggestions are appreciated.

Is this what you need?

library(dplyr)
df %>% group_by(fcode) %>% mutate(dummy1 = as.integer(any(grant > 0)))

where df looks like this:

# A tibble: 12 x 3
    year  fcode grant
   <int>  <dbl> <int>
 1  1985 410032     0
 2  1986 410032     1
 3  1987 410032     1
 4  1988 410032     1
 5  1985 410440     1
 6  1986 410440     0
 7  1987 410440     1
 8  1988 410440     1
 9  1985 410495     0
10  1986 410495     0
11  1987 410495     0
12  1988 410495     0

The output is

# A tibble: 12 x 4
# Groups:   fcode [3]
    year  fcode grant dummy1
   <int>  <dbl> <int>  <int>
 1  1985 410032     0      1
 2  1986 410032     1      1
 3  1987 410032     1      1
 4  1988 410032     1      1
 5  1985 410440     1      1
 6  1986 410440     0      1
 7  1987 410440     1      1
 8  1988 410440     1      1
 9  1985 410495     0      0
10  1986 410495     0      0
11  1987 410495     0      0
12  1988 410495     0      0
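
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the 12-row example df above and applies the same group_by()/mutate() call. Since the question also mentions lapply/sapply, it adds a base R version using ave(). The names res and dummy1_base are only illustrative, and the trailing ungroup() is an optional extra that is not part of the one-liner above.

```r
library(dplyr)

# Rebuild the example df shown above (3 firms, 4 years each)
df <- tibble(
  year  = rep(1985:1988, times = 3),
  fcode = rep(c(410032, 410440, 410495), each = 4),
  grant = c(0L, 1L, 1L, 1L,
            1L, 0L, 1L, 1L,
            0L, 0L, 0L, 0L)
)

# dplyr: any(grant > 0) is evaluated once per firm, and the single
# result is recycled to every row of that firm
res <- df %>%
  group_by(fcode) %>%
  mutate(dummy1 = as.integer(any(grant > 0))) %>%
  ungroup()  # optional: drop the grouping so later steps are not per-firm

# Base R alternative: ave() applies FUN within each level of fcode
# and returns a vector aligned with the original rows
df$dummy1_base <- ave(df$grant, df$fcode,
                      FUN = function(g) as.integer(any(g > 0)))
```

Neither version depends on the number of time periods or on the row order, which is what the loop in the question could not handle. If grant can contain NA, any(grant > 0, na.rm = TRUE) is one way to keep a firm with no recorded grant from getting NA instead of 0.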