Subdividing panel data to apply a function
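I am trying to create a dummy-variable column that records whether a treatment was applied to a firm in panel data. If the treatment (grant) is applied in any particular year, the variable should be set for every year belonging to that firm. I understand that lapply/sapply or dplyr's group_by() would be appropriate, but I am not sure how to apply them. Here is the original data: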
head(q3data_a)
# A tibble: 6 x 30
year fcode employ sales avgsal scrap rework tothrs union grant d89 d88 totrain hrsemp lscrap lemploy
<int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1987 410032 100 4.70e7 35000 NA NA 12 0 0 0 0 100 12 NA 4.61
2 1988 410032 131 4.30e7 37000 NA NA 8 0 0 0 1 50 3.05 NA 4.88
3 1987 410440 12 1.56e6 10500 NA NA 12 0 0 0 0 12 12 NA 2.48
4 1988 410440 13 1.97e6 11000 NA NA 12 0 0 0 1 13 12 NA 2.56
5 1987 410495 20 7.50e5 17680 NA NA 50 0 0 0 0 15 37.5 NA 3.00
6 1988 410495 25 1.10e5 18720 NA NA 50 0 0 0 1 10 20 NA 3.22
# ... with 14 more variables: lsales <dbl>, lrework <dbl>, lhrsemp <dbl>, lscrap_1 <dbl>, grant_1 <int>,
# clscrap <dbl>, cgrant <int>, clemploy <dbl>, clsales <dbl>, lavgsal <dbl>, clavgsal <dbl>,
# cgrant_1 <int>, chrsemp <dbl>, clhrsemp <dbl>
Below is my makeshift solution. It works, but it does not generalize (for example, it would be hard to implement for more than 2 time periods).
dummy1 = c(rep(0,nrow(q3data_a))) #Encodes the treatment across all time periods
for (i in 1:nrow(q3data_a)){ #so if a firm receives a treatment in 1988, it receives a 1 in 1987
if(i%%2 == 0){
if (q3data_a[i,]$grant == 1){
dummy1[i-1] = 1
dummy1[i] = 1
}
}
}
Any suggestions are appreciated.
Is this what you need?
library(dplyr)
df %>% group_by(fcode) %>% mutate(dummy1 = as.integer(any(grant > 0)))
Here df looks like this:
# A tibble: 12 x 3
year fcode grant
<int> <dbl> <int>
1 1985 410032 0
2 1986 410032 1
3 1987 410032 1
4 1988 410032 1
5 1985 410440 1
6 1986 410440 0
7 1987 410440 1
8 1988 410440 1
9 1985 410495 0
10 1986 410495 0
11 1987 410495 0
12 1988 410495 0
The output is:
# A tibble: 12 x 4
# Groups: fcode [3]
year fcode grant dummy1
<int> <dbl> <int> <int>
1 1985 410032 0 1
2 1986 410032 1 1
3 1987 410032 1 1
4 1988 410032 1 1
5 1985 410440 1 1
6 1986 410440 0 1
7 1987 410440 1 1
8 1988 410440 1 1
9 1985 410495 0 0
10 1986 410495 0 0
11 1987 410495 0 0
12 1988 410495 0 0
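Since the question also mentions the base apply family, here is a minimal base R sketch of the same per-firm idea using ave(). This is an alternative to the dplyr answer, not the answerer's method, and it assumes the 12-row example df shown above (rebuilt here as a plain data.frame so the snippet is self-contained).

```r
# Example data copied from the answer above (3 firms, 4 years each).
df <- data.frame(
  year  = rep(1985:1988, times = 3),
  fcode = rep(c(410032, 410440, 410495), each = 4),
  grant = c(0L, 1L, 1L, 1L,
            1L, 0L, 1L, 1L,
            0L, 0L, 0L, 0L)
)

# ave() applies FUN within each fcode group and returns a vector
# aligned with the original rows, so no row-index arithmetic is needed
# and the number of years per firm does not matter.
df$dummy1 <- ave(df$grant, df$fcode,
                 FUN = function(g) as.integer(any(g > 0)))

print(df)
```

If grant can contain missing values, any(g > 0, na.rm = TRUE) would treat them as "no grant" rather than returning NA for that firm. Whether that is the right choice depends on the data.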