Stata: make a variable based on the relative position to other observations
I am running an event study; see the reproducible example below. I include only one unit, but that is enough for the question I am asking.
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end
I generate dif_year, which should hold the difference in years relative to the treatment:
sort unit year
bysort unit: gen year_nb = _n
bysort unit: gen year_target = year_nb if treatment == 1
by unit: egen target_distance = min(year_target)
drop year_target
gen dif_year = year_nb - target_distance
drop year_nb target_distance
This works well with a single treatment per unit, but here I have two. With the snippet above, I get the following result:
unit | year | treatment | dif_year |
---|---|---|---|
1 | 2000 | 0 | -2 |
1 | 2001 | 0 | -1 |
1 | 2002 | 1 | 0 |
1 | 2003 | 0 | 1 |
1 | 2004 | 0 | 2 |
1 | 2005 | 1 | 3 |
1 | 2006 | 0 | 4 |
1 | 2007 | 0 | 5 |
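You can see that it anchors on the first treatment (2002) but ignores the second one (2005). How can I adapt dif_year so that it works with multiple treatments (here, the one in 2005 as well)? The values up to and including 2003 are correct, but I would expect -1 for 2004, 0 for 2005, 1 for 2006, and 2 for 2007.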
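I found a quick solution to my own problem.
I generate a variable that is missing wherever there is no treatment. Then I loop over the rows, filling the rows below and above each treatment year with that year's value, until no missing values remain.
Here, three iterations are enough, but I set the loop to run up to i = 10 just to show that extra iterations do not change the result.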
sort unit year
bysort unit: gen year_nb = _n
bysort unit: gen year_target = year_nb if treatment == 1
gen closest_treatment = year_target
forvalues i = 1(1)10 {
    bysort unit: replace closest_treatment = closest_treatment[_n-`i'] if (year_target[_n-`i'] != . & closest_treatment[_n] == .)
    bysort unit: replace closest_treatment = closest_treatment[_n+`i'] if (year_target[_n+`i'] != . & closest_treatment[_n] == .)
}
replace year_target = closest_treatment if year_target == .
drop closest_treatment
gen dif_year = year_nb - year_target
drop year_nb year_target
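Edit: In my example, the number of rows between the two treatments is even. The solution also works when that number is odd, in which case the last row to be filled lies exactly halfway between the two treatments. Whether that row's distance is attributed to the previous or the next treatment does not matter unless you care about the sign, which I assume you do in an event study (for example, a row that is +3 years from the previous treatment is -3 years from the next one). This snippet assigns the value relative to the previous treatment (positive sign). If you want the opposite, just swap the two lines inside the loop.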
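To make the tie-breaking concrete, here is a minimal sketch on hypothetical data (not the data from the question) with an odd number of rows between two treatments, so that 2004 is equally far from 2002 and 2006. It reuses the loop from this answer, with `bysort unit (year):` written out so that the row order within each unit is explicit.

```stata
clear
* Hypothetical example: treatments in 2002 and 2006, three rows in between
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 0
1 2006 1
1 2007 0
end

bysort unit (year): gen year_nb = _n
by unit: gen year_target = year_nb if treatment == 1
gen closest_treatment = year_target

forvalues i = 1/10 {
    * Look back i rows first, so a tie goes to the previous treatment
    bysort unit (year): replace closest_treatment = closest_treatment[_n-`i'] if year_target[_n-`i'] != . & closest_treatment == .
    * Then look ahead i rows for rows that are still unfilled
    bysort unit (year): replace closest_treatment = closest_treatment[_n+`i'] if year_target[_n+`i'] != . & closest_treatment == .
}

gen dif_year = year_nb - closest_treatment
list unit year treatment dif_year, sep(0)
```

With this ordering the midpoint row, 2004, gets dif_year = 2. Swapping the two `replace` lines inside the loop gives it -2 instead, while every other row is unchanged.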
Here is a solution in which the maximum number of years does not need to be hard-coded.
clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
1 2008 0
1 2009 0
1 2010 1
end
sort unit year
*Set all treatment years to 0
gen diff_year = 0 if treatment == 1
*Initialize locals used in the loop
local stop "false"
local diff_distance = 0
while "`stop'" == "false" {
    **Replace diff to one more than diff on row above if unit is the same,
    * no diff for this row, and diff on row above is the diff distance
    * for this iteration of the loop.
    replace diff_year = diff_year[_n-1] + 1 if unit == unit[_n-1] & missing(diff_year) & diff_year[_n-1] == `diff_distance'
    **Replace diff to one less than diff on row below if unit is the same,
    * no diff for this row, and diff on row below is the negative of the
    * diff distance for this iteration of the loop.
    replace diff_year = diff_year[_n+1] - 1 if unit == unit[_n+1] & missing(diff_year) & diff_year[_n+1] == `diff_distance' * -1
    *Test if there are still missing values, and if not set stop local to true
    count if missing(diff_year)
    if `r(N)' == 0 local stop "true"
    *Increment the diff distance by one for next loop
    local diff_distance = `diff_distance' + 1
}
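One caveat: the loop above stops only when no missing values remain, so a unit that is never treated would keep it running forever. The following is a sketch of one possible guard, not part of the original answer. It also stops as soon as a pass fills no rows. Distances here are counted in rows, so a pass that fills nothing means that no later pass can fill anything either. The second unit in the data and the locals `n_missing` and `n_now` are hypothetical additions for illustration.

```stata
clear
* Hypothetical data: unit 2 is never treated
input unit year treatment
1 2000 0
1 2001 1
1 2002 0
1 2003 0
2 2000 0
2 2001 0
2 2002 0
2 2003 0
end

sort unit year
gen diff_year = 0 if treatment == 1

* Number of unfilled rows before the loop starts
count if missing(diff_year)
local n_missing = r(N)
local diff_distance = 0
local stop "false"

while "`stop'" == "false" {
    * Fill rows one step after a row at the current distance
    replace diff_year = diff_year[_n-1] + 1 if unit == unit[_n-1] & missing(diff_year) & diff_year[_n-1] == `diff_distance'
    * Fill rows one step before a row at minus the current distance
    replace diff_year = diff_year[_n+1] - 1 if unit == unit[_n+1] & missing(diff_year) & diff_year[_n+1] == -`diff_distance'

    * Stop when nothing is left to fill, or when this pass filled nothing
    count if missing(diff_year)
    local n_now = r(N)
    if `n_now' == 0 | `n_now' == `n_missing' local stop "true"
    local n_missing = `n_now'
    local diff_distance = `diff_distance' + 1
}

list, sep(0)
```

In this sketch the never-treated unit is simply left with missing diff_year, which you can then handle however your event-study specification requires.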
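This solution does not use loops. Clearly the problem hinges on looking forward as well as backward, so temporarily reversing time is a device that can be used.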
clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end
bysort unit (year) : gen wanted1 = 0 if treatment
by unit: replace wanted1 = wanted1[_n-1] + 1 if missing(wanted1)
gen negyear = -year
bysort unit (negyear) : gen wanted2 = 0 if treatment
by unit: replace wanted2 = wanted2[_n-1] + 1 if missing(wanted2)
gen wanted = cond(abs(wanted2) < abs(wanted1), - wanted2, wanted1)
sort unit year
list , sep(0)
+---------------------------------------------------------------+
| unit year treatm~t wanted1 negyear wanted2 wanted |
|---------------------------------------------------------------|
1. | 1 2000 0 . -2000 2 -2 |
2. | 1 2001 0 . -2001 1 -1 |
3. | 1 2002 1 0 -2002 0 0 |
4. | 1 2003 0 1 -2003 2 1 |
5. | 1 2004 0 2 -2004 1 -1 |
6. | 1 2005 1 0 -2005 0 0 |
7. | 1 2006 0 1 -2006 . 1 |
8. | 1 2007 0 2 -2007 . 2 |
+---------------------------------------------------------------+
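The code above counts distance in rows, which equals distance in years only when every unit has one row per year. If years can be skipped, the same reverse-time device can carry the treatment year itself forward and backward. This is a minimal sketch under that assumption, using hypothetical data with gaps; the variables `prev_treat` and `next_treat` are names introduced here, not part of the original answer.

```stata
clear
* Hypothetical data with gaps: 2003, 2006 and 2007 are absent
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2004 0
1 2005 0
1 2008 1
1 2009 0
end

* Year of the most recent treatment at or before each row
bysort unit (year): gen prev_treat = year if treatment == 1
by unit: replace prev_treat = prev_treat[_n-1] if missing(prev_treat)

* Year of the next treatment at or after each row, found by reversing time
gen negyear = -year
bysort unit (negyear): gen next_treat = year if treatment == 1
by unit: replace next_treat = next_treat[_n-1] if missing(next_treat)

* Signed distance to the nearer treatment; ties go to the previous one.
* Missing counts as larger than any number in Stata, so rows with only
* one neighbouring treatment fall through to that treatment.
gen wanted = cond(next_treat - year < year - prev_treat, year - next_treat, year - prev_treat)

sort unit year
list unit year treatment prev_treat next_treat wanted, sep(0)
```

Here 2005 is three years from both 2002 and 2008, so it takes the positive value 3, matching the tie rule of the row-based code. Changing `<` to `<=` in `cond()` would assign ties to the next treatment instead.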