Stata: make a variable based on the relative position to other observations

I am running an event study; see the reproducible example below. I include only one unit, but that is enough to illustrate my question.

input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end

I generate dif_year, which should hold the difference in years relative to the treatment:

sort unit year
bysort unit: gen year_nb = _n
bysort unit: gen year_target = year_nb if treatment == 1
by unit: egen target_distance = min(year_target)
drop year_target
gen dif_year = year_nb - target_distance
drop year_nb target_distance

This works well with one treatment per unit, but here I have two. With the snippet above, I get the following result:

unit year treatment dif_year
1 2000 0 -2
1 2001 0 -1
1 2002 1 0
1 2003 0 1
1 2004 0 2
1 2005 1 3
1 2006 0 4
1 2007 0 5

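You can see that it anchors to the first treatment (2002) but ignores the second (2005). How can I adjust dif_year so that it works with multiple treatments (here, the one in 2005)? The values up to and including 2003 are correct, but I would like 2004 to be -1, 2005 to be 0, 2006 to be 1, and 2007 to be 2.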

I found a quick fix for my own problem.

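I generate a variable that is missing whenever there is no treatment. Then I iterate over the rows, filling the rows below and above each treatment year with that treatment's value until no missing values remain.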

Here three iterations are enough, but I let the loop run to i = 10 just to show that extra iterations do not change the result.

sort unit year
bysort unit: gen year_nb = _n
bysort unit: gen year_target = year_nb if treatment == 1

gen closest_treatment = year_target

forvalues i = 1(1)10 {
    bysort unit: replace closest_treatment = closest_treatment[_n-`i'] if(year_target[_n-`i'] != . & closest_treatment[_n] == .)
    bysort unit: replace closest_treatment = closest_treatment[_n+`i'] if(year_target[_n+`i'] != . & closest_treatment[_n] == .)
}
replace year_target = closest_treatment if year_target == .
drop closest_treatment

gen dif_year = year_nb - year_target
drop year_nb year_target

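Edit: In my example, the number of rows between the two treatments is even. The solution also works for an odd number, in which case the last row to be filled sits exactly midway between the two treatments. Whether that row is assigned to the previous or the next treatment only matters if you care about the sign, which I assume you do in an event study (for example, a distance of +3 years from the previous treatment is -3 from the next one). This snippet assigns the row to the previous treatment (positive sign). If you want the opposite, just swap the two lines inside the loop.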
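
To illustrate the sign point with an odd gap, here is a minimal sketch on made-up data (treatments in 2001 and 2005, so 2003 is the midpoint). It is the same loop with the two lines swapped; the data and the choice to keep closest_treatment as its own variable are assumptions for illustration only.

```stata
clear
input unit year treatment
1 2000 0
1 2001 1
1 2002 0
1 2003 0
1 2004 0
1 2005 1
1 2006 0
end

sort unit year
by unit: gen year_nb = _n
by unit: gen year_target = year_nb if treatment == 1
gen closest_treatment = year_target

* Hypothetical variant: look ahead first, then back, so that a row exactly
* midway between two treatments is tied to the NEXT treatment (negative sign)
forvalues i = 1/10 {
    by unit: replace closest_treatment = closest_treatment[_n+`i'] if year_target[_n+`i'] != . & closest_treatment == .
    by unit: replace closest_treatment = closest_treatment[_n-`i'] if year_target[_n-`i'] != . & closest_treatment == .
}

gen dif_year = year_nb - closest_treatment
list unit year treatment dif_year, sep(0)
```

With this ordering the midpoint year 2003 should come out as -2; with the original ordering it would be +2.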

Here is a solution in which the maximum number of years does not need to be hard-coded.

clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
1 2008 0
1 2009 0
1 2010 1
end

sort unit year

*Set all treatment years to 0
gen diff_year = 0 if treatment == 1

*Initialize locals used in the loop
local stop "false"
local diff_distance = 0

while "`stop'" == "false" {
    
    **Replace diff to one more than diff on row above if unit is the same, 
    * no diff for this row, and diff on row above is the diff distance 
    * for this iteration of the loop.
    replace diff_year = diff_year[_n-1] + 1 if unit == unit[_n-1] & missing(diff_year) & diff_year[_n-1] == `diff_distance'
    
    **Replace diff to one less than diff on row below if unit is the same, 
    * no diff for this row, and diff on row below is the negative of the 
    * diff distance for this iteration of the loop.
    replace diff_year = diff_year[_n+1] - 1 if unit == unit[_n+1] & missing(diff_year) & diff_year[_n+1] == `diff_distance' * -1
    
    *Test if there are still missing values, and if none remain set stop local to true
    count if missing(diff_year)
    if `r(N)' == 0 local stop "true"
    
    *Increment the diff distance by one for next loop
    local diff_distance = `diff_distance' + 1
    
}
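
One caveat worth checking in your own data: as far as I can tell, the loop above only stops once no missing values remain, so a unit that is never treated would keep it running. A hedged variant, assuming consecutive yearly rows within each unit, also stops as soon as a pass fills nothing. The locals n_missing and n_missing_before and the two-unit data are made up for this sketch.

```stata
clear
input unit year treatment
1 2000 0
1 2001 1
1 2002 0
1 2003 0
2 2000 0
2 2001 0
end

sort unit year
gen diff_year = 0 if treatment == 1

local diff_distance = 0
local n_missing_before = -1
count if missing(diff_year)
local n_missing = r(N)

* Stop when nothing is missing OR when the last pass filled no rows
* (assumption: with consecutive rows, an empty pass means no later pass can fill anything)
while `n_missing' > 0 & `n_missing' != `n_missing_before' {
    local n_missing_before = `n_missing'

    * Same two replace steps as in the loop above
    replace diff_year = diff_year[_n-1] + 1 if unit == unit[_n-1] & missing(diff_year) & diff_year[_n-1] == `diff_distance'
    replace diff_year = diff_year[_n+1] - 1 if unit == unit[_n+1] & missing(diff_year) & diff_year[_n+1] == -`diff_distance'

    count if missing(diff_year)
    local n_missing = r(N)
    local diff_distance = `diff_distance' + 1
}

list, sep(0)
```

Here unit 2 is never treated, so its diff_year stays missing and the loop still ends.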

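This solution uses no loops. Clearly, the problem depends on looking both forward and backward, so temporarily reversing time is a device that can be used: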

clear 
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2003 0
1 2004 0
1 2005 1
1 2006 0
1 2007 0
end

bysort unit (year) : gen wanted1 = 0 if treatment 
by unit: replace wanted1 = wanted1[_n-1] + 1 if missing(wanted1)
gen negyear = -year 
bysort unit (negyear) : gen wanted2 = 0 if treatment 
by unit: replace wanted2 = wanted2[_n-1] + 1 if missing(wanted2)

gen wanted = cond(abs(wanted2) < abs(wanted1), - wanted2, wanted1)

sort unit year 

list , sep(0) 

     +---------------------------------------------------------------+
     | unit   year   treatm~t   wanted1   negyear   wanted2   wanted |
     |---------------------------------------------------------------|
  1. |    1   2000          0         .     -2000         2       -2 |
  2. |    1   2001          0         .     -2001         1       -1 |
  3. |    1   2002          1         0     -2002         0        0 |
  4. |    1   2003          0         1     -2003         2        1 |
  5. |    1   2004          0         2     -2004         1       -1 |
  6. |    1   2005          1         0     -2005         0        0 |
  7. |    1   2006          0         1     -2006         .        1 |
  8. |    1   2007          0         2     -2007         .        2 |
     +---------------------------------------------------------------+
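
Note that wanted1 and wanted2 count rows, which equals years only when each unit has one row per consecutive year. If your panel has gaps, a sketch of the same reverse-time device using calendar years could look like this. The variable names prev_treat, next_treat, since, and until are my own, and the data with missing years 2003 and 2006 are made up for illustration.

```stata
clear
input unit year treatment
1 2000 0
1 2001 0
1 2002 1
1 2004 0
1 2005 0
1 2007 1
1 2008 0
end

* Carry the most recent treatment year forward in time
bysort unit (year): gen prev_treat = year if treatment
by unit: replace prev_treat = prev_treat[_n-1] if missing(prev_treat)

* Reverse time and carry the next treatment year backward
gen negyear = -year
bysort unit (negyear): gen next_treat = year if treatment
by unit: replace next_treat = next_treat[_n-1] if missing(next_treat)

sort unit year
gen since = year - prev_treat
gen until = next_treat - year

* Missing sorts above every number in Stata, so a missing side never wins;
* ties go to the previous treatment (positive sign)
gen wanted = cond(until < since, -until, since)

list unit year treatment since until wanted, sep(0)
```

In this example 2004 is two years after the 2002 treatment and three years before the 2007 one, so it gets 2, while 2005 gets -2.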