How do I extrapolate a series using the growth rates in another series in panel data?

Question

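I have a large (about 123 million observations) panel dataset that contains several pairs of series, e.g. amount_old and amount_new. The series amount_new extends further forward in time than amount_old, so I would like to extrapolate the values of amount_old using growth rates calculated from amount_new.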

Here is a small sample dataset:

clear

input str3 str_id year amount_old amount_new
       aaa   2000     1105.34      1568.2  
       aaa   2001   1122.6268   1571.8486  
       aaa   2002   1132.0478    1605.832  
       aaa   2003   1186.9295   1666.4644  
       aaa   2004   1187.2502   1714.0043  
       aaa   2005   1230.0004   1744.4136  
       aaa   2006   1252.9979   1821.2219  
       aaa   2007   1289.5164   1855.4785  
       aaa   2008   1351.6705   1864.0597  
       aaa   2009    1353.639   1877.5152  
       aaa   2010   1398.2009   1916.5298  
       aaa   2011           .   1921.5906  
       aaa   2012           .   2003.8804  
       aaa   2013           .   2051.6525  
       aaa   2014           .   2072.8235  
       bbb   2000   7964.3029     9043.68  
       bbb   2001   8062.8454   9319.9098  
       bbb   2002    8223.277   9415.5202  
       bbb   2003   8605.8333    9760.014  
       bbb   2004   8636.8787   10024.964  
       bbb   2005   8927.8641   10327.588  
       bbb   2006     9284.91   10408.275  
       bbb   2007           .   10693.495  
       bbb   2008           .   11141.559  
       bbb   2009           .   11367.394  
       bbb   2010           .   11671.628  
       bbb   2011           .   11994.248  
       ccc   1990    20593.59   31049.493  
       ccc   1991   20723.578   31364.674  
       ccc   1992   21119.377   32870.953  
       ccc   1993           .   33138.507  
       ccc   1994           .   33383.829  
       ccc   1995           .   33776.957  
       ccc   1996           .   33966.004  
       ccc   1997           .   34324.091  
       ccc   1998           .   35744.175  
end

After loading the data, I can extrapolate by looping over each observation:

encode str_id, gen(id)
xtset id year
gen amount_new_gr = amount_new / L.amount_new - 1
forv i = 1/`=_N' {
    if missing(amount_old[`i']) {
        replace amount_old = amount_old[`=`i'-1'] * (1 + amount_new_gr[`i']) in `i'
    }
}

But this is quite slow, the dataset is large, and I need to do this for about 45 pairs of series (series1_old, series1_new, series2_old, etc.).

Is there a way to do this in Stata 13 using lag operators or some other feature of panel datasets?

Answer 1

Assuming you really want to do this (statistically, it may not be your best option), try the alternative provided in the code:

clear
set more off

*----- example data -----

input str3 str_id year amount_old amount_new
       aaa   2000     1105.34      1568.2  
       aaa   2001   1122.6268   1571.8486  
       aaa   2002   1132.0478    1605.832  
       aaa   2003   1186.9295   1666.4644  
       aaa   2004   1187.2502   1714.0043  
       aaa   2005   1230.0004   1744.4136  
       aaa   2006   1252.9979   1821.2219  
       aaa   2007   1289.5164   1855.4785  
       aaa   2008   1351.6705   1864.0597  
       aaa   2009    1353.639   1877.5152  
       aaa   2010   1398.2009   1916.5298  
       aaa   2011           .   1921.5906  
       aaa   2012           .   2003.8804  
       aaa   2013           .   2051.6525  
       aaa   2014           .   2072.8235  
       bbb   2000   7964.3029     9043.68  
       bbb   2001   8062.8454   9319.9098  
       bbb   2002    8223.277   9415.5202  
       bbb   2003   8605.8333    9760.014  
       bbb   2004   8636.8787   10024.964  
       bbb   2005   8927.8641   10327.588  
       bbb   2006     9284.91   10408.275  
       bbb   2007           .   10693.495  
       bbb   2008           .   11141.559  
       bbb   2009           .   11367.394  
       bbb   2010           .   11671.628  
       bbb   2011           .   11994.248  
       ccc   1990    20593.59   31049.493  
       ccc   1991   20723.578   31364.674  
       ccc   1992   21119.377   32870.953  
       ccc   1993           .   33138.507  
       ccc   1994           .   33383.829  
       ccc   1995           .   33776.957  
       ccc   1996           .   33966.004  
       ccc   1997           .   34324.091  
       ccc   1998           .   35744.175  
end

// create more observations
expand 60000

bysort str_id year : gen idpre = _n
egen id = group(idpre str_id)

order id
drop str_id idpre

// xtset the data
xtset id year

// clear timers
timer clear

*----- original -----

timer on 1

gen amount_new_gr = amount_new / L.amount_new - 1

clonevar amount_old2 = amount_old

quietly forv i = 1/`=_N' {
    if missing(amount_old2[`i']) {
        replace amount_old2 = amount_old2[`=`i'-1'] * (1 + amount_new_gr[`i']) in `i'
    }
}

timer off 1

*----- alternative -----

timer on 2

gen growth = amount_new / L.amount_new

clonevar amount_old3 = amount_old

quietly bysort id : replace amount_old3 = L.amount_old3 * growth ///
    if missing(amount_old3)

timer off 2

// results
timer list

The timer command lets us benchmark the two versions: your original (1) and the proposed alternative (2). Times are in seconds:

. timer list
   1:     36.82 /        1 =      36.8180
   2:      0.83 /        1 =       0.8260

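With this dataset of about 2 million observations, the alternative is roughly 45 times faster (about 0.8 seconds versus about 37 seconds).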

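The code is also simpler and easier to read. Note that I use the if qualifier rather than the if command (see the difference). There is no need to loop over observations, because Stata does that for us automatically.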
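
To make the distinction between the if command and the if qualifier concrete, here is a minimal sketch on a made-up three-observation variable x (the data and the variable name are illustrative assumptions, not part of the example above). The if command is evaluated once, and a bare variable name in it refers to the value in the first observation; the if qualifier is evaluated for every observation.

```stata
clear
input x
.
1
2
end

* if command: evaluated once; x here means x[1], the first observation
if missing(x) {
    display "x is missing in the first observation"
}

* if qualifier: evaluated for every observation
count if missing(x)
```

This is why the original loop has to index each observation explicitly inside the if command, while the alternative can apply a single replace with an if qualifier.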

Also read help by, a basic and very important construct in Stata.
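
For the roughly 45 pairs mentioned in the question, the same idea can be wrapped in a loop. The following is only a sketch: it assumes the variables really follow the series1_old, series1_new, series2_old, ... naming pattern, it uses a tiny made-up dataset with two pairs, and for the replace step it uses explicit subscripting ([_n-1]) under by rather than L., which assumes there are no gaps in year within each id.

```stata
clear
input id year series1_old series1_new series2_old series2_new
1 2000 100 200 10 50
1 2001 110 220 11 55
1 2002   . 242  . 66
1 2003   . 242  . 33
2 2000  50  10  5  1
2 2001   .  20  .  3
end

xtset id year

* loop over the pairs; with the real data this would be 1/45
forvalues k = 1/2 {
    * gross growth factor of the "new" series (missing in each panel's first year)
    generate double growth = series`k'_new / L.series`k'_new

    * fill missing "old" values forward; replace works through the
    * observations in order, so each filled value feeds the next one
    by id (year): replace series`k'_old = series`k'_old[_n-1] * growth ///
        if missing(series`k'_old)

    drop growth
}

list, sepby(id)
```

Dropping growth at the end of each iteration keeps only one extra variable in memory at a time, which matters with a dataset of this size.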
