How do I extrapolate a series using the growth rates in another series in panel data?
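I have a large (about 123 million observations) panel dataset that contains several pairs of series, for example `amount_old` and `amount_new`. The series `amount_new` extends further forward in time than `amount_old`, so I would like to extrapolate the values of `amount_old` using the growth rates computed from `amount_new`.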
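Written out, the extrapolation described above is a simple recursion. The symbols here are introduced only for illustration: $o_{i,t}$ stands for `amount_old` and $n_{i,t}$ for `amount_new` of panel $i$ in year $t$, and $T_i$ is the last year in which `amount_old` is observed for panel $i$.

```latex
\[
  o_{i,t} \;=\; o_{i,t-1}\,\frac{n_{i,t}}{n_{i,t-1}}, \qquad t > T_i .
\]
% The growth factors telescope, so each extrapolated value depends only on
% the last observed value and the ratio of the new series:
\[
  o_{i,t} \;=\; o_{i,T_i}\,\frac{n_{i,t}}{n_{i,T_i}}, \qquad t > T_i .
\]
```

For panel `aaa` in the sample data below, the first extrapolated value (2011) would be 1398.2009 × 1921.5906 / 1916.5298, which is about 1401.89.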
Here is a small sample dataset:
clear
input str3 str_id year amount_old amount_new
aaa 2000 1105.34 1568.2
aaa 2001 1122.6268 1571.8486
aaa 2002 1132.0478 1605.832
aaa 2003 1186.9295 1666.4644
aaa 2004 1187.2502 1714.0043
aaa 2005 1230.0004 1744.4136
aaa 2006 1252.9979 1821.2219
aaa 2007 1289.5164 1855.4785
aaa 2008 1351.6705 1864.0597
aaa 2009 1353.639 1877.5152
aaa 2010 1398.2009 1916.5298
aaa 2011 . 1921.5906
aaa 2012 . 2003.8804
aaa 2013 . 2051.6525
aaa 2014 . 2072.8235
bbb 2000 7964.3029 9043.68
bbb 2001 8062.8454 9319.9098
bbb 2002 8223.277 9415.5202
bbb 2003 8605.8333 9760.014
bbb 2004 8636.8787 10024.964
bbb 2005 8927.8641 10327.588
bbb 2006 9284.91 10408.275
bbb 2007 . 10693.495
bbb 2008 . 11141.559
bbb 2009 . 11367.394
bbb 2010 . 11671.628
bbb 2011 . 11994.248
ccc 1990 20593.59 31049.493
ccc 1991 20723.578 31364.674
ccc 1992 21119.377 32870.953
ccc 1993 . 33138.507
ccc 1994 . 33383.829
ccc 1995 . 33776.957
ccc 1996 . 33966.004
ccc 1997 . 34324.091
ccc 1998 . 35744.175
end
After loading the data, I can extrapolate by looping over each observation:
encode str_id, gen(id)
xtset id year
gen amount_new_gr = amount_new / L.amount_new - 1
forv i = 1/`=_N' {
if missing(amount_old[`i']) {
replace amount_old = amount_old[`=`i'-1'] * (1 + amount_new_gr[`i']) in `i'
}
}
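But this is quite slow, the dataset is large, and I need to do this for about 45 pairs of series (`series1_old`, `series1_new`, `series2_old`, etc.).
Is there a way to do this in Stata 13 using the lag operator or some other feature of panel datasets?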
Assuming you really want to do this (statistically, it may not be your best option), try the alternative given in the code below:
clear
set more off
*----- example data -----
input str3 str_id year amount_old amount_new
aaa 2000 1105.34 1568.2
aaa 2001 1122.6268 1571.8486
aaa 2002 1132.0478 1605.832
aaa 2003 1186.9295 1666.4644
aaa 2004 1187.2502 1714.0043
aaa 2005 1230.0004 1744.4136
aaa 2006 1252.9979 1821.2219
aaa 2007 1289.5164 1855.4785
aaa 2008 1351.6705 1864.0597
aaa 2009 1353.639 1877.5152
aaa 2010 1398.2009 1916.5298
aaa 2011 . 1921.5906
aaa 2012 . 2003.8804
aaa 2013 . 2051.6525
aaa 2014 . 2072.8235
bbb 2000 7964.3029 9043.68
bbb 2001 8062.8454 9319.9098
bbb 2002 8223.277 9415.5202
bbb 2003 8605.8333 9760.014
bbb 2004 8636.8787 10024.964
bbb 2005 8927.8641 10327.588
bbb 2006 9284.91 10408.275
bbb 2007 . 10693.495
bbb 2008 . 11141.559
bbb 2009 . 11367.394
bbb 2010 . 11671.628
bbb 2011 . 11994.248
ccc 1990 20593.59 31049.493
ccc 1991 20723.578 31364.674
ccc 1992 21119.377 32870.953
ccc 1993 . 33138.507
ccc 1994 . 33383.829
ccc 1995 . 33776.957
ccc 1996 . 33966.004
ccc 1997 . 34324.091
ccc 1998 . 35744.175
end
// create more observations
expand 60000
bysort str_id year : gen idpre = _n
egen id = group(idpre str_id)
order id
drop str_id idpre
// xtset the data
xtset id year
// clear timers
timer clear
*----- original -----
timer on 1
gen amount_new_gr = amount_new / L.amount_new - 1
clonevar amount_old2 = amount_old
quietly forv i = 1/`=_N' {
if missing(amount_old2[`i']) {
replace amount_old2 = amount_old2[`=`i'-1'] * (1 + amount_new_gr[`i']) in `i'
}
}
timer off 1
*----- alternative -----
timer on 2
gen growth = amount_new / L.amount_new
clonevar amount_old3 = amount_old
quietly bysort id : replace amount_old3 = L.amount_old3 * growth ///
if missing(amount_old3)
timer off 2
// results
timer list
The `timer` command lets us benchmark the two versions: your original (1) and the proposed alternative (2). Times are in seconds:
. timer list
1: 36.82 / 1 = 36.8180
2: 0.83 / 1 = 0.8260
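With this dataset of about 2 million observations, the alternative is dramatically faster: under 1 second versus roughly 37 seconds for the loop.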
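The code is also simpler and easier to read. Note that I use the `if` qualifier rather than the `if` command (see the difference). There is no need to loop over observations, because Stata already does that for us.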
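To make the distinction concrete, here is a minimal sketch on a made-up three-observation dataset (the variable names `x` and `flag` are illustrative, not from the question). The `if` command evaluates its expression once, and a bare variable name there refers to the first observation; the `if` qualifier is evaluated for every observation.

```stata
clear
input x
.
2
.
end

* if command: evaluated once; x here means x[1], which is missing
if missing(x) {
    display "x[1] is missing, so this block runs once"
}

* if qualifier: evaluated observation by observation
generate byte flag = 0
replace flag = 1 if missing(x)
list
```

This is why the original loop needed explicit subscripts such as `amount_old[`i']` inside the `if` command, while the alternative can simply attach `if missing(...)` to `replace`.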
Also read `help by`, a basic and very important construct in Stata.
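For the roughly 45 pairs mentioned in the question, the same one-line `replace` could be wrapped in a `forvalues` loop over the pair index. The sketch below is only an illustration under stated assumptions: it assumes the variables really are named `series1_old`, `series1_new`, `series2_old`, and so on, it uses a tiny invented dataset with two pairs, and `npairs` and `growth` are names chosen here. Unlike the benchmark above, it overwrites the `_old` variables in place, so work on a copy if the originals matter.

```stata
clear
* invented toy data: two panels, two pairs of series
input id year series1_old series1_new series2_old series2_new
1 2000 10 100   5 50
1 2001 11 110   6 55
1 2002  . 121   . 66
1 2003  . 133.1 . 72.6
2 2000 20 200   8 80
2 2001  . 220   . 88
end
xtset id year

* hypothetical: 2 pairs here; set this to the actual number of pairs
local npairs 2

forvalues k = 1/`npairs' {
    * growth factor of the new series, respecting panel boundaries
    tempvar growth
    quietly generate double `growth' = series`k'_new / L.series`k'_new

    * same replace as in the alternative above, applied to pair k
    quietly bysort id : replace series`k'_old = L.series`k'_old * `growth' ///
        if missing(series`k'_old)

    drop `growth'
}

list, sepby(id)
```

The loop runs over variables rather than observations, so it executes only as many `replace` statements as there are pairs.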