Margins after seemingly unrelated regression too slow in Stata
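I have a dataset with 3 million observations. I need to estimate a linear probability model (LPM) using SUR and obtain marginal effects.
I used gsem ... vce(cluster x), followed by margins, ... force. However, it takes a very long time to get the margins results (over 2 hours). I do need standard errors for the confidence intervals, so I cannot use the nose option.
Is there any other way to speed this up?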
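The exact code depends on exactly which kind of marginal effect you mean. You can use lincom to compute the partial effects, which is likely to be faster than margins.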
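For example, suppose we estimate this model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \varepsilon$$

The partial effect of x1 on y is the partial derivative with respect to x1:

$$\frac{\partial y}{\partial x_1} = \beta_1 + \beta_4 x_2 + \beta_5 x_3$$

Plugging in the sample means of x2 and x3 gives the effect of x1 on y at the means. To do this in Stata: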
// Get data
webuse regress
// Run the regression
qui reg y c.x1##c.(x2 x3)
// Get the sample means of x2 and x3
sum x2 if e(sample), meanonly
scalar m_x2 = r(mean)
sum x3 if e(sample), meanonly
scalar m_x3 = r(mean)
// Calculate partial effect
lincom x1 + m_x2 * c.x1#c.x2 + m_x3*c.x1#c.x3
Result:

. lincom x1 + m_x2 * c.x1#c.x2 + m_x3*c.x1#c.x3

 ( 1)  x1 - .2972973*c.x1#c.x2 + 3019.459*c.x1#c.x3 = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.409372   1.005254     1.40   0.163    -.5778255    3.396569
------------------------------------------------------------------------------

As you can see, this matches the result from margins:
. qui reg y c.x1##c.(x2 x3)

. margins, dydx(x1) atmeans

Conditional marginal effects                    Number of obs     =        148
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : x1
at           : x1              =    3.014865 (mean)
               x2              =   -.2972973 (mean)
               x3              =    3019.459 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.409372   1.005254     1.40   0.163    -.5778256    3.396569
------------------------------------------------------------------------------
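The agreement is what one would expect for a model that is linear in its coefficients. As a sketch of the algebra (the notation is assumed here, not taken from the original post: \beta_1, \beta_4 and \beta_5 stand for the coefficients on x1, c.x1#c.x2 and c.x1#c.x3, and \hat{V} for the corresponding 3x3 block of the estimated coefficient covariance matrix), the point estimate and its standard error can be written using only the stored estimates, with the sample means treated as fixed constants:

```latex
\hat{\theta}
  = \hat{\beta}_1 + \bar{x}_2\,\hat{\beta}_4 + \bar{x}_3\,\hat{\beta}_5
  = c^{\top}\hat{\beta},
\qquad
c = \begin{pmatrix} 1 \\ \bar{x}_2 \\ \bar{x}_3 \end{pmatrix},
\qquad
\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_4 \\ \hat{\beta}_5 \end{pmatrix}

\widehat{\mathrm{SE}}(\hat{\theta}) = \sqrt{c^{\top}\,\hat{V}\,c}
```

Because the expression is linear in the coefficients, the delta method is exact here, which is why the two standard errors coincide up to rounding.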
Here is a speed comparison showing that, in this case, lincom is about 14 times faster than margins with roughly 3 million observations:
clear
webuse regress
expand 20271
gen lincom = .
gen margins = .
qui reg y c.x1##c.(x2 x3)
forval i = 1/50 {
    timer clear
    timer on 1
    sum x2 if e(sample), meanonly
    scalar m_x2 = r(mean)
    sum x3 if e(sample), meanonly
    scalar m_x3 = r(mean)
    lincom x1 + m_x2 * c.x1#c.x2 + m_x3*c.x1#c.x3
    timer off 1
    timer on 2
    margins, dydx(x1) atmeans
    timer off 2
    timer list
    replace lincom = r(t1) in `i'
    replace margins = r(t2) in `i'
}
ttest lincom == margins
di "On average, lincom is " %4.2f `=r(mu_2) / r(mu_1)' " times faster than margins with `=_N' observations"
// On average, lincom is 13.88 times faster than margins with 3000108 observations
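For the SUR setting in the question, the same idea might carry over roughly as follows. This is an untested sketch rather than the asker's actual model: the nlswork example dataset, the two binary outcomes (union, south), the covariates (age, tenure) and the cluster variable (idcode) are all stand-ins chosen for illustration. After gsem, coefficients are referred to with their equation name, as in _b[union:age].

```stata
// Untested sketch: two LPM equations estimated jointly (SUR-style) with gsem.
// Dataset and variable choices are illustrative assumptions only.
webuse nlswork, clear

// Joint estimation with correlated errors and cluster-robust standard errors
gsem (union <- c.age##c.tenure) (south <- c.age##c.tenure), ///
    covariance(e.union*e.south) vce(cluster idcode)

// Sample mean of the interacted covariate over the estimation sample
sum tenure if e(sample), meanonly
scalar m_tenure = r(mean)

// Partial effect of age at mean tenure, one lincom call per equation
lincom _b[union:age] + m_tenure * _b[union:c.age#c.tenure]
lincom _b[south:age] + m_tenure * _b[south:c.age#c.tenure]
```

This only works because the effect of interest is linear in the coefficients, which holds for an LPM. How much time it saves relative to margins on a 3-million-observation gsem fit is not something the benchmark above measures.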