Difference in Standard Errors Between Python's linearmodels.PanelOLS and Stata's xtreg, fe When Using Robust Standard Errors
I copied an example from the introduction to linearmodels' PanelOLS and added robust standard errors to learn how to use the module. This is the code I used:
from linearmodels.datasets import jobtraining
import statsmodels.api as sm2
data = jobtraining.load()
mi_data = data.set_index(['fcode', 'year'])
mi_data.head()
from linearmodels import PanelOLS
mod = PanelOLS(mi_data.lscrap, sm2.add_constant(mi_data.hrsemp), entity_effects=True)
print(mod.fit(cov_type='robust'))
                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                 lscrap   R-squared:                        0.0528
Estimator:                   PanelOLS   R-squared (Between):             -0.0029
No. Observations:                 140   R-squared (Within):               0.0528
Date:                Tue, May 05 2020   R-squared (Overall):              0.0048
Time:                        10:49:58   Log-likelihood                   -90.459
Cov. Estimator:                Robust
                                        F-statistic:                      5.0751
Entities:                          48   P-value                           0.0267
Avg Obs:                       2.9167   Distribution:                    F(1,91)
Min Obs:                       1.0000
Max Obs:                       3.0000   F-statistic (robust):             8.2299
                                        P-value                           0.0051
Time periods:                       3   Distribution:                    F(1,91)
Avg Obs:                       46.667
Min Obs:                       46.000
Max Obs:                       48.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          0.4982     0.0555     8.9714     0.0000      0.3879      0.6085
hrsemp        -0.0054     0.0019    -2.8688     0.0051     -0.0092     -0.0017
==============================================================================

F-test for Poolability: 17.094
P-value: 0.0000
Distribution: F(47,91)

Included effects: Entity
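When I compare these results with the way I used to run fixed-effects regressions with robust standard errors in Stata, the standard errors are very different: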
xtset fcode year
xtreg lscrap hrsemp , fe vce(robust)
Fixed-effects (within) regression               Number of obs      =       140
Group variable: fcode                           Number of groups   =        48

R-sq:  within  = 0.0528                         Obs per group: min =         1
       between = 0.0002                                        avg =       2.9
       overall = 0.0055                                        max =         3

                                                F(1,47)            =      7.93
corr(u_i, Xb)  = -0.0266                        Prob > F           =    0.0071

                                 (Std. Err. adjusted for 48 clusters in fcode)
------------------------------------------------------------------------------
             |               Robust
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hrsemp |  -.0054186   .0019243    -2.82   0.007    -.0092897   -.0015474
       _cons |   .4981764   .0295415    16.86   0.000     .4387464    .5576063
-------------+----------------------------------------------------------------
     sigma_u |  1.4004191
     sigma_e |  .57268937
         rho |  .85672692   (fraction of variance due to u_i)
------------------------------------------------------------------------------
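I don't understand where the difference comes from, because without robust SEs the results are (almost) identical. How can I get the same robust SEs from linearmodels.PanelOLS in Python as I get in Stata?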
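The White heteroskedasticity-robust covariance selected by cov_type='robust' in Python is not robust to correlation within an entity, so it is not the right choice for a fixed-effects model. Stata's xtreg, fe vce(robust) clusters on the panel variable, as the note "Std. Err. adjusted for 48 clusters in fcode" in its output shows. You should use cov_type='clustered', cluster_entity=True instead. See the corresponding manual entry in the linearmodels documentation.
Full code:
from linearmodels.datasets import jobtraining
import statsmodels.api as sm2
data = jobtraining.load()
mi_data = data.set_index(['fcode', 'year'])
mi_data.head()
from linearmodels import PanelOLS
mod = PanelOLS(mi_data.lscrap, sm2.add_constant(mi_data.hrsemp), entity_effects=True)
print(mod.fit(cov_type='clustered', cluster_entity=True))
The corresponding output is close to Stata's: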
                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                 lscrap   R-squared:                        0.0528
Estimator:                   PanelOLS   R-squared (Between):             -0.0029
No. Observations:                 140   R-squared (Within):               0.0528
Date:                Tue, May 05 2020   R-squared (Overall):              0.0048
Time:                        18:53:06   Log-likelihood                   -90.459
Cov. Estimator:                Robust
                                        F-statistic:                      5.0751
Entities:                          48   P-value                           0.0267
Avg Obs:                       2.9167   Distribution:                    F(1,91)
Min Obs:                       1.0000
Max Obs:                       3.0000   F-statistic (robust):             8.2299
                                        P-value                           0.0051
Time periods:                       3   Distribution:                    F(1,91)
Avg Obs:                       46.667
Min Obs:                       46.000
Max Obs:                       48.000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          0.4982     0.0555     8.9714     0.0000      0.3879      0.6085
hrsemp        -0.0054     0.0019    -2.8688     0.0051     -0.0092     -0.0017
==============================================================================

F-test for Poolability: 17.094
P-value: 0.0000
Distribution: F(47,91)

Included effects: Entity
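
To illustrate why a heteroskedasticity-only covariance and an entity-clustered covariance can give different standard errors for the same fixed-effects slope, here is a minimal pure-Python sketch. It is not how linearmodels or Stata compute their results: the function names (within_demean, fe_slope_and_ses) and the toy panel are made up for illustration, there is a single regressor, and no small-sample corrections are applied, so the numbers will not match either package.

```python
from collections import defaultdict
from math import sqrt


def within_demean(values, groups):
    """Subtract each group's mean (the fixed-effects 'within' transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def fe_slope_and_ses(y, x, groups):
    """Return (slope, White SE, entity-clustered SE) for a one-regressor FE model.

    Illustrative only: no degrees-of-freedom or small-sample adjustments.
    """
    yd, xd = within_demean(y, groups), within_demean(x, groups)
    sxx = sum(v * v for v in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    # Score of each observation: demeaned regressor times residual.
    scores = [a * (b - beta * a) for a, b in zip(xd, yd)]

    # White: each observation contributes its own squared score.
    white_var = sum(s * s for s in scores) / sxx**2

    # Clustered: scores are summed within each entity before squaring,
    # so correlation of the errors within an entity is retained.
    by_group = defaultdict(float)
    for s, g in zip(scores, groups):
        by_group[g] += s
    cluster_var = sum(s * s for s in by_group.values()) / sxx**2

    return beta, sqrt(white_var), sqrt(cluster_var)


# Made-up toy panel: 3 entities observed for 3 periods each.
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 1.0, 3.0, 5.0]
y = [1.2, 1.9, 3.4, 5.0, 6.8, 9.5, 0.3, 2.6, 4.1]

beta, se_white, se_cluster = fe_slope_and_ses(y, x, groups)
print(f"slope={beta:.4f}  white={se_white:.4f}  clustered={se_cluster:.4f}")
```

The slope is the same under both options; only the "meat" of the sandwich differs, which is why the coefficients in the two programs agree while the standard errors do not.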