Difference in Standard Errors Between Python's linearmodels.PanelOLS and Stata's xtreg, fe When Using Robust Standard Errors

I copied an example from the introduction to PanelOLS in linearmodels and added robust standard errors to learn how to use the module. This is the code I used:

from linearmodels.datasets import jobtraining
import statsmodels.api as sm2
data = jobtraining.load()
mi_data = data.set_index(['fcode', 'year'])
mi_data.head()
from linearmodels import PanelOLS
mod = PanelOLS(mi_data.lscrap, sm2.add_constant(mi_data.hrsemp), entity_effects=True)
print(mod.fit(cov_type='robust'))

                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                 lscrap   R-squared:                        0.0528
Estimator:                   PanelOLS   R-squared (Between):             -0.0029
No. Observations:                 140   R-squared (Within):               0.0528
Date:                Tue, May 05 2020   R-squared (Overall):              0.0048
Time:                        10:49:58   Log-likelihood                   -90.459
Cov. Estimator:                Robust                                           
                                        F-statistic:                      5.0751
Entities:                          48   P-value                           0.0267
Avg Obs:                       2.9167   Distribution:                    F(1,91)
Min Obs:                       1.0000                                           
Max Obs:                       3.0000   F-statistic (robust):             8.2299
                                        P-value                           0.0051
Time periods:                       3   Distribution:                    F(1,91)
Avg Obs:                       46.667                                           
Min Obs:                       46.000                                           
Max Obs:                       48.000                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          0.4982     0.0555     8.9714     0.0000      0.3879      0.6085
hrsemp        -0.0054     0.0019    -2.8688     0.0051     -0.0092     -0.0017
==============================================================================

F-test for Poolability: 17.094
P-value: 0.0000
Distribution: F(47,91)

Included effects: Entity

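When I compare these results with the way I used to run fixed-effects regressions with robust standard errors in Stata, the standard errors differ noticeably (for example, 0.0555 versus 0.0295 for the constant).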

xtset fcode year
xtreg lscrap hrsemp  , fe vce(robust)
Fixed-effects (within) regression               Number of obs      =       140
Group variable: fcode                           Number of groups   =        48

R-sq:  within  = 0.0528                         Obs per group: min =         1
       between = 0.0002                                        avg =       2.9
       overall = 0.0055                                        max =         3

                                                F(1,47)            =      7.93
corr(u_i, Xb)  = -0.0266                        Prob > F           =    0.0071

                                 (Std. Err. adjusted for 48 clusters in fcode)
------------------------------------------------------------------------------
             |               Robust
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hrsemp |  -.0054186   .0019243    -2.82   0.007    -.0092897   -.0015474
       _cons |   .4981764   .0295415    16.86   0.000     .4387464    .5576063
-------------+----------------------------------------------------------------
     sigma_u |  1.4004191
     sigma_e |  .57268937
         rho |  .85672692   (fraction of variance due to u_i)
------------------------------------------------------------------------------

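I don't understand where the difference comes from, because without robust SEs the results are (almost) identical. How can I get the same robust SEs from Python's linearmodels.PanelOLS as I get in Stata?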

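White's robust covariance, which is what cov_type='robust' selects in Python, is not robust for fixed-effects models. Stata's xtreg, fe vce(robust) clusters on the panel variable, as the line "Std. Err. adjusted for 48 clusters in fcode" in your output shows. You should use cov_type='clustered', cluster_entity=True instead. This is the corresponding manual entry in linearmodels.

Full code:

from linearmodels.datasets import jobtraining
import statsmodels.api as sm2
data = jobtraining.load()
mi_data = data.set_index(['fcode', 'year'])
mi_data.head()
from linearmodels import PanelOLS
mod = PanelOLS(mi_data.lscrap, sm2.add_constant(mi_data.hrsemp), entity_effects=True)
# Cluster the covariance by entity (fcode) instead of using White's estimator
print(mod.fit(cov_type='clustered', cluster_entity=True))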

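The resulting standard errors are close to Stata's. Note that the summary reproduced below shows "Cov. Estimator: Robust" and the same figures as the first run, whereas a clustered fit is labelled "Cov. Estimator: Clustered":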

                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                 lscrap   R-squared:                        0.0528
Estimator:                   PanelOLS   R-squared (Between):             -0.0029
No. Observations:                 140   R-squared (Within):               0.0528
Date:                Tue, May 05 2020   R-squared (Overall):              0.0048
Time:                        18:53:06   Log-likelihood                   -90.459
Cov. Estimator:                Robust                                           
                                        F-statistic:                      5.0751
Entities:                          48   P-value                           0.0267
Avg Obs:                       2.9167   Distribution:                    F(1,91)
Min Obs:                       1.0000                                           
Max Obs:                       3.0000   F-statistic (robust):             8.2299
                                        P-value                           0.0051
Time periods:                       3   Distribution:                    F(1,91)
Avg Obs:                       46.667                                           
Min Obs:                       46.000                                           
Max Obs:                       48.000                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          0.4982     0.0555     8.9714     0.0000      0.3879      0.6085
hrsemp        -0.0054     0.0019    -2.8688     0.0051     -0.0092     -0.0017
==============================================================================

F-test for Poolability: 17.094
P-value: 0.0000
Distribution: F(47,91)

Included effects: Entity
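
To see why clustering by entity changes the standard errors, here is a minimal pure-Python sketch of the two sandwich estimators for a single regressor after the within (entity-demeaning) transformation. It is only an illustration under simplifying assumptions: the function name within_se and the toy data are made up for this example, no small-sample or degrees-of-freedom correction is applied, and it is not the code that linearmodels or Stata actually run.

```python
import math
from collections import defaultdict


def within_se(entity, x, y):
    """Fixed-effects slope of y on x with White and entity-clustered SEs.

    Hypothetical helper for illustration only; no small-sample correction.
    """
    # Accumulate per-entity sums so each series can be demeaned by entity.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(entity, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1

    # Within transformation: subtract the entity mean from x and y.
    xd = [xi - sums[g][0] / sums[g][2] for g, xi in zip(entity, x)]
    yd = [yi - sums[g][1] / sums[g][2] for g, yi in zip(entity, y)]

    sxx = sum(v * v for v in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - beta * a for a, b in zip(xd, yd)]
    scores = [a * e for a, e in zip(xd, resid)]

    # White: every observation contributes its own squared score.
    white_meat = sum(s * s for s in scores)

    # Clustered: scores are summed within each entity before squaring,
    # so correlation of the errors inside an entity is not ignored.
    per_entity = defaultdict(float)
    for g, s in zip(entity, scores):
        per_entity[g] += s
    cluster_meat = sum(v * v for v in per_entity.values())

    return beta, math.sqrt(white_meat) / sxx, math.sqrt(cluster_meat) / sxx


# Toy panel: two entities observed twice each (made-up numbers).
entity = [1, 1, 2, 2]
x = [0.0, 2.0, 0.0, 2.0]
y = [0.0, 2.0, 1.0, 5.0]

beta, white_se, cluster_se = within_se(entity, x, y)
print(f"beta={beta:.4f} white={white_se:.4f} clustered={cluster_se:.4f}")
# prints: beta=1.5000 white=0.2500 clustered=0.3536
```

The only difference between the two estimators is whether the scores are squared observation by observation or after summing within each entity. In this toy panel the scores within an entity have the same sign, so the clustered standard error is larger. In real data it can be larger or smaller than the White one.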