为什么 python 中的 PanelOLS 与 R 中的 plm 有不同的结果

Question

我在 Python 中使用 PanelOLS 做一个固定效应模型，我在 R 中使用 plm 来验证我的结果，令我惊讶的是，这两者的系数和 P 值不同，即使他们应该是一样的？

数据来自R的数据集

library(AER)
data(Fatalities)
# define the fatality rate
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000
# mandadory jail or community service
Fatalities$punish <- with(Fatalities, factor(jail == "yes" | service == "yes", 
                                             labels=c("no", "yes")))

下面是我的 Python 代码 PanelOLS

w1=Fatalities.set_index(["state", "year"])
mod = PanelOLS(w1['fatal_rate'], w1[['beertax','drinkage','punish', 'miles' ,
                                     'unemp','income']], entity_effects=True)
result=mod.fit(cov_type='clustered', cluster_entity=True)
display(result.summary)

下面是我的 plm

的 R 代码

fatalities_mod6 <- plm(fatal_rate ~ beertax + year + drinkage + punish + miles +
                         unemp + log(income), index=c("state", "year"), 
                       model="within", effect="twoways", data=Fatalities)

我也想问一下PanelOLS中的cov_type，据我了解，如果我想有稳健的标准误差和P值，我应该使用cov_type=’robust’ ，而不是 ‘clustered’。但是我看到许多使用 ‘clustered’ 的固定效应示例——我应该使用哪个来获得变量的正确标准误差和 p 值？

输出python`panelOLS`是：

R `plm` 的输出：

Answer 1

两个问题，1. 您在 plm 公式中使用了 year 变量，这是多余的，因为它已经 indexed，以及 2. 您的 Python PanelOLS 代码计算到目前为止的单个固定效应，我可以使用 effect="individual".

复制 Python 估计和 plm

library(plm)
fatalities_mod6 <- plm(fatal_rate ~ beertax + drinkage + punish + miles + 
                         unemp + log(income), index=c("state", "year"), 
                       model="within", effect="individual", data=Fatalities)

此外 Python 的 PanelOLS 似乎使用标准误差聚集在状态上，应用 Arellano 方法使用异方差一致性标准误差类型 1 ("HC1")。

round(summary(fatalities_mod6, 
        vcov=vcovHC.plm(fatalities_mod6, cluster="group", type="HC1",
                        method="arellano"))$coe, 4)
#             Estimate Std. Error t-value Pr(>|t|)
# beertax      -0.3664     0.2920 -1.2550   0.2105
# drinkage     -0.0378     0.0252 -1.4969   0.1355
# punishyes    -0.0342     0.0951 -0.3598   0.7193
# miles         0.0000     0.0000 -0.4217   0.6736
# unemp        -0.0196     0.0128 -1.5385   0.1250
# log(income)   0.6765     0.5424  1.2472   0.2134

这类似于 Python 的 PanelOLS 结果。

编辑

对于具有聚类稳健标准误差的数据的双向固定效应估计量，代码为，

对于 Python:

mod = PanelOLS(w1['fatal_rate'], w1[['beertax','drinkage','punish', 'miles' ,
                                     'unemp','income']],
               entity_effects=True, time_effects=True)

对于 R：

fatalities_mod6 <- plm(fatal_rate ~ beertax + drinkage + punish + miles + 
                             unemp + log(income), index=c("state", "year"), 
                           model="within", effect="twoways", data=Fatalities)
summary(fatalities_mod6, 
        vcov=vcovHC.plm(fatalities_mod6, cluster="group", type="HC1"))

编辑 2

下面是 Python、R 和 Stata 的比较，使用 Python 中 statsmodels 附带的 grunfeld 数据（它们与 Python 中的数据略有不同） data(Grunfeld, package="plm")).

PanelOLS (Python), plm (R), and xtreg with vce(cluster *clustvar*) (Stata) 似乎应用略有不同的方法来计算集群鲁棒标准误差（有关详细信息，请参阅链接文档）。

Python:

from statsmodels.datasets import grunfeld
data = grunfeld.load_pandas().data
data = data.set_index(['firm','year'])

import pandas as pd
data.to_csv("grunfeld.csv")  ## export data for R / Stata

from linearmodels import PanelOLS
mod = PanelOLS.from_formula('invest ~ value + capital + EntityEffects + TimeEffects', 
                            data=data)
print(mod.fit())
#             Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
# ------------------------------------------------------------------------------
# value          0.1167     0.0129     9.0219     0.0000      0.0912      0.1422
# capital        0.3514     0.0210     16.696     0.0000      0.3099      0.3930

print(mod.fit(cov_type='robust'))
# value          0.1167     0.0191     6.1087     0.0000      0.0790      0.1544
# capital        0.3514     0.0529     6.6472     0.0000      0.2471      0.4557

print(mod.fit(cov_type='clustered', cluster_entity=True))
# value          0.1167     0.0113     10.368     0.0000      0.0945      0.1389
# capital        0.3514     0.0470     7.4836     0.0000      0.2588      0.4441

print(mod.fit(cov_type='clustered', cluster_entity=True, cluster_time=True))
# value          0.1167     0.0117     10.015     0.0000      0.0937      0.1397
# capital        0.3514     0.0447     7.8622     0.0000      0.2633      0.4396

R:

grunfeld <- read.csv("V:/Python/fatalities/grunfeld.csv")

library(plm)
fit <- plm(invest ~ value + capital, grunfeld, effect="twoways", model="within",
           index=c("firm", "year"))

## resembles exactly py mod.fit()
round(summary(fit)$coe, 4)  
#         Estimate Std. Error t-value Pr(>|t|)
# value     0.1167     0.0129  9.0219        0
# capital   0.3514     0.0210 16.6964        0

## resembles approximately py mod.fit('cluster', cluster_entity=True) and Stata (below)
round(summary(fit, cluster="group", 
              vcov=vcovHC.plm(fit, method="arellano", type="HC2")
)$coe, 4) 
#         Estimate Std. Error t-value Pr(>|t|)
# value     0.1167     0.0114 10.2042        0
# capital   0.3514     0.0507  6.9364        0

## doesn't seem to change with two-way clustering
round(summary(fit, cluster=c("group", "time"),
              vcov=vcovHC.plm(fit, method="arellano", type="HC2")
)$coe, 4) 
#         Estimate Std. Error t-value Pr(>|t|)
# value     0.1167     0.0114 10.2042        0
# capital   0.3514     0.0507  6.9364        0

fit.1 <- lm(invest ~ value + capital + factor(firm) + factor(year) + 0, grunfeld)
library(lmtest)
## resembles exactly py mod.fit(cov_type='robust')
round(coeftest(fit.1, vcov.=sandwich::vcovHC(fit.1, type="HC1"))[1:2,], 4)
#         Estimate Std. Error t value Pr(>|t|)
# value     0.1167     0.0191  6.1087        0
# capital   0.3514     0.0529  6.6472        0

Stata:

import delim grunfeld.csv, clear
egen firmn = group(firm)
xtset firmn year

xtreg invest value capital i.year, fe vce(cluster firmn) 
#         |               Robust
#  invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
# -------------+----------------------------------------------------------------
#   value |   .1166811   .0114755    10.17   0.000     .0911122    .1422501
# capital |   .3514357   .0478837     7.34   0.000     .2447441    .4581273

为什么 python 中的 PanelOLS 与 R 中的 plm 有不同的结果

Why PanelOLS in python has different result than plm in R

python

r

panel-data

plm

输出python`panelOLS`是：

R `plm` 的输出：

编辑

编辑 2

为什么 python 中的 PanelOLS 与 R 中的 plm 有不同的结果

Why PanelOLS in python has different result than plm in R

python

r

panel-data

plm

输出pythonpanelOLS是：

R plm 的输出：

编辑

编辑 2

输出python`panelOLS`是：

R `plm` 的输出：