plm: Fixed Effects Regression - Index / ID order

I am running a fixed effects regression using the plm package. Why and how does the order of the ID codes affect the regression?

I use the following code to run the regressions. The two calls differ only in the order of the ID codes Company and Year.

Code:

MV_Year <- plm (MVlog ~ LEV + Size + DY + RDlog
                , data=Values, model="within", index= c("Year","Company"))


MV_Company <- plm (MVlog ~ LEV + Size + DY + RDlog,
                   data=Values, model="within", index= c("Company", "Year"))

The corresponding output for MV_Year:

Oneway (individual) effect Within Model

Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values, 
    model = "within", index = c("Year", "Company"))

Unbalanced Panel: n = 17, T = 557-4280, N = 29890

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-5.250901 -0.457100  0.015763  0.476140  6.006483 

Coefficients:
         Estimate  Std. Error t-value Pr(>|t|)    
LEV   -1.95485031  0.04060539 -48.143  < 2e-16 ***
Size   0.75233709  0.00314849 238.952  < 2e-16 ***
DY    -0.00033192  0.00013482  -2.462  0.01382 *  
RDlog  0.13148626  0.00300509  43.755  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    102610
Residual Sum of Squares: 17568
R-Squared:      0.82879
Adj. R-Squared: 0.82868
F-statistic: 36148 on 4 and 29869 DF, p-value: < 2.22e-16

MV_Company:

Oneway (individual) effect Within Model

Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values, 
    model = "within", index = c("Company", "Year"))

Unbalanced Panel: n = 5911, T = 1-17, N = 29890

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-4.35967 -0.38711  0.00000  0.40528  5.48624 

Coefficients:
         Estimate  Std. Error  t-value Pr(>|t|)    
LEV   -1.88958140  0.04392991 -43.0135  < 2e-16 ***
Size   0.74650676  0.00375926 198.5782  < 2e-16 ***
DY    -0.00034308  0.00014585  -2.3524  0.01866 *  
RDlog  0.13904360  0.00331886  41.8950  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    58168
Residual Sum of Squares: 12747
R-Squared:      0.78085
Adj. R-Squared: 0.72679
F-statistic: 21356.2 on 4 and 23975 DF, p-value: < 2.22e-16

Why do the estimates and the R^2 differ slightly between the two outputs?

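The reason for the index= option is that plm() internally uses pdata.frame(), which expects the first column to be the "id" and the second column to be the "time" if the respective names are not specified by index=c(<id>, <time>).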

In ?pdata.frame we can read:

The index argument indicates the dimensions of the panel. It can be:

  • a vector of two character strings which contains the names of the individual and of the time indexes,
  • a character string which is the name of the individual index variable. In this case, the time index is created automatically and
    a new variable called "time" is added, assuming consecutive and
    ascending time periods in the order of the original data, ...
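
As a quick check of what that means in practice, here is a minimal sketch using the Grunfeld data that ships with plm. It builds the pdata.frame explicitly in both index orders and inspects the resulting panel dimensions with pdim(). The object names (pd_firm, pd_year, dim_firm, dim_year) are only illustrative.

```r
library(plm)
data(Grunfeld)

# Individual index first, time index second (the order pdata.frame expects).
pd_firm  <- pdata.frame(Grunfeld, index = c("firm", "year"))
dim_firm <- pdim(pd_firm)

# Reversed order: "year" is now treated as the individual dimension.
pd_year  <- pdata.frame(Grunfeld, index = c("year", "firm"))
dim_year <- pdim(pd_year)

print(dim_firm)  # n = number of firms, T = number of years
print(dim_year)  # n and T trade places
```

The same pattern is visible in the question's output: one call reports n = 17 (the years were used as individuals), the other n = 5911 (the companies were used as individuals).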

The following example will help us understand this. First we load the Grunfeld data, which looks like this.

library(plm)
data(Grunfeld)
head(Grunfeld, 3)
#   firm year   inv  value capital
# 1    1 1935 317.6 3078.5     2.8
# 2    1 1936 391.8 4661.7    52.6
# 3    1 1937 410.6 5387.1   156.9

The first column is the ID, the second column is the time. Let's estimate a model.

summary(plm(inv ~ value + capital, data=Grunfeld,
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

Now, when we swap the first and second columns,

summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)],
            model="within"))$coe
#          Estimate  Std. Error   t-value     Pr(>|t|)
# value   0.1167978 0.006331302 18.447672 3.586220e-43
# capital 0.2197066 0.032296107  6.802881 1.503653e-10

the results differ. But when we tell plm() which columns to use via index=c(<id>, <time>),

summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)], 
            index=c("firm", "year"),
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

we get the old results back. If we scramble the columns completely,

summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)],
            model="within"))$coe
# Error 

plm() is really confused :) But as before, when we help plm(), it behaves as expected and yields the right results again.

summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)], 
            index=c("firm", "year"),
            model="within"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
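
To make explicit what the swapped-column estimates above represent, here is a small sketch, again on Grunfeld. It assumes that a one-way within model with the index reversed is the same thing as a within model with the correct index and effect="time", i.e. year fixed effects only. The object names fe_swapped and fe_time are only illustrative.

```r
library(plm)
data(Grunfeld)

# Reversed index: "year" is treated as the individual dimension,
# so the default individual effect is really a year effect.
fe_swapped <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("year", "firm"), model = "within")

# Correct index, explicitly asking for time (year) fixed effects.
fe_time <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within",
               effect = "time")

# Compare the slope estimates of the two fits.
all.equal(coef(fe_swapped), coef(fe_time))
```

Read this way, MV_Year in the question looks like a model with year fixed effects only, and MV_Company like a model with company fixed effects only, which would explain why the estimates are close but not identical.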

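Note that you are actually only estimating firm fixed effects. If you intend to estimate a model with both firm and year fixed effects, let's compute it as an LSDV model,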

summary(lm(inv ~ value + capital + factor(firm) + factor(year) - 1, Grunfeld))$coe[1:2, ]
#          Estimate Std. Error   t value     Pr(>|t|)
# value   0.1177159 0.01375128  8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35

We see that the values differ from those above, because the plm() calls so far only included firm fixed effects, see:

summary(lm(inv ~ value + capital + factor(firm) - 1, Grunfeld))$coe[1:2, ]
#          Estimate Std. Error   t value     Pr(>|t|)
# value   0.1101238 0.01185669  9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42

To be correct, we also need to specify effect="twoways" to get both firm and year fixed effects.

summary(plm(inv ~ value + capital, data=Grunfeld,
            index=c("firm", "year"),
            model="within", effect="twoways"))$coe
#          Estimate Std. Error   t-value     Pr(>|t|)
# value   0.1177159 0.01375128  8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35
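
Rather than comparing the printed tables by eye, the agreement between the two-way within model and the LSDV model can be checked programmatically. This is a minimal sketch on Grunfeld. The object names are illustrative, and the commented-out call at the end is an untested guess at how the same idea would carry over to the question's data, which is not available here.

```r
library(plm)
data(Grunfeld)

# Two-way within model: firm and year fixed effects.
fe_twoways <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"),
                  model = "within", effect = "twoways")

# LSDV equivalent with explicit firm and year dummies.
lsdv <- lm(inv ~ value + capital + factor(firm) + factor(year) - 1,
           data = Grunfeld)

# Compare only the slope coefficients, not the dummy coefficients.
all.equal(coef(fe_twoways), coef(lsdv)[c("value", "capital")])

# Hypothetical analogue for the question's data (untested, assumes the
# columns "Company" and "Year" exist in Values):
# MV_twoways <- plm(MVlog ~ LEV + Size + DY + RDlog, data = Values,
#                   index = c("Company", "Year"),
#                   model = "within", effect = "twoways")
```

With thousands of companies the LSDV route gets slow because of the number of dummy columns, so the plm() call is usually the more practical of the two.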