plm: Fixed Effects Regression - Index / ID order
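I am running fixed effects regressions with the plm package.
Why, and how, does the order of the ID codes affect the regression?
I ran the regressions with the code below; the two calls differ only in the order of the ID codes Company and Year.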
Code:
library(plm)
MV_Year <- plm(MVlog ~ LEV + Size + DY + RDlog,
               data = Values, model = "within", index = c("Year", "Company"))
MV_Company <- plm(MVlog ~ LEV + Size + DY + RDlog,
                  data = Values, model = "within", index = c("Company", "Year"))
Corresponding output:
MV_Year:
Oneway (individual) effect Within Model
Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values,
model = "within", index = c("Year", "Company"))
Unbalanced Panel: n = 17, T = 557-4280, N = 29890
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-5.250901 -0.457100 0.015763 0.476140 6.006483
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
LEV -1.95485031 0.04060539 -48.143 < 2e-16 ***
Size 0.75233709 0.00314849 238.952 < 2e-16 ***
DY -0.00033192 0.00013482 -2.462 0.01382 *
RDlog 0.13148626 0.00300509 43.755 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 102610
Residual Sum of Squares: 17568
R-Squared: 0.82879
Adj. R-Squared: 0.82868
F-statistic: 36148 on 4 and 29869 DF, p-value: < 2.22e-16
MV_Company
Oneway (individual) effect Within Model
Call:
plm(formula = MVlog ~ LEV + Size + DY + RDlog, data = Values,
model = "within", index = c("Company", "Year"))
Unbalanced Panel: n = 5911, T = 1-17, N = 29890
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-4.35967 -0.38711 0.00000 0.40528 5.48624
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
LEV -1.88958140 0.04392991 -43.0135 < 2e-16 ***
Size 0.74650676 0.00375926 198.5782 < 2e-16 ***
DY -0.00034308 0.00014585 -2.3524 0.01866 *
RDlog 0.13904360 0.00331886 41.8950 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 58168
Residual Sum of Squares: 12747
R-Squared: 0.78085
Adj. R-Squared: 0.72679
F-statistic: 21356.2 on 4 and 23975 DF, p-value: < 2.22e-16
Why are there these small differences in the estimates and R^2 between the two outputs?
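The reason lies in the index= option. plm() uses pdata.frame() internally, which treats the first element of index = c(<id>, <time>) as the individual ("id") dimension and the second as the "time" dimension; if index= is not specified, it assumes that the first column of the data is the id and the second is the time.
Consequently, with index = c("Year", "Company") your first model treats the 17 years as the individuals (hence n = 17, T = 557-4280) and sweeps out year means, i.e. it is a year fixed effects model, whereas index = c("Company", "Year") sweeps out company means (n = 5911, T = 1-17).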
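The denominator degrees of freedom reported in the two F-statistics are consistent with this reading. For a one-way within model they should equal N - n - K, where n is the number of groups in the first index and K is the number of slope coefficients. The short R check below only redoes that arithmetic with the numbers printed in the two outputs; within_df is a hypothetical helper, not a plm function.

```r
# Residual degrees of freedom of a one-way within model: N - n - K,
# where n = number of groups in the first ("individual") index.
# within_df is a hypothetical helper written for this check.
within_df <- function(N, n, K) N - n - K

N <- 29890  # observations, identical in both outputs
K <- 4      # slope coefficients: LEV, Size, DY, RDlog

df_year    <- within_df(N, n = 17,   K = K)  # index = c("Year", "Company")
df_company <- within_df(N, n = 5911, K = K)  # index = c("Company", "Year")

print(c(year = df_year, company = df_company))
```

Both values match the "on 4 and ... DF" parts of the outputs (29869 and 23975), which fits the interpretation that the first index is the one whose means are removed.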
From ?pdata.frame we can read:
The index argument indicates the dimensions of the panel. It can be:
- a vector of two character strings which contains the names of the individual and of the time indexes,
- a character string which is the name of the individual index variable. In this case, the time index is created automatically and
a new variable called "time" is added, assuming consecutive and
ascending time periods in the order of the original data, ...
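To see why the order matters without needing plm, here is a minimal base-R sketch of the one-way within transformation. It assumes a small simulated balanced panel with hypothetical columns id, time, x and y, and a hypothetical helper within_slope(); it is an illustration of the idea, not plm's implementation. Demeaning by the first index removes that index's means, so demeaning by id and demeaning by time generally give different slopes, and the id-demeaned slope coincides with the LSDV slope from factor(id) dummies.

```r
# Minimal sketch of the one-way "within" estimator in base R (no plm).
# Assumption: simulated balanced panel with hypothetical columns id, time, x, y.
set.seed(1)
panel <- expand.grid(id = 1:5, time = 1:4)
panel$x <- rnorm(nrow(panel)) + panel$id          # x correlated with the id effect
panel$y <- 2 * panel$x + 3 * panel$id + 0.5 * panel$time + rnorm(nrow(panel))

# Subtract the group means of the chosen index, then run OLS without intercept
within_slope <- function(d, group) {
  yd <- d$y - ave(d$y, d[[group]])
  xd <- d$x - ave(d$x, d[[group]])
  unname(coef(lm(yd ~ xd - 1)))
}

b_id   <- within_slope(panel, "id")    # analogous to index = c("id", "time")
b_time <- within_slope(panel, "time")  # analogous to index = c("time", "id")

# LSDV with dummies for the same index gives the same slope as demeaning by id
b_lsdv_id <- unname(coef(lm(y ~ x + factor(id), data = panel))["x"])

print(c(demean_by_id = b_id, demean_by_time = b_time, lsdv_id = b_lsdv_id))
```

The equality of b_id and b_lsdv_id is the Frisch-Waugh-Lovell result; the gap between b_id and b_time is the same kind of difference as between MV_Company and MV_Year.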
The following example will help us understand this. First we load the Grunfeld data, which looks like this:
library(plm)
data(Grunfeld)
head(Grunfeld, 3)
# firm year inv value capital
# 1 1 1935 317.6 3078.5 2.8
# 2 1 1936 391.8 4661.7 52.6
# 3 1 1937 410.6 5387.1 156.9
The first column is the ID and the second is the time. Let's estimate a model.
summary(plm(inv ~ value + capital, data=Grunfeld,
model="within"))$coe
# Estimate Std. Error t-value Pr(>|t|)
# value 0.1101238 0.01185669 9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
Now, when we swap the first and second columns,
summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)],
model="within"))$coe
# Estimate Std. Error t-value Pr(>|t|)
# value 0.1167978 0.006331302 18.447672 3.586220e-43
# capital 0.2197066 0.032296107 6.802881 1.503653e-10
the results differ. But when we tell plm which columns to use via index = c(<id>, <time>),
summary(plm(inv ~ value + capital, data=Grunfeld[c(2, 1, 3:5)],
index=c("firm", "year"),
model="within"))$coe
# Estimate Std. Error t-value Pr(>|t|)
# value 0.1101238 0.01185669 9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
we get the original results back. If we scramble the columns completely,
summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)],
model="within"))$coe
# Error
plm() gets really confused :) But as before, once we help plm() by passing index=, it behaves as expected and again produces the right results.
summary(plm(inv ~ value + capital, data=Grunfeld[c(3:5, 1, 2)],
index=c("firm", "year"),
model="within"))$coe
# Estimate Std. Error t-value Pr(>|t|)
# value 0.1101238 0.01185669 9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
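Note that you are actually only computing firm fixed effects. If you intend to estimate a model with both firm and year fixed effects, let's first compute it as an LSDV model: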
summary(lm(inv ~ value + capital + factor(firm) + factor(year) - 1, Grunfeld))$coe[1:2, ]
# Estimate Std. Error t value Pr(>|t|)
# value 0.1177159 0.01375128 8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35
We see that the values differ from those above, because the plm models so far only included firm fixed effects; see:
summary(lm(inv ~ value + capital + factor(firm) - 1, Grunfeld))$coe[1:2, ]
# Estimate Std. Error t value Pr(>|t|)
# value 0.1101238 0.01185669 9.287901 3.921108e-17
# capital 0.3100653 0.01735450 17.866564 2.220007e-42
为了正确,我们还需要指定 effect="twoways"
以获得公司和年份固定效应。
summary(plm(inv ~ value + capital, data=Grunfeld,
index=c("firm", "year"),
model="within", effect="twoways"))$coe
# Estimate Std. Error t-value Pr(>|t|)
# value 0.1177159 0.01375128 8.560354 6.652575e-15
# capital 0.3579163 0.02271901 15.754043 5.453066e-35
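For intuition about what effect = "twoways" does on a balanced panel such as Grunfeld, the following base-R sketch applies the two-way within transformation by hand: subtract the individual mean and the time mean and add back the overall mean, then run OLS without an intercept. It uses a simulated balanced panel with hypothetical columns id, time, x and y and a hypothetical helper twoway_demean(); it is not plm's internal code, and this simple formula is only exact when every id is observed in every period (unbalanced panels such as the one in the question need a different approach).

```r
# Two-way within transformation for a BALANCED panel (assumption: every id
# is observed in every period; the simple formula is not exact otherwise).
set.seed(2)
panel <- expand.grid(id = 1:6, time = 1:5)
panel$x <- rnorm(nrow(panel)) + 0.5 * panel$id + 0.3 * panel$time
panel$y <- 1.5 * panel$x + panel$id - panel$time + rnorm(nrow(panel))

# v_it - mean_i(v) - mean_t(v) + overall mean(v)
twoway_demean <- function(v, id, time) v - ave(v, id) - ave(v, time) + mean(v)

yd <- twoway_demean(panel$y, panel$id, panel$time)
xd <- twoway_demean(panel$x, panel$id, panel$time)

b_within <- unname(coef(lm(yd ~ xd - 1)))
# LSDV with both sets of dummies, analogous to the lm() call with factor(firm) + factor(year)
b_lsdv   <- unname(coef(lm(y ~ x + factor(id) + factor(time), data = panel))["x"])

print(c(twoway_within = b_within, lsdv = b_lsdv))
```

The two slopes agree up to floating-point error, mirroring how the effect = "twoways" plm estimates above match the LSDV estimates with factor(firm) + factor(year).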