Why does my FE with demeaned data (lm()) not reproduce the coefficients from plm(), lfe() and LSDV?
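I am trying to reproduce the fixed-effects coefficients for panel data using different packages and techniques: (1) plm(), (2) felm() from lfe, (3) dummy-variable LSDV with lm(), and (4) demeaned FE with lm().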
My dataset contains 1581 observations and 13 variables. It is survey data from 3 waves (var = wave) of 527 respondents (var = respondent). I have one DV (y) and 10 IVs (x1 to x10).
The dataset looks like this:
  respondent  wave     y    x1    x2    x3    x4    x5    x6    x7    x8    x9   x10
       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1          1     1     1     1     2    NA    NA     2     1   1.5    NA    NA     2
2          1     2    NA     2    NA     0     0     0     1     4     4     1     3
3          1     3    NA     4     5    NA    NA    NA    NA     8    NA    NA     1
4          2     1 0.931     3     3     2     2     2     4   7.5   7.5    NA     3
5          2     2 0.986     4    NA    NA     2     2   4.5   6.5     5     3     4
6          2     3 0.986     4     3     2     2     2     3     3     3     2     3
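To make the printed excerpt reproducible, here is a minimal sketch that rebuilds these six rows with base R's read.table(). The values are copied from the printout above, and the object name `excerpt` is only illustrative. complete.cases() then shows that only one of the six rows has no missing value in any column. This hints at why plm() reports N = 316 rather than 1581.

```r
# Rebuild the six printed rows (values copied from the excerpt; "NA" = missing)
excerpt <- read.table(header = TRUE, text = "
respondent wave y     x1 x2 x3 x4 x5 x6  x7  x8  x9 x10
1          1    1     1  2  NA NA 2  1   1.5 NA  NA 2
1          2    NA    2  NA 0  0  0  1   4   4   1  3
1          3    NA    4  5  NA NA NA NA  8   NA  NA 1
2          1    0.931 3  3  2  2  2  4   7.5 7.5 NA 3
2          2    0.986 4  NA NA 2  2  4.5 6.5 5   3  4
2          3    0.986 4  3  2  2  2  3   3   3   2  3
")

# Rows with no missing value in any column (only respondent 2, wave 3 here)
print(which(complete.cases(excerpt)))

# Number of missing values per column
print(colSums(is.na(excerpt)))
```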
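My problem: when I run fixed-effects regressions with (1) plm(), (2) lfe and (3) dummy-variable LSDV using lm(), the models always return the same coefficients. However, when I run the fixed-effects regression with (4) demeaned data and lm(), I get different coefficients. This confuses me, and I would like to know why.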
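The discrepancy can be reproduced on simulated data. The following is a minimal sketch that assumes a hypothetical toy panel, not the author's data: 200 respondents, 3 waves, two regressors x1 and x2, and about 15% of cells missing at random. It contrasts LSDV with the column-by-column demeaning used in approach (4).

```r
set.seed(42)
n_id <- 200
d <- data.frame(respondent = rep(seq_len(n_id), each = 3),
                wave       = rep(1:3, times = n_id))
alpha <- rnorm(n_id)[d$respondent]   # unobserved respondent effect
d$x1 <- alpha + rnorm(nrow(d))       # regressor correlated with the effect
d$x2 <- rnorm(nrow(d))
d$y  <- 0.5 * d$x1 - 0.3 * d$x2 + 2 * alpha + rnorm(nrow(d))
# Scatter missing values independently across columns
for (v in c("y", "x1", "x2")) d[[v]][runif(nrow(d)) < 0.15] <- NA

# (3) LSDV: lm() drops incomplete rows, then fits respondent dummies
b_lsdv <- coef(lm(y ~ factor(respondent) + x1 + x2, data = d))[c("x1", "x2")]

# (4) Column-by-column demeaning, as in the question
dm <- function(v) v - ave(v, d$respondent, FUN = function(z) mean(z, na.rm = TRUE))
d_dm <- data.frame(y = dm(d$y), x1 = dm(d$x1), x2 = dm(d$x2))
b_dmd <- coef(lm(y ~ 0 + x1 + x2, data = d_dm))

print(rbind(lsdv = b_lsdv, demeaned = b_dmd))
```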
Here is my code:
1. plm():
Input:
library(plm)
model_plm <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
data = dataset,
index=c("respondent","wave"),
model = "within",
effect = 'individual')
summary(model_plm)
Output:
Unbalanced Panel: n = 228, T = 1-2, N = 316
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.3240866 -0.0048416 0.0000000 0.0048416 0.3240866
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
x1 -0.0216484 0.0167614 -1.2916 0.20032
x2 0.0178114 0.0141219 1.2613 0.21097
x3 -0.0145262 0.0103954 -1.3974 0.16627
x4 -0.0061660 0.0133069 -0.4634 0.64439
x5 0.0174401 0.0144256 1.2090 0.23032
x6 -0.0053556 0.0067210 -0.7968 0.42796
x7 0.0065517 0.0097627 0.6711 0.50415
x8 -0.0151375 0.0081992 -1.8462 0.06865 .
x9 0.0235351 0.0092612 2.5412 0.01303 *
x10 0.0235181 0.0228927 1.0273 0.30745
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2. lfe (felm()):
Input:
library(lfe)
model_lfe <- felm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 | respondent, data = dataset)
summary(model_lfe)
Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 -0.021648 0.016761 -1.292 0.2003
x2 0.017811 0.014122 1.261 0.2110
x3 -0.014526 0.010395 -1.397 0.1663
x4 -0.006166 0.013307 -0.463 0.6444
x5 0.017440 0.014426 1.209 0.2303
x6 -0.005356 0.006721 -0.797 0.4280
x7 0.006552 0.009763 0.671 0.5041
x8 -0.015138 0.008199 -1.846 0.0687 .
x9 0.023535 0.009261 2.541 0.0130 *
x10 0.023518 0.022893 1.027 0.3074
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
3. LSDV with lm():
Input:
model_lsdv <- lm(y ~ as_factor(respondent) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset)
options(max.print=2000)
summary(model_lsdv)
Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9499746 0.1505806 6.309 1.57e-08 ***
[...]
x1 -0.0216484 0.0167614 -1.292 0.20032
x2 0.0178114 0.0141219 1.261 0.21097
x3 -0.0145262 0.0103954 -1.397 0.16627
x4 -0.0061660 0.0133069 -0.463 0.64439
x5 0.0174401 0.0144256 1.209 0.23032
x6 -0.0053556 0.0067210 -0.797 0.42796
x7 0.0065517 0.0097627 0.671 0.50415
x8 -0.0151375 0.0081992 -1.846 0.06865 .
x9 0.0235351 0.0092612 2.541 0.01303 *
x10 0.0235181 0.0228927 1.027 0.30745
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
4. Demeaned FE with lm():
Input:
dataset_demeaned <- with(dataset, data.frame(
  respondent = respondent,
  wave = wave,
  y   = y   - ave(y,   respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x1  = x1  - ave(x1,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x2  = x2  - ave(x2,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x3  = x3  - ave(x3,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x4  = x4  - ave(x4,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x5  = x5  - ave(x5,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x6  = x6  - ave(x6,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x7  = x7  - ave(x7,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x8  = x8  - ave(x8,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x9  = x9  - ave(x9,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x10 = x10 - ave(x10, respondent, FUN = function(x) mean(x, na.rm = TRUE))
))
model_dmd <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)
summary(model_dmd)
Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 -0.006223 0.008220 -0.757 0.44957
x2 0.013181 0.007880 1.673 0.09543 .
x3 -0.012807 0.005484 -2.335 0.02018 *
x4 -0.006431 0.006311 -1.019 0.30900
x5 0.015455 0.005941 2.602 0.00973 **
x6 -0.001429 0.003402 -0.420 0.67483
x7 0.004362 0.004698 0.929 0.35387
x8 -0.009336 0.004366 -2.139 0.03326 *
x9 0.015731 0.005267 2.987 0.00305 **
x10 0.007631 0.010922 0.699 0.48529
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
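More information:
I have already performed these checks:
- I demeaned the data in other ways, e.g. with the demean() function. --> Same result as in 4.
- I calculated some of the demeaned values by hand, and they match the results of the ave() and demean() functions.
- I have been experimenting with the na.action option, because I suspected the problem might be caused by a different treatment of missing values. But it did not change the results.
- At one point I included the respondent variable via as_factor in the (4) demeaned FE model, like: model_dmd <- lm(y ~ 0 + as_factor(respondent) + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned). This approach reproduced the correct coefficients. However, demeaning should already have removed the unobserved heterogeneity, so including the dummies seems redundant.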
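The last check has a clean explanation. Within the rows lm() keeps, each column demeaned with na.rm = TRUE differs from the raw column only by a respondent-specific constant, the "wrong" mean. Respondent dummies absorb any such constant. By the Frisch-Waugh-Lovell theorem, adding the dummies therefore redoes the within transformation on the correct sample, so they are not redundant once the demeaning used inconsistent rows. Below is a sketch on a hypothetical simulated panel (not the author's data):

```r
set.seed(3)
n_id <- 150
d <- data.frame(respondent = rep(seq_len(n_id), each = 3))
alpha <- rnorm(n_id)[d$respondent]   # unobserved respondent effect
d$x1 <- alpha + rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$y  <- 0.5 * d$x1 - 0.3 * d$x2 + 2 * alpha + rnorm(nrow(d))
for (v in c("y", "x1", "x2")) d[[v]][runif(nrow(d)) < 0.15] <- NA

# Column-by-column demeaning over each column's own non-missing values
bad <- d
for (v in c("y", "x1", "x2")) {
  bad[[v]] <- d[[v]] - ave(d[[v]], d$respondent,
                           FUN = function(z) mean(z, na.rm = TRUE))
}

b_lsdv  <- coef(lm(y ~ factor(respondent) + x1 + x2, data = d))[c("x1", "x2")]
# Dummies on the badly demeaned data: absorb the per-respondent offsets
b_dummy <- coef(lm(y ~ 0 + factor(respondent) + x1 + x2, data = bad))[c("x1", "x2")]
# No dummies: the offsets stay in the data and bias the slopes
b_plain <- coef(lm(y ~ 0 + x1 + x2, data = bad))

print(rbind(lsdv = b_lsdv, with_dummies = b_dummy, without = b_plain))
```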
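So my best guess is that the problem does not come from the demeaning process but from the lm() function. Maybe the fact that the panel is unbalanced plays a role here?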
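An unbalanced panel on its own is not the culprit. Below is a sketch on a hypothetical simulated panel where whole rows are dropped rather than single cells. Respondents then have different numbers of waves, but no row is partially missing. In that case, per-respondent demeaning plus lm() without an intercept matches LSDV.

```r
set.seed(11)
n_id <- 150
d <- data.frame(respondent = rep(seq_len(n_id), each = 3))
alpha <- rnorm(n_id)[d$respondent]   # unobserved respondent effect
d$x1 <- alpha + rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$y  <- 0.5 * d$x1 - 0.3 * d$x2 + 2 * alpha + rnorm(nrow(d))
d <- d[runif(nrow(d)) > 0.3, ]       # drop whole rows: unbalanced, but no NAs
print(table(table(d$respondent)))    # how many respondents have 1, 2 or 3 waves

dm <- d
for (v in c("y", "x1", "x2")) dm[[v]] <- d[[v]] - ave(d[[v]], d$respondent)

b_dmd  <- coef(lm(y ~ 0 + x1 + x2, data = dm))
b_lsdv <- coef(lm(y ~ factor(respondent) + x1 + x2, data = d))[c("x1", "x2")]
print(rbind(demeaned = b_dmd, lsdv = b_lsdv))
```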
Many thanks for any advice and explanations!
Solution:
Thanks to @G.Grothendieck, I can post the solution here. The correct code for (4) demeaned FE with lm() is:
Input:
# Delete all rows with NAs
dataset <- na.omit(dataset)
# Demean the rows that are left behind
dataset_demeaned <- with(dataset, data.frame(
  respondent = respondent,
  wave = wave,
  y   = y   - ave(y,   respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x1  = x1  - ave(x1,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x2  = x2  - ave(x2,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x3  = x3  - ave(x3,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x4  = x4  - ave(x4,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x5  = x5  - ave(x5,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x6  = x6  - ave(x6,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x7  = x7  - ave(x7,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x8  = x8  - ave(x8,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x9  = x9  - ave(x9,  respondent, FUN = function(x) mean(x, na.rm = TRUE)),
  x10 = x10 - ave(x10, respondent, FUN = function(x) mean(x, na.rm = TRUE))
))
model_dmd <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10, data = dataset_demeaned)
summary(model_dmd)
Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 -0.021648 0.008462 -2.558 0.011004 *
x2 0.017811 0.007130 2.498 0.013009 *
x3 -0.014526 0.005248 -2.768 0.005989 **
x4 -0.006166 0.006718 -0.918 0.359452
x5 0.017440 0.007283 2.395 0.017240 *
x6 -0.005356 0.003393 -1.578 0.115530
x7 0.006552 0.004929 1.329 0.184768
x8 -0.015138 0.004140 -3.657 0.000301 ***
x9 0.023535 0.004676 5.033 8.24e-07 ***
x10 0.023518 0.011558 2.035 0.042734 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
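The coefficients now match, but the standard errors are roughly half of those from plm() and felm(). lm() on demeaned data does not know that one mean per respondent was estimated. It therefore uses N - K residual degrees of freedom instead of N - n - K. With N = 316, n = 228 and K = 10 from the plm() output, the correction factor is sqrt(306/78) ≈ 1.98. That factor maps 0.008462 to about 0.01676, the felm() value for x1.

Below is a sketch on a hypothetical simulated panel (not the author's data). It demeans after listwise deletion, loops over the variable names instead of writing one ave() line per column, and applies this rescaling:

```r
set.seed(7)
n_id <- 150
d <- data.frame(respondent = rep(seq_len(n_id), each = 3))
alpha <- rnorm(n_id)[d$respondent]   # unobserved respondent effect
d$x1 <- alpha + rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$y  <- 0.5 * d$x1 - 0.3 * d$x2 + 2 * alpha + rnorm(nrow(d))
for (v in c("y", "x1", "x2")) d[[v]][runif(nrow(d)) < 0.15] <- NA

vars <- c("y", "x1", "x2")
cc <- d[complete.cases(d[vars]), ]   # listwise deletion first
cc_dm <- cc
for (v in vars) cc_dm[[v]] <- cc[[v]] - ave(cc[[v]], cc$respondent)  # then demean

fit_dmd  <- lm(y ~ 0 + x1 + x2, data = cc_dm)
fit_lsdv <- lm(y ~ factor(respondent) + x1 + x2, data = d)

N <- nrow(cc)                        # observations used
K <- 2                               # slope coefficients
G <- length(unique(cc$respondent))   # respondent means absorbed by demeaning
se_dmd  <- coef(summary(fit_dmd))[c("x1", "x2"), "Std. Error"]
se_fix  <- se_dmd * sqrt((N - K) / (N - K - G))   # degrees-of-freedom correction
se_lsdv <- coef(summary(fit_lsdv))[c("x1", "x2"), "Std. Error"]
print(rbind(lm_demeaned = se_dmd, corrected = se_fix, lsdv = se_lsdv))
```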
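By demeaning each column separately, the NAs are handled inconsistently. ave() with na.rm = TRUE computes each variable's respondent mean over whichever waves are non-missing for that variable. As a result, y and each x are centered on means taken over different sets of rows, and lm() only drops the incomplete rows afterwards. Each row must either be used for all variables or not at all; a row cannot be used for one variable but not for another.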
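The point can be seen directly on the six printed rows (object names are illustrative). Only respondent 2's wave 3 is complete. After listwise deletion it is that respondent's single row, so it demeans to zero and carries no within variation. Demeaning each column over its own non-missing values instead centers that row on means that include waves 1 and 2, which lm() later drops. No na.action setting can undo this, because the contamination happens inside ave(), before lm() sees the data.

```r
excerpt <- read.table(header = TRUE, text = "
respondent wave y     x1 x2 x3 x4 x5 x6  x7  x8  x9 x10
1          1    1     1  2  NA NA 2  1   1.5 NA  NA 2
1          2    NA    2  NA 0  0  0  1   4   4   1  3
1          3    NA    4  5  NA NA NA NA  8   NA  NA 1
2          1    0.931 3  3  2  2  2  4   7.5 7.5 NA 3
2          2    0.986 4  NA NA 2  2  4.5 6.5 5   3  4
2          3    0.986 4  3  2  2  2  3   3   3   2  3
")
vars <- c("y", paste0("x", 1:10))

# (a) The question's approach: each column demeaned over its own non-missing values
bad <- excerpt
for (v in vars) {
  bad[[v]] <- excerpt[[v]] - ave(excerpt[[v]], excerpt$respondent,
                                 FUN = function(z) mean(z, na.rm = TRUE))
}
bad_kept <- bad[complete.cases(bad[vars]), vars]   # the row(s) lm() would keep

# (b) The fix: listwise deletion first, then demean the surviving rows
good <- excerpt[complete.cases(excerpt[vars]), ]
for (v in vars) good[[v]] <- good[[v]] - ave(good[[v]], good$respondent)

print(round(bad_kept, 3))   # nonzero: centered on means that include dropped waves
print(good[vars])           # all zeros: a singleton has no within variation
```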