更正 lm 函数中的 robust/clustered 个标准错误或替换结果
Correcting for robust/clustered standard errors within the lm function or replacing the results
交叉发布于 CrossValidated。
前段时间问了this question,是关于纠正使用IV/2SLS时的标准错误,第一阶段有一个tobit分布,[=得到了一个惊人的答案62=](底部的示例数据)。他为我提供了以下功能:
vcov2sls <- function(s1, s2, data, type=2) {
## get y names
y1.nm <- gsub(".*=\s(.*)(?=\s~).*", "\1", deparse(s1$call)[1], perl=TRUE)
y2.nm <- as.character(s2$terms)[2]
## auxilliary model matrix
X <- cbind(`(Intercept)`=1, data[, y1.nm, F], model.matrix(s2)[,-(1:2)])
## get y
y <- data[, y2.nm]
## betas second stage
b <- s2$coefficients
## calculate corrected sums of squares
sse <- sum((y - b %*% t(X))^2)
rmse <- sqrt(mean(s2$residuals^2)) ## RMSE 2nd stage
V0 <- vcov(s2) ## biased vcov 2nd stage
dof <- s2$df.residual ## degrees of freedom 2nd stage
## calculate corrected RMSE
rmse.c <- sqrt(sse/dof)
## calculate corrected vcov
V <- (rmse.c/rmse)^2 * V0
return(V)
}
所以我正在寻找的是一个函数,我可以在其中提供 vcov 矩阵(vcov2sls
),并具有稳健的集群标准错误。然而,它们似乎都属于同一个 vcov 矩阵选项。因此,如果我提供一个,我已经必须确保这些 se 是集群的和稳健的。所以我想我本质上是在问我如何才能使 vcov2sls
函数具有稳健和集群的错误。显然,任何其他导致相同实际结果的解决方案也会很好。
但是我想使用jay.sf的功能,当第一阶段是lm
时,聚类参与汇总(source),例如:
first_stage_ols <- lm(y~x, data=DT)
summary(first_stage_ols, robust=T)
是否有一种方法可以从 lm 函数中更正标准错误,或者(在结果中替换它们),或者调整 vcov2sls
函数来考虑 robust/clustered 标准错误?
编辑:我知道 lmtest:coeftest
也存在,但我希望能够使用 weights
。参见 this link。我无法确定这在 lmtest:coeftest
中是否可行。
编辑 I - 尝试测试人员解决方案
所以我尝试了测试人员在两种情况下的回答。首先,我从 tobit 转到 lm,另一个从 tobit 转到 lm,反之亦然。
# Tobit -> LM
library(lmtest)
library(sandwich)
## run with lm ##
s1.tobit <- AER::tobit(taxrate ~ votewon + industry + size + urbanisation + vote, data=DF)
# cluster and adjust ses
s1.robust <- vcovCL(s1.tobit, cluster = ~ industry)
s1.robust.se <- sqrt(diag(s1.robust))
s1.summary <- summary(s1.tobit)
s1.summary$coefficients[, 2] <- s1.robust.se
yhat <- fitted(s1.tobit)
s2.lm <- lm(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
lmtest::coeftest(s2.lm, vcov.=vcov2sls(s1.summary, s2.lm, DF))
# WORKS!
反之亦然:
# LM -> tobit
library(lmtest)
library(sandwich)
## run with lm ##
s1.lm <- lm(taxrate ~ votewon + industry + size + urbanisation + vote, data=DF)
# cluster and adjust ses
s1.robust <- vcovCL(s1.lm, cluster = ~ industry)
s1.robust.se <- sqrt(diag(s1.robust))
s1.summary <- summary(s1.lm)
s1.summary$coefficients[, 2] <- s1.robust.se
yhat <- fitted(s1.lm)
s2.tobit <- AER::tobit(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
and then ????
# DOES NOT WORK, NO WAY TO ADD THE VCOV TO TOBIT
编辑结束
编辑 II - 测试 lm_robust 和手动
之间的 p 值
当使用lm_robust
时,第一阶段的结果如下:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 25.3890287 2.1726518 11.6857327 0.009184928 15.393996 35.3840612 1.870781
votewon -0.9900966 2.1099738 -0.4692459 0.687605609 -10.636404 8.6562105 1.882014
industryE -0.7564888 0.3710393 -2.0388372 0.184868777 -2.434709 0.9217314 1.901678
industryF -2.6639323 0.3058024 -8.7112866 0.013649538 -4.002800 -1.3250647 1.964416
size -0.5291956 0.5523497 -0.9580807 0.443894805 -3.036862 1.9784705 1.894753
urbanisationB -1.5851495 2.2454251 -0.7059463 0.554845739 -11.464414 8.2941148 1.954657
urbanisationC -2.7234541 0.3573827 -7.6205532 0.020124544 -4.365749 -1.0811587 1.872744
vote 3.1749142 2.4600297 1.2906000 0.341874112 -9.076839 15.4266675 1.740353
然而,在进行手动计算时,p 值非常不同:
s1.summary
Call:
lm(formula = taxrate ~ votewon + industry + size + urbanisation +
vote, data = DF)
Residuals:
Min 1Q Median 3Q Max
-11.2747 -4.3212 -0.6788 4.3677 10.7369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.3890 2.1506 13.845 <2e-16 ***
votewon -0.9901 2.1742 -0.676 0.5007
industryE -0.7565 0.3492 -0.557 0.5792
industryF -2.6639 0.2877 -1.855 0.0668 .
size -0.5292 0.5109 -1.250 0.2145
urbanisationB -1.5851 2.2311 -1.098 0.2753
urbanisationC -2.7235 0.3474 -1.704 0.0918 .
vote 3.1749 2.4840 2.105 0.0380 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.623 on 92 degrees of freedom
Multiple R-squared: 0.1054, Adjusted R-squared: 0.03734
F-statistic: 1.549 on 7 and 92 DF, p-value: 0.1609
这只是第一阶段。
示例数据
DF <- structure(list(country = c("C", "C", "C", "C", "J", "J", "B",
"B", "F", "F", "E", "E", "D", "D", "F", "F", "I", "I", "J", "J",
"E", "E", "C", "C", "I", "I", "I", "I", "I", "I", "C", "C", "H",
"H", "J", "J", "G", "G", "J", "J", "I", "I", "C", "C", "D", "D",
"A", "A", "G", "G", "E", "E", "J", "J", "G", "G", "I", "I", "I",
"I", "J", "J", "G", "G", "E", "E", "G", "G", "E", "E", "F", "F",
"I", "I", "B", "B", "E", "E", "H", "H", "B", "B", "A", "A", "I",
"I", "I", "I", "F", "F", "E", "E", "I", "I", "J", "J", "D", "D",
"F", "F"), year = c(2005, 2010, 2010, 2005, 2005, 2010, 2010,
2005, 2010, 2005, 2005, 2010, 2010, 2005, 2005, 2010, 2005, 2010,
2005, 2010, 2010, 2005, 2010, 2005, 2005, 2010, 2005, 2010, 2010,
2005, 2010, 2005, 2005, 2010, 2010, 2005, 2005, 2010, 2005, 2010,
2005, 2010, 2005, 2010, 2010, 2005, 2005, 2010, 2010, 2005, 2010,
2005, 2010, 2005, 2010, 2005, 2010, 2005, 2010, 2005, 2010, 2005,
2010, 2005, 2010, 2005, 2010, 2005, 2005, 2010, 2005, 2010, 2005,
2010, 2005, 2010, 2005, 2010, 2005, 2010, 2010, 2005, 2005, 2010,
2005, 2010, 2010, 2005, 2010, 2005, 2010, 2005, 2005, 2010, 2005,
2010, 2010, 2005, 2010, 2005), sales = c(15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9), industry = c("D",
"D", "E", "E", "F", "F", "F", "F", "D", "D", "E", "E", "D", "D",
"E", "E", "F", "F", "F", "F", "D", "D", "F", "F", "E", "E", "D",
"D", "D", "D", "E", "E", "F", "F", "D", "D", "E", "E", "E", "E",
"D", "D", "E", "E", "D", "D", "D", "D", "E", "E", "D", "D", "F",
"F", "D", "D", "D", "D", "E", "E", "D", "D", "E", "E", "D", "D",
"D", "D", "D", "D", "F", "F", "F", "F", "E", "E", "D", "D", "E",
"E", "F", "F", "E", "E", "F", "F", "E", "E", "F", "F", "D", "D",
"D", "D", "D", "D", "D", "D", "F", "F"), urbanisation = c("B",
"B", "A", "A", "B", "B", "A", "A", "C", "C", "C", "C", "A", "A",
"B", "B", "C", "C", "A", "A", "C", "C", "B", "B", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "C", "C", "B", "B", "B", "B",
"B", "B", "C", "C", "A", "A", "B", "B", "B", "B", "A", "A", "B",
"B", "A", "A", "A", "A", "B", "B", "C", "C", "A", "A", "C", "C",
"A", "A", "B", "B", "A", "A", "B", "B", "B", "B", "B", "B", "C",
"C", "A", "A", "A", "A", "A", "A", "A", "A", "C", "C", "A", "A",
"B", "B", "A", "A", "B", "B", "B", "B"), size = c(1, 1, 5, 5,
5, 5, 1, 1, 1, 1, 5, 5, 5, 5, 2, 2, 2, 2, 5, 5, 1, 1, 1, 1, 5,
5, 5, 5, 4, 4, 5, 5, 5, 5, 4, 4, 2, 2, 5, 5, 1, 1, 1, 1, 2, 2,
1, 1, 2, 2, 5, 5, 1, 1, 3, 3, 2, 2, 2, 2, 5, 5, 4, 4, 1, 1, 5,
5, 2, 2, 5, 5, 2, 2, 2, 2, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3,
5, 5, 3, 3, 2, 2, 3, 3, 1, 1, 5, 5), base_rate = c(14L, 14L,
14L, 14L, 19L, 19L, 30L, 30L, 20L, 20L, 29L, 29L, 20L, 20L, 20L,
20L, 24L, 24L, 19L, 19L, 29L, 29L, 14L, 14L, 24L, 24L, 24L, 24L,
24L, 24L, 14L, 14L, 17L, 17L, 19L, 19L, 33L, 33L, 19L, 19L, 24L,
24L, 14L, 14L, 20L, 20L, 23L, 23L, 33L, 33L, 29L, 29L, 19L, 19L,
33L, 33L, 24L, 24L, 24L, 24L, 19L, 19L, 33L, 33L, 29L, 29L, 33L,
33L, 29L, 29L, 20L, 20L, 24L, 24L, 30L, 30L, 29L, 29L, 17L, 17L,
30L, 30L, 23L, 23L, 24L, 24L, 24L, 24L, 20L, 20L, 29L, 29L, 24L,
24L, 19L, 19L, 20L, 20L, 20L, 20L), taxrate = c(12L, 14L, 14L,
12L, 21L, 18L, 30L, 30L, 20L, 20L, 29L, 30L, 20L, 20L, 20L, 20L,
24L, 24L, 21L, 18L, 30L, 29L, 14L, 12L, 24L, 24L, 24L, 24L, 24L,
24L, 14L, 12L, 18L, 19L, 18L, 21L, 33L, 32L, 21L, 18L, 24L, 24L,
12L, 14L, 20L, 20L, 22L, 25L, 32L, 33L, 30L, 29L, 18L, 21L, 32L,
33L, 24L, 24L, 24L, 24L, 18L, 21L, 32L, 33L, 30L, 29L, 32L, 33L,
29L, 30L, 20L, 20L, 24L, 24L, 30L, 30L, 29L, 30L, 18L, 19L, 30L,
30L, 22L, 25L, 24L, 24L, 24L, 24L, 20L, 20L, 30L, 29L, 24L, 24L,
21L, 18L, 20L, 20L, 20L, 20L), vote = c(0, 0, 0, 0, 1, 1, 1,
0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1,
1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0,
1, 0, 1, 1, 1, 1, 0, 1, 1), votewon = c(0, 0, 0, 0, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,
1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
0, 0, 1, 1, 0, 1, 0, 1, 1)), class = "data.frame", row.names = c(NA,
-100L))
## convert variables to factors beforehand
DF[c(1, 2, 4, 5, 6, 9, 10)] <- lapply(DF[c(1, 2, 4, 5, 6, 9, 10)], factor)
s1.tobit <- AER::tobit(taxrate ~ votewon + industry + size + urbanisation + vote,
left=12, right=33, data=DF)
yhat <- fitted(s1.tobit)
s2.tobit <- lm(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
lmtest::coeftest(s2.tobit, vcov.=vcov2sls(s1.tobit, s2.tobit, DF))
编辑:
这样你就可以在第一阶段 运行 一个 lm,调整模型的 SEs 并用它来覆盖 summary(lm)
产生的 SEs。然后,您可以估计第二阶段并使用带有 coeftest()
的自定义函数。不确定集群,但这应该大致可行,不是吗?
library(lmtest)
library(sandwich)
## run with lm ##
s1.lm <- lm(taxrate ~ votewon + industry + size + urbanisation + vote,
data=DF)
# cluster and adjust ses
s1.robust <- vcovCL(s1.lm, cluster = ~ industry)
s1.robust.se <- sqrt(diag(s1.robust))
s1.summary <- summary(s1.lm)
s1.summary$coefficients[, 2] <- s1.robust.se
yhat <- fitted(s1.lm)
s2.lm <- lm(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
lmtest::coeftest(s2.lm, vcov.=vcov2sls(s1.summary, s2.lm, DF))
看看 estimatr
包,尤其是 lm_robust
。我不是 100% 确定你打算做什么,但你可以通过 运行 宁此获得可靠的标准错误:
library(estimatr)
lm1 <-
lm_robust(
mpg ~ hp,
data = mtcars,
clusters = cyl,
weights = wt,
se_type = "stata" # alternatives: CR0, CR1
)
summary(lm1)
使用上面的示例,这大致就是您要查找的内容吗?请注意,lm_robust 已经调整了 SE。
s1.lm <- lm_robust(taxrate ~ votewon + industry + size + urbanisation + vote,
data = DF)
yhat <- fitted(s1.lm)
s2.lm <- lm(sales ~ yhat + industry + size + urbanisation + vote, data = DF)
> lmtest::coeftest(s2.lm, vcov. = vcov2sls(s1.lm, s2.lm, DF))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.45116 62.14257 -0.2969 0.7672
yhat 1.57784 2.72176 0.5797 0.5636
industryE 0.98174 5.10677 0.1922 0.8480
industryF 2.09036 7.25181 0.2883 0.7738
size2 -8.85327 12.43454 -0.7120 0.4783
size3 -5.74011 7.14973 -0.8028 0.4242
size4 -10.79326 13.14534 -0.8211 0.4138
size5 -3.38280 5.45691 -0.6199 0.5369
urbanisationB -1.74588 6.34107 -0.2753 0.7837
urbanisationC -2.00370 6.48533 -0.3090 0.7581
vote1 -1.01661 6.49424 -0.1565 0.8760
> lmtest::coeftest(s2.lm)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.45116 46.39576 -0.3977 0.6918
yhat 1.57784 2.03207 0.7765 0.4395
industryE 0.98174 3.81273 0.2575 0.7974
industryF 2.09036 5.41421 0.3861 0.7004
size2 -8.85327 9.28365 -0.9536 0.3428
size3 -5.74011 5.33801 -1.0753 0.2851
size4 -10.79326 9.81434 -1.0997 0.2744
size5 -3.38280 4.07414 -0.8303 0.4086
urbanisationB -1.74588 4.73425 -0.3688 0.7132
urbanisationC -2.00370 4.84196 -0.4138 0.6800
vote1 -1.01661 4.84861 -0.2097 0.8344
我会根据您的评论更新我的答案。
交叉发布于 CrossValidated。
前段时间问了this question,是关于纠正使用IV/2SLS时的标准错误,第一阶段有一个tobit分布,[=得到了一个惊人的答案62=](底部的示例数据)。他为我提供了以下功能:
vcov2sls <- function(s1, s2, data, type=2) {
## get y names
y1.nm <- gsub(".*=\s(.*)(?=\s~).*", "\1", deparse(s1$call)[1], perl=TRUE)
y2.nm <- as.character(s2$terms)[2]
## auxilliary model matrix
X <- cbind(`(Intercept)`=1, data[, y1.nm, F], model.matrix(s2)[,-(1:2)])
## get y
y <- data[, y2.nm]
## betas second stage
b <- s2$coefficients
## calculate corrected sums of squares
sse <- sum((y - b %*% t(X))^2)
rmse <- sqrt(mean(s2$residuals^2)) ## RMSE 2nd stage
V0 <- vcov(s2) ## biased vcov 2nd stage
dof <- s2$df.residual ## degrees of freedom 2nd stage
## calculate corrected RMSE
rmse.c <- sqrt(sse/dof)
## calculate corrected vcov
V <- (rmse.c/rmse)^2 * V0
return(V)
}
所以我正在寻找的是一个函数,我可以在其中提供 vcov 矩阵(vcov2sls
),并具有稳健的集群标准错误。然而,它们似乎都属于同一个 vcov 矩阵选项。因此,如果我提供一个,我已经必须确保这些 se 是集群的和稳健的。所以我想我本质上是在问我如何才能使 vcov2sls
函数具有稳健和集群的错误。显然,任何其他导致相同实际结果的解决方案也会很好。
但是我想使用jay.sf的功能,当第一阶段是lm
时,聚类参与汇总(source),例如:
first_stage_ols <- lm(y~x, data=DT)
summary(first_stage_ols, robust=T)
是否有一种方法可以从 lm 函数中更正标准错误,或者(在结果中替换它们),或者调整 vcov2sls
函数来考虑 robust/clustered 标准错误?
编辑:我知道 lmtest:coeftest
也存在,但我希望能够使用 weights
。参见 this link。我无法确定这在 lmtest:coeftest
中是否可行。
编辑 I - 尝试测试人员解决方案
所以我尝试了测试人员在两种情况下的回答。首先,我从 tobit 转到 lm,另一个从 tobit 转到 lm,反之亦然。
# Tobit -> LM
library(lmtest)
library(sandwich)
## run with lm ##
s1.tobit <- AER::tobit(taxrate ~ votewon + industry + size + urbanisation + vote, data=DF)
# cluster and adjust ses
s1.robust <- vcovCL(s1.tobit, cluster = ~ industry)
s1.robust.se <- sqrt(diag(s1.robust))
s1.summary <- summary(s1.tobit)
s1.summary$coefficients[, 2] <- s1.robust.se
yhat <- fitted(s1.tobit)
s2.lm <- lm(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
lmtest::coeftest(s2.lm, vcov.=vcov2sls(s1.summary, s2.lm, DF))
# WORKS!
反之亦然:
# LM -> tobit
library(lmtest)
library(sandwich)
## run with lm ##
s1.lm <- lm(taxrate ~ votewon + industry + size + urbanisation + vote, data=DF)
# cluster and adjust ses
s1.robust <- vcovCL(s1.lm, cluster = ~ industry)
s1.robust.se <- sqrt(diag(s1.robust))
s1.summary <- summary(s1.lm)
s1.summary$coefficients[, 2] <- s1.robust.se
yhat <- fitted(s1.lm)
s2.tobit <- AER::tobit(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
and then ????
# DOES NOT WORK, NO WAY TO ADD THE VCOV TO TOBIT
编辑结束
编辑 II - 测试 lm_robust 和手动
之间的 p 值当使用lm_robust
时,第一阶段的结果如下:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 25.3890287 2.1726518 11.6857327 0.009184928 15.393996 35.3840612 1.870781
votewon -0.9900966 2.1099738 -0.4692459 0.687605609 -10.636404 8.6562105 1.882014
industryE -0.7564888 0.3710393 -2.0388372 0.184868777 -2.434709 0.9217314 1.901678
industryF -2.6639323 0.3058024 -8.7112866 0.013649538 -4.002800 -1.3250647 1.964416
size -0.5291956 0.5523497 -0.9580807 0.443894805 -3.036862 1.9784705 1.894753
urbanisationB -1.5851495 2.2454251 -0.7059463 0.554845739 -11.464414 8.2941148 1.954657
urbanisationC -2.7234541 0.3573827 -7.6205532 0.020124544 -4.365749 -1.0811587 1.872744
vote 3.1749142 2.4600297 1.2906000 0.341874112 -9.076839 15.4266675 1.740353
然而,在进行手动计算时,p 值非常不同:
s1.summary
Call:
lm(formula = taxrate ~ votewon + industry + size + urbanisation +
vote, data = DF)
Residuals:
Min 1Q Median 3Q Max
-11.2747 -4.3212 -0.6788 4.3677 10.7369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.3890 2.1506 13.845 <2e-16 ***
votewon -0.9901 2.1742 -0.676 0.5007
industryE -0.7565 0.3492 -0.557 0.5792
industryF -2.6639 0.2877 -1.855 0.0668 .
size -0.5292 0.5109 -1.250 0.2145
urbanisationB -1.5851 2.2311 -1.098 0.2753
urbanisationC -2.7235 0.3474 -1.704 0.0918 .
vote 3.1749 2.4840 2.105 0.0380 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.623 on 92 degrees of freedom
Multiple R-squared: 0.1054, Adjusted R-squared: 0.03734
F-statistic: 1.549 on 7 and 92 DF, p-value: 0.1609
这只是第一阶段。
示例数据
DF <- structure(list(country = c("C", "C", "C", "C", "J", "J", "B",
"B", "F", "F", "E", "E", "D", "D", "F", "F", "I", "I", "J", "J",
"E", "E", "C", "C", "I", "I", "I", "I", "I", "I", "C", "C", "H",
"H", "J", "J", "G", "G", "J", "J", "I", "I", "C", "C", "D", "D",
"A", "A", "G", "G", "E", "E", "J", "J", "G", "G", "I", "I", "I",
"I", "J", "J", "G", "G", "E", "E", "G", "G", "E", "E", "F", "F",
"I", "I", "B", "B", "E", "E", "H", "H", "B", "B", "A", "A", "I",
"I", "I", "I", "F", "F", "E", "E", "I", "I", "J", "J", "D", "D",
"F", "F"), year = c(2005, 2010, 2010, 2005, 2005, 2010, 2010,
2005, 2010, 2005, 2005, 2010, 2010, 2005, 2005, 2010, 2005, 2010,
2005, 2010, 2010, 2005, 2010, 2005, 2005, 2010, 2005, 2010, 2010,
2005, 2010, 2005, 2005, 2010, 2010, 2005, 2005, 2010, 2005, 2010,
2005, 2010, 2005, 2010, 2010, 2005, 2005, 2010, 2010, 2005, 2010,
2005, 2010, 2005, 2010, 2005, 2010, 2005, 2010, 2005, 2010, 2005,
2010, 2005, 2010, 2005, 2010, 2005, 2005, 2010, 2005, 2010, 2005,
2010, 2005, 2010, 2005, 2010, 2005, 2010, 2010, 2005, 2005, 2010,
2005, 2010, 2010, 2005, 2010, 2005, 2010, 2005, 2005, 2010, 2005,
2010, 2010, 2005, 2010, 2005), sales = c(15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9, 15.48, 12.39, 3.72,
23.61, 4, 31.87, 25.33, 7.64, -0.26, 2.9), industry = c("D",
"D", "E", "E", "F", "F", "F", "F", "D", "D", "E", "E", "D", "D",
"E", "E", "F", "F", "F", "F", "D", "D", "F", "F", "E", "E", "D",
"D", "D", "D", "E", "E", "F", "F", "D", "D", "E", "E", "E", "E",
"D", "D", "E", "E", "D", "D", "D", "D", "E", "E", "D", "D", "F",
"F", "D", "D", "D", "D", "E", "E", "D", "D", "E", "E", "D", "D",
"D", "D", "D", "D", "F", "F", "F", "F", "E", "E", "D", "D", "E",
"E", "F", "F", "E", "E", "F", "F", "E", "E", "F", "F", "D", "D",
"D", "D", "D", "D", "D", "D", "F", "F"), urbanisation = c("B",
"B", "A", "A", "B", "B", "A", "A", "C", "C", "C", "C", "A", "A",
"B", "B", "C", "C", "A", "A", "C", "C", "B", "B", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "C", "C", "B", "B", "B", "B",
"B", "B", "C", "C", "A", "A", "B", "B", "B", "B", "A", "A", "B",
"B", "A", "A", "A", "A", "B", "B", "C", "C", "A", "A", "C", "C",
"A", "A", "B", "B", "A", "A", "B", "B", "B", "B", "B", "B", "C",
"C", "A", "A", "A", "A", "A", "A", "A", "A", "C", "C", "A", "A",
"B", "B", "A", "A", "B", "B", "B", "B"), size = c(1, 1, 5, 5,
5, 5, 1, 1, 1, 1, 5, 5, 5, 5, 2, 2, 2, 2, 5, 5, 1, 1, 1, 1, 5,
5, 5, 5, 4, 4, 5, 5, 5, 5, 4, 4, 2, 2, 5, 5, 1, 1, 1, 1, 2, 2,
1, 1, 2, 2, 5, 5, 1, 1, 3, 3, 2, 2, 2, 2, 5, 5, 4, 4, 1, 1, 5,
5, 2, 2, 5, 5, 2, 2, 2, 2, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3,
5, 5, 3, 3, 2, 2, 3, 3, 1, 1, 5, 5), base_rate = c(14L, 14L,
14L, 14L, 19L, 19L, 30L, 30L, 20L, 20L, 29L, 29L, 20L, 20L, 20L,
20L, 24L, 24L, 19L, 19L, 29L, 29L, 14L, 14L, 24L, 24L, 24L, 24L,
24L, 24L, 14L, 14L, 17L, 17L, 19L, 19L, 33L, 33L, 19L, 19L, 24L,
24L, 14L, 14L, 20L, 20L, 23L, 23L, 33L, 33L, 29L, 29L, 19L, 19L,
33L, 33L, 24L, 24L, 24L, 24L, 19L, 19L, 33L, 33L, 29L, 29L, 33L,
33L, 29L, 29L, 20L, 20L, 24L, 24L, 30L, 30L, 29L, 29L, 17L, 17L,
30L, 30L, 23L, 23L, 24L, 24L, 24L, 24L, 20L, 20L, 29L, 29L, 24L,
24L, 19L, 19L, 20L, 20L, 20L, 20L), taxrate = c(12L, 14L, 14L,
12L, 21L, 18L, 30L, 30L, 20L, 20L, 29L, 30L, 20L, 20L, 20L, 20L,
24L, 24L, 21L, 18L, 30L, 29L, 14L, 12L, 24L, 24L, 24L, 24L, 24L,
24L, 14L, 12L, 18L, 19L, 18L, 21L, 33L, 32L, 21L, 18L, 24L, 24L,
12L, 14L, 20L, 20L, 22L, 25L, 32L, 33L, 30L, 29L, 18L, 21L, 32L,
33L, 24L, 24L, 24L, 24L, 18L, 21L, 32L, 33L, 30L, 29L, 32L, 33L,
29L, 30L, 20L, 20L, 24L, 24L, 30L, 30L, 29L, 30L, 18L, 19L, 30L,
30L, 22L, 25L, 24L, 24L, 24L, 24L, 20L, 20L, 30L, 29L, 24L, 24L,
21L, 18L, 20L, 20L, 20L, 20L), vote = c(0, 0, 0, 0, 1, 1, 1,
0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1,
1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0,
1, 0, 1, 1, 1, 1, 0, 1, 1), votewon = c(0, 0, 0, 0, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,
1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
0, 0, 1, 1, 0, 1, 0, 1, 1)), class = "data.frame", row.names = c(NA,
-100L))
## convert variables to factors beforehand
DF[c(1, 2, 4, 5, 6, 9, 10)] <- lapply(DF[c(1, 2, 4, 5, 6, 9, 10)], factor)
s1.tobit <- AER::tobit(taxrate ~ votewon + industry + size + urbanisation + vote,
left=12, right=33, data=DF)
yhat <- fitted(s1.tobit)
s2.tobit <- lm(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
lmtest::coeftest(s2.tobit, vcov.=vcov2sls(s1.tobit, s2.tobit, DF))
编辑:
这样你就可以在第一阶段 运行 一个 lm,调整模型的 SEs 并用它来覆盖 summary(lm)
产生的 SEs。然后,您可以估计第二阶段并使用带有 coeftest()
的自定义函数。不确定集群,但这应该大致可行,不是吗?
library(lmtest)
library(sandwich)
## run with lm ##
s1.lm <- lm(taxrate ~ votewon + industry + size + urbanisation + vote,
data=DF)
# cluster and adjust ses
s1.robust <- vcovCL(s1.lm, cluster = ~ industry)
s1.robust.se <- sqrt(diag(s1.robust))
s1.summary <- summary(s1.lm)
s1.summary$coefficients[, 2] <- s1.robust.se
yhat <- fitted(s1.lm)
s2.lm <- lm(sales ~ yhat + industry + size + urbanisation + vote, data=DF)
lmtest::coeftest(s2.lm, vcov.=vcov2sls(s1.summary, s2.lm, DF))
看看 estimatr
包,尤其是 lm_robust
。我不是 100% 确定你打算做什么,但你可以通过 运行 宁此获得可靠的标准错误:
library(estimatr)
lm1 <-
lm_robust(
mpg ~ hp,
data = mtcars,
clusters = cyl,
weights = wt,
se_type = "stata" # alternatives: CR0, CR1
)
summary(lm1)
使用上面的示例,这大致就是您要查找的内容吗?请注意,lm_robust 已经调整了 SE。
s1.lm <- lm_robust(taxrate ~ votewon + industry + size + urbanisation + vote,
data = DF)
yhat <- fitted(s1.lm)
s2.lm <- lm(sales ~ yhat + industry + size + urbanisation + vote, data = DF)
> lmtest::coeftest(s2.lm, vcov. = vcov2sls(s1.lm, s2.lm, DF))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.45116 62.14257 -0.2969 0.7672
yhat 1.57784 2.72176 0.5797 0.5636
industryE 0.98174 5.10677 0.1922 0.8480
industryF 2.09036 7.25181 0.2883 0.7738
size2 -8.85327 12.43454 -0.7120 0.4783
size3 -5.74011 7.14973 -0.8028 0.4242
size4 -10.79326 13.14534 -0.8211 0.4138
size5 -3.38280 5.45691 -0.6199 0.5369
urbanisationB -1.74588 6.34107 -0.2753 0.7837
urbanisationC -2.00370 6.48533 -0.3090 0.7581
vote1 -1.01661 6.49424 -0.1565 0.8760
> lmtest::coeftest(s2.lm)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.45116 46.39576 -0.3977 0.6918
yhat 1.57784 2.03207 0.7765 0.4395
industryE 0.98174 3.81273 0.2575 0.7974
industryF 2.09036 5.41421 0.3861 0.7004
size2 -8.85327 9.28365 -0.9536 0.3428
size3 -5.74011 5.33801 -1.0753 0.2851
size4 -10.79326 9.81434 -1.0997 0.2744
size5 -3.38280 4.07414 -0.8303 0.4086
urbanisationB -1.74588 4.73425 -0.3688 0.7132
urbanisationC -2.00370 4.84196 -0.4138 0.6800
vote1 -1.01661 4.84861 -0.2097 0.8344
我会根据您的评论更新我的答案。