R model.matrix 删除多重共线性变量
R model.matrix drop multicollinear variables
有没有办法强制model.matrix.lm
删除多重共线性变量,就像lm()
在估计阶段所做的那样?
这是一个例子:
library(fixest)
N <- 10
x1 <- rnorm(N)
x2 <- x1
y <- 1 + x1 + x2 + rnorm(N)
df <- data.frame(y = y, x1 = x1, x2 = x2)
fit_lm <- lm(y ~ x1 + x2, data = df)
summary(fit_lm)
# Call:
# lm(formula = y ~ x1 + x2, data = df)
#
# Residuals:
# Min 1Q Median 3Q Max
# -1.82680 -0.41503 0.05499 0.67185 0.97830
#
# Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.7494 0.2885 2.598 0.0317 *
# x1 2.3905 0.3157 7.571 6.48e-05 ***
# x2 NA NA NA NA
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.8924 on 8 degrees of freedom
# Multiple R-squared: 0.8775, Adjusted R-squared: 0.8622
# F-statistic: 57.33 on 1 and 8 DF, p-value: 6.476e-05
请注意,lm()
从模型中删除了共线变量 x2
。但是 model.matrix()
保留它:
model.matrix(fit_lm)
# (Intercept) x1 x2
#1 1 1.41175158 1.41175158
#2 1 0.06164133 0.06164133
#3 1 0.09285047 0.09285047
#4 1 -0.63202909 -0.63202909
#5 1 0.25189850 0.25189850
#6 1 -0.18553830 -0.18553830
#7 1 0.65630180 0.65630180
#8 1 -1.77536852 -1.77536852
#9 1 -0.30571009 -0.30571009
#10 1 -1.47296229 -1.47296229
#attr(,"assign")
#[1] 0 1 2
fixst 中的 model.matrix
方法允许删除 x2
:
fit_feols <- feols(y ~ x1 + x2, data = df)
model.matrix(fit_feols, type = "rhs", collin.rm = TRUE)
# (Intercept) x1
# [1,] 1 1.41175158
# [2,] 1 0.06164133
# [3,] 1 0.09285047
# [4,] 1 -0.63202909
# [5,] 1 0.25189850
# [6,] 1 -0.18553830
# [7,] 1 0.65630180
# [8,] 1 -1.77536852
# [9,] 1 -0.30571009
# [10,] 1 -1.47296229
有没有办法在调用 model.matrix.lm()
时删除 x2
?
只要 运行 线性模型的开销不是太高,您就可以像这里那样编写一个小函数来完成它:
N <- 10
x1 <- rnorm(N)
x2 <- x1
y <- 1 + x1 + x2 + rnorm(N)
df <- data.frame(y = y, x1 = x1, x2 = x2)
fit_lm <- lm(y ~ x1 + x2, data = df)
model.matrix2 <- function(model){
bn <- names(na.omit(coef(model)))
X <- model.matrix(model)
X[,colnames(X) %in% bn]
}
model.matrix2(fit_lm)
#> (Intercept) x1
#> 1 1 -0.04654473
#> 2 1 2.14473751
#> 3 1 0.02688125
#> 4 1 0.95071038
#> 5 1 -1.41621259
#> 6 1 1.47840480
#> 7 1 0.56580182
#> 8 1 0.14480401
#> 9 1 -0.02404072
#> 10 1 -0.14393258
由 reprex package (v2.0.1)
创建于 2022-05-02
在上面的代码中,model.matrix2()
是 post-processes 模型矩阵仅包含线性模型中具有 non-missing 系数的变量的函数。
有没有办法强制model.matrix.lm
删除多重共线性变量,就像lm()
在估计阶段所做的那样?
这是一个例子:
library(fixest)
N <- 10
x1 <- rnorm(N)
x2 <- x1
y <- 1 + x1 + x2 + rnorm(N)
df <- data.frame(y = y, x1 = x1, x2 = x2)
fit_lm <- lm(y ~ x1 + x2, data = df)
summary(fit_lm)
# Call:
# lm(formula = y ~ x1 + x2, data = df)
#
# Residuals:
# Min 1Q Median 3Q Max
# -1.82680 -0.41503 0.05499 0.67185 0.97830
#
# Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.7494 0.2885 2.598 0.0317 *
# x1 2.3905 0.3157 7.571 6.48e-05 ***
# x2 NA NA NA NA
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.8924 on 8 degrees of freedom
# Multiple R-squared: 0.8775, Adjusted R-squared: 0.8622
# F-statistic: 57.33 on 1 and 8 DF, p-value: 6.476e-05
请注意,lm()
从模型中删除了共线变量 x2
。但是 model.matrix()
保留它:
model.matrix(fit_lm)
# (Intercept) x1 x2
#1 1 1.41175158 1.41175158
#2 1 0.06164133 0.06164133
#3 1 0.09285047 0.09285047
#4 1 -0.63202909 -0.63202909
#5 1 0.25189850 0.25189850
#6 1 -0.18553830 -0.18553830
#7 1 0.65630180 0.65630180
#8 1 -1.77536852 -1.77536852
#9 1 -0.30571009 -0.30571009
#10 1 -1.47296229 -1.47296229
#attr(,"assign")
#[1] 0 1 2
fixst 中的 model.matrix
方法允许删除 x2
:
fit_feols <- feols(y ~ x1 + x2, data = df)
model.matrix(fit_feols, type = "rhs", collin.rm = TRUE)
# (Intercept) x1
# [1,] 1 1.41175158
# [2,] 1 0.06164133
# [3,] 1 0.09285047
# [4,] 1 -0.63202909
# [5,] 1 0.25189850
# [6,] 1 -0.18553830
# [7,] 1 0.65630180
# [8,] 1 -1.77536852
# [9,] 1 -0.30571009
# [10,] 1 -1.47296229
有没有办法在调用 model.matrix.lm()
时删除 x2
?
只要 运行 线性模型的开销不是太高,您就可以像这里那样编写一个小函数来完成它:
N <- 10
x1 <- rnorm(N)
x2 <- x1
y <- 1 + x1 + x2 + rnorm(N)
df <- data.frame(y = y, x1 = x1, x2 = x2)
fit_lm <- lm(y ~ x1 + x2, data = df)
model.matrix2 <- function(model){
bn <- names(na.omit(coef(model)))
X <- model.matrix(model)
X[,colnames(X) %in% bn]
}
model.matrix2(fit_lm)
#> (Intercept) x1
#> 1 1 -0.04654473
#> 2 1 2.14473751
#> 3 1 0.02688125
#> 4 1 0.95071038
#> 5 1 -1.41621259
#> 6 1 1.47840480
#> 7 1 0.56580182
#> 8 1 0.14480401
#> 9 1 -0.02404072
#> 10 1 -0.14393258
由 reprex package (v2.0.1)
创建于 2022-05-02在上面的代码中,model.matrix2()
是 post-processes 模型矩阵仅包含线性模型中具有 non-missing 系数的变量的函数。