How can I extract the coefficients from a linear model in R without repeating my code?

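I am using a Monte Carlo simulation to predict mpg in the mtcars data. I want to extract the coefficients for all variables in the data frame so that I can count how many times each car's predicted mpg is lower than another car's, for example how many times the Toyota Corona has a lower predicted mpg than the Datsun 710. Here is my initial code, which uses only two independent variables. I would like to extend it to use all the variables in the data frame without having to type each one out by hand. Is there a way to do this?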

library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)

df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]

beta_0 = fit$coefficients[1] # Intercept 
beta_1 = fit$coefficients[2] # Slope
beta_2 = fit$coefficients[3]
set.seed(1)  # Seed
n = 1000     # Sample size
M = 500      # Number of experiments/iterations


estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
  # Generate data
  U_i = rnorm(n, mean = 0, sd = 2) # Error
  X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
  X_i_2 = rnorm(n, mean = 5, sd = 5) # Second independent variable
  Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i  # Dependent variable
  
  # Formulate data.table
  data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
  
  # Run regressions
  ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)  
  ols_i$coefficients
}))

estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))

compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
  car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
  car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
  
  predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
  predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
  
  list(
    car1LowerCar2 = sum(predsCar1 < predsCar2),
    car2LowerCar1 = sum(predsCar1 >= predsCar2)
  )
}

compareCarEstimations("Toyota Corona", "Datsun 710")

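I haven't gone all the way through your example, but here is the gist of how to build a set of random predictor variables and matrix-multiply them by the coefficient vector to get predicted values:

Setup: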

df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
n <- 1000
beta <- coef(fit) ## parameter vector (includes intercept)
npar <- length(beta)
X <- matrix(rnorm(n*(npar-1)),ncol=npar-1)  ## predictors only (no intercept yet)
## scale columns by the corresponding sd
## (all identical in this case)
X <- sweep(X, MARGIN=2, FUN="*", STATS=rep(5,npar-1))
## shift columns by the corresponding mean
## (all identical in this case)
X <- sweep(X, MARGIN=2, FUN="+", STATS=rep(5,npar-1))
X <- cbind(1, X)  ## prepend a column of ones for the intercept
Y0 <- X %*% beta
Y <- rnorm(n, mean=Y0, sd=2)
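
To use every variable without typing them out, the same idea might be extended along these lines. This is only a sketch under a few assumptions of mine: the formula `mpg ~ .` stands in for the hand-written one, base R's `lm.fit` replaces `fixest::feols` for the refits (it is not what your code uses, it just accepts a design matrix directly), and every simulated predictor is still drawn from N(5, 5) as in your question, which is not realistic for the actual mtcars columns. The name `estimates` is mine; `compareCarEstimations` is adapted from your function.

```r
set.seed(1)
df    <- mtcars
fit   <- lm(mpg ~ ., data = df)   # "." means every other column in df
beta  <- coef(fit)                # named vector: intercept plus one slope per predictor
npred <- length(beta) - 1         # number of predictors (intercept excluded)

n <- 1000   # sample size per experiment
M <- 500    # number of experiments

## one row of re-estimated coefficients per experiment
estimates <- t(replicate(M, {
  ## column of ones for the intercept, then predictors ~ N(5, 5) (assumption from the question)
  X <- cbind(1, matrix(rnorm(n * npred, mean = 5, sd = 5), ncol = npred))
  Y <- drop(X %*% beta) + rnorm(n, mean = 0, sd = 2)
  lm.fit(X, Y)$coefficients
}))
colnames(estimates) <- names(beta)

compareCarEstimations <- function(carname1 = "Mazda RX4", carname2 = "Datsun 710") {
  ## design-matrix rows for the two cars, in the same column order as beta
  Xcars <- model.matrix(fit)[c(carname1, carname2), , drop = FALSE]
  preds <- estimates %*% t(Xcars)   # M x 2 matrix of predicted mpg
  list(
    car1LowerCar2 = sum(preds[, 1] < preds[, 2]),
    car2LowerCar1 = sum(preds[, 1] >= preds[, 2])
  )
}

compareCarEstimations("Toyota Corona", "Datsun 710")
```

Because `model.matrix(fit)` already carries the intercept column and the car names as row names, the per-car predictions reduce to one matrix product, so nothing in the function depends on how many variables are in the model.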