使用 purrr 运行 具有变化结果的多元回归模型,然后提取残差

Use purrr to run multiple regression models with changing outcomes and then extract residuals

我想 运行 一些具有不同 y 的回归模型(因此自变量对所有模型保持相同),然后从每个模型中提取残差并添加他们到原始数据集。

我会用diamonds来展示我的想法:

# In my example, the models are: x or y or z = carat + cut + color + clarity + price

dependent = c("x", "y", "z")

model = function(y, dataset) {
 a = map(
   setNames(y, y), ~ glm(reformulate(termlabels = c("carat", "cut", "color", "clarity", "price"),
                                     response = y),
                         family = gaussian(link = "identity"),
                         data = dataset
   )) 
 
 resids = map_dfr(a, residuals)
 
 df = bind_cols(dataset, resids)

 print(df)

}

model(y = dependent, dataset = diamonds)

但是这段代码不起作用。我也想为作为新列添加的残差起一个合理的名称,否则当模型数量很大时很难区分残差。

生成示例数据

library(tidyverse)

set.seed(101)
dd <- diamonds
dependent <- c("x", "y", "z")
for (d in dependent) {
  dd[[d]] <- rnorm(nrow(diamonds))
}

进程

library(broom)
res <- (dependent
  ## set names so .id = argument works downstream
  %>% setNames(dependent)
  ## construct list of formulas
  %>% map(reformulate, termlabels = c("carat", "cut", "color", "clarity", "price"))
  ## fit glmes
  %>% map(glm, family = gaussian(link = "identity"), dd,
          na.action = na.exclude)
  ## compute resids (add observation number) and collapse to tibble
  %>% map_dfr(~tibble(.obs=seq(nrow(dd)), .resid = residuals(.)), .id = "response")
  ## widen data → residuals from each response variable as a column
  %>% pivot_wider(names_from = "response", values_from = ".resid", 
                  names_prefix ="res_")
  %>% select(-.obs)
)
## combine with original data
res2 <- bind_cols(dd, res)

备注:

  • 我不明白你为什么要在这里使用 glm(., family = gaussian(link = "identity)),除非它是你用真实数据做的更复杂的事情的占位符。 (如果这是您的实际模型,那么使用 lm() 会更简单、更快。)
  • na.action = na.exclude 添加到 [g]lm() 调用将在预测、残差等中包含 NA 值,这将帮助您的残差与原始数据更好地对齐。