How to load data only once for multiple glm calls with varying formulas?

Question

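I have a data set with 1 column for the dependent variable and 9 columns for the independent variables. I have to fit logit models in R for all combinations of the independent variables.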

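I have created the formulas for this, to be passed to the "glm" function. However, every time I call "glm", it loads the data (which is the same every time, since only the formula changes in each iteration).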

Is there a way to avoid this and speed up my computation? Can I pass a vector of formulas to "glm" and load the data only once?

Code:

tempCoeffV <- lapply(formuleVector, function(s) {
  coef(glm(s, data = myData, family = binomial, y = FALSE, model = FALSE))
})


formuleVector is a vector of strings like: 
myData[,1]~myData[,2]+myData[,3]+myData[,5]
myData[,1]~myData[,2]+myData[,6]

My data is a data.frame.

In every lapply iteration, myData stays the same. It is a data.frame with about 100,000 records. formuleVector is a vector of 511 different formulas. Is there a way to speed up this computation?

Answer 1

Good that you don't have factors; otherwise I would have to call model.matrix and then work with its "assign" attribute, rather than simply using data.matrix.
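To illustrate why factors would complicate things, here is a minimal sketch on a made-up data frame `d` (hypothetical, not the asker's data). A factor expands into several dummy columns, so column positions no longer map one-to-one to variables, and the "assign" attribute of the model matrix is what records which term each column belongs to.

```r
## Hypothetical toy data with one factor column `g` (not from the question)
d <- data.frame(y  = c(0, 1, 0, 1, 1, 0),
                x1 = c(1.2, 0.7, 3.1, 2.2, 0.3, 1.9),
                g  = factor(c("a", "b", "c", "a", "b", "c")))

## the factor `g` becomes two dummy columns, `gb` and `gc`
mm <- model.matrix(y ~ x1 + g, data = d)
colnames(mm)

## term index per column: 0 = intercept, 1 = x1, 2 = g (twice)
asgn <- attr(mm, "assign")
asgn

## selecting the model "y ~ g" means keeping every column of term 2
mm_g <- mm[, asgn %in% c(0, 2), drop = FALSE]
colnames(mm_g)
```

With numeric columns only, each variable is exactly one column, which is why plain data.matrix is enough here.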

## Assuming `myData[, 1]` is your response

## complete model matrix and model response
X <- data.matrix(myData)
y <- X[, 1]
X[, 1] <- 1  ## the response column is reused as the intercept column

## covariates names and response name
vars <- names(myData)

This is how you get your 511 candidates, right?

choose(9, 1:9)
# [1]   9  36  84 126 126  84  36   9   1
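
As a quick check of the arithmetic, the counts above can be summed. Every non-empty subset of 9 variables is one model, so the total should equal 2^9 - 1.

```r
## number of models per subset size, then the total
counts <- choose(9, 1:9)
total <- sum(counts)
total

## same number via the count of non-empty subsets
2^9 - 1
```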

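Now we need the combination indices rather than the number of combinations, and these are easy to get from combn. The rest of the story is to write a loop nest and loop through all combinations. glm.fit is used, as you only care about the coefficients.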

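The model matrix is already set up; we only select its columns dynamically.
The loop nest is not a problem; glm.fit is far more costly than the for loop overhead. For readability, don't recode the loops as lapply, for example.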
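
To see what the combination index looks like, here is a small sketch with 4 variables taken 2 at a time (smaller numbers than in the question, chosen only for display). combn returns a matrix in which each column is one combination; adding 1 shifts the indices past the intercept column of `X`.

```r
## each column of the result is one combination of 2 out of 4
I <- combn(4, 2)
I
dim(I)  ## 2 rows, choose(4, 2) columns

## shift by 1 so that index 1 is left free for the intercept column
I_shift <- I + 1
I_shift[, 1]  ## columns of `X` used by the first 2-variable model
```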

lst <- vector("list", 9)  ## a list to store all results
for (k in 1:9) {
  ## combn index; each column is a combination
  ## plus 1 as an offset as there is an intercept in `X`
  I <- combn(9, k) + 1
  ## now loop through all combinations, calling `glm.fit`
  n <- choose(9, k)
  lstk <- vector("list", n)
  for (j in seq.int(n)) {
    ## current index
    ind <- I[, j]
    ## get regression coefficients
    b <- glm.fit(X[, c(1, ind)], y, family = binomial())$coefficients
    ## attach model formula as an attribute
    attr(b, "formula") <- reformulate(vars[ind], vars[1])
    ## store
    lstk[[j]] <- b
  }
  lst[[k]] <- lstk
}

In the end, lst is a nested list. Use str(lst) to understand its structure.
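
One possible way to work with the nested result is sketched below. It runs the same loop nest on simulated data (a hypothetical stand-in for `myData`, with 3 covariates named `a`, `b`, `c` instead of 9), flattens the list to one entry per model, and cross-checks one set of coefficients against a regular glm call.

```r
## Simulated stand-in for `myData` (hypothetical: 3 covariates, 200 rows)
set.seed(1)
myData <- data.frame(y = rbinom(200, 1, 0.5),
                     a = rnorm(200), b = rnorm(200), c = rnorm(200))
X <- data.matrix(myData); y <- X[, 1]; X[, 1] <- 1
vars <- names(myData)
p <- ncol(X) - 1  ## number of covariates

## same loop nest as above, with 9 replaced by p
lst <- vector("list", p)
for (k in seq_len(p)) {
  I <- combn(p, k) + 1
  lstk <- vector("list", ncol(I))
  for (j in seq_len(ncol(I))) {
    ind <- I[, j]
    b <- glm.fit(X[, c(1, ind)], y, family = binomial())$coefficients
    attr(b, "formula") <- reformulate(vars[ind], vars[1])
    lstk[[j]] <- b
  }
  lst[[k]] <- lstk
}

## flatten one level: one list entry per fitted model
flat <- unlist(lst, recursive = FALSE)
length(flat)  ## 2^3 - 1 models

## recover each model's formula as a string
forms <- vapply(flat, function(b) deparse(attr(b, "formula")), character(1))

## cross-check one model against a full glm() call
i <- which(forms == "y ~ a + c")
fit <- glm(y ~ a + c, data = myData, family = binomial())
all.equal(as.numeric(flat[[i]]), as.numeric(coef(fit)))
```

The coefficients agree because glm itself calls glm.fit on the model matrix it builds; the loop nest only skips the repeated formula parsing and model frame construction.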

Tags: performance, regression, r, glm