使用 R 的快速 post hoc 计算

Question

我有一个大型数据集，我想执行 post 临时计算：

dat = as.data.frame(matrix(runif(10000*300), ncol = 10000, nrow = 300))

dat$group = rep(letters[1:3], 100)

这是我的代码：

start <- Sys.time()

vars <- names(dat)[-ncol(dat)] 

aov.out <- lapply(vars, function(x) {
        lm(substitute(i ~ group, list(i = as.name(x))), data = dat)})

TukeyHSD.out <- lapply(aov.out, function(x) TukeyHSD(aov(x)))

Sys.time() - start

时差 4.033335 分钟

大约需要 4 分钟，是否有更高效和优雅的方法来使用 R 执行 post hoc？

非常感谢

Answer 1

你的例子太大了。为了说明这个想法，我使用了一个小的。

set.seed(0)
dat = as.data.frame(matrix(runif(2*300), ncol = 2, nrow = 300))
dat$group = rep(letters[1:3], 100)

为什么在合身的 "lm" 模型上调用 aov？那基本就是改装同款了

先阅读。 lm 是 aov 的主力，因此您可以将多个 LHS 公式传递给 aov。该模型有 class c("maov", "aov", "mlm", "lm").

response_names <- names(dat)[-ncol(dat)]
form <- as.formula(sprintf("cbind(%s) ~ group", toString(response_names)))
fit <- do.call("aov", list(formula = form, data = quote(dat)))

现在的问题是：TuckyHSD 没有 "maov" 方法。所以我们需要一个黑客。

TuckyHSD 依赖于拟合模型的残差。在 c("aov", "lm") 的情况下，残差是一个向量，但在 c("maov", "aov", "mlm", "lm") 的情况下，它是一个矩阵。下面演示黑客攻击。

aov_hack <- fit
aov_hack[c("coefficients", "fitted.values")] <- NULL  ## don't need them
aov_hack[c("contrasts", "xlevels")] <- NULL  ## don't need them either
attr(aov_hack$model, "terms") <- NULL  ## don't need it
class(aov_hack) <- c("aov", "lm")  ## drop "maov" and "mlm"
## the following elements are mandatory for `TukeyHSD`
## names(aov_hack)
#[1] "residuals"   "effects"     "rank"        "assign"      "qr"         
#[6] "df.residual" "call"        "terms"       "model" 

N <- length(response_names)  ## number of response variables
result <- vector("list", N)
for (i in 1:N) {
  ## change response variable in the formula
  aov_hack$call[[2]][[2]] <- as.name(response_names[i])
  ## change residuals
  aov_hack$residuals <- fit$residuals[, i]
  ## change effects
  aov_hack$effects <- fit$effects[, i]
  ## change "terms" object and attribute
  old_tm <- terms(fit)  ## old "terms" object in the model
  old_tm[[2]] <- as.name(response_names[i])  ## change response name in terms
  new_tm <- terms.formula(formula(old_tm))  ## new "terms" object
  aov_hack$terms <- new_tm  ## replace `aov_hack$terms`
  ## replace data in the model frame
  aov_hack$model[1] <- data.frame(fit$model[[1]][, i])
  names(aov_hack$model)[1] <- response_names[i]
  ## run `TukeyHSD` on `aov_hack`
  result[[i]] <- TukeyHSD(aov_hack)
  }

result[[1]]  ## for example
#  Tukey multiple comparisons of means
#    95% family-wise confidence level
#
#Fit: aov(formula = V1 ~ group, data = dat)
#
#$group
#            diff        lwr        upr     p adj
#b-a -0.012743870 -0.1043869 0.07889915 0.9425847
#c-a -0.022470482 -0.1141135 0.06917254 0.8322109
#c-b -0.009726611 -0.1013696 0.08191641 0.9661356

我使用了 "for" 循环。如果需要，请将其替换为 lapply。

使用 R 的快速 post hoc 计算

Fast post hoc computation using R

r

processing-efficiency

posthoc