Storing results from simulated merMods in a data frame

Question

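Updated:

I am trying to check the variability of the parameter estimates of a merMod object by simulating data with known parameters and fitting the model 100 times. I would like the result to be a data frame that looks like this: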

| simulation | intercept | est.x1 | est.x2 |
| ---------- | --------- | ------ | ------ |
| sim_study1 | .09       | .75    | .25    |
| sim_study2 | .10       | .72    | .21    |
| sim_study3 | NA        | NA     | NA     |

My code for generating multilevel data with a random intercept and 2 predictors is:

# note. this code block runs as expected, and if I run a lmer() call 
# on a simulated data set I get values that one would expect. 

gen_fake <- function(i, j){
    school <- rep(1:j, length.out = i) # school ID for each student, cycling 1..j
    person <- 1:i # students nested in schools

    # parameters
    mu_a_true <- 0.10 # true mean intercept
    sigma_a_true <- 0.10 # SD of the school intercepts (passed to rnorm() as sd)
    sigma_y_true <- 0.40 # residual SD
    b1_true <- .75
    b2_true <- .25

    # random intercept for schools  
    a_true <- rnorm(j, mu_a_true, sigma_a_true)

    # random data for predictors
    x1 <- rnorm(i, 0, 1)
    x2 <- rnorm(i, 0, 1)

    # outcome 
    y <- rnorm(i, a_true[school] + b1_true*x1 + b2_true*x2, sigma_y_true)

    return (data.frame(y, person, school, x1, x2))
}

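I am trying to run the model simulation 100 times, generating new data each time. Note that I am trying to use tryCatch inside the loop because, with more complex models, the fit may fail to terminate normally, and in that case I want the table to contain NA for the parameters.

My code is as follows: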

# create an empty data frame with names of parameters (there's probably
# a slicker way to do this within the loop where I can match names from 
# the model call)
sim_results <- data.frame(matrix(nrow=100, ncol=3, 
                      dimnames=list(c(),
                      c("intercept",
                      "est.x1", "est.x2"))),
                      stringsAsFactors=F)

# load library for analysis
library(lme4)

# conduct 100 simulations of the model generating fake data for each run
sim_study <- function (i, j, n.sims){
for (sim in 1:n.sims){
    fake_dat <- gen_fake(i, j)
    tryCatch({
        lmer_sim <- lmer(y ~ x1 + x2 + (1|school), data = fake_dat)
    }, error = function(e){
        return(NA)
    }) #return previous value of fm if error
    estimates <- rbind(fixef(lmer_sim))
    }
   sim_results[sim,] <- estimates
}

# run the simulation study
sim_study (1000,5,100)

The problem I'm having is that the function only returns 1 row and does not populate the empty data frame I created:

  (Intercept)        x1        x2
 [1,]  0.09659339 0.7746392 0.2325391

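I'm not sure what the problem is. Finally, any feedback on how to make this run faster would also be appreciated, as I'd like to learn more about that. Thanks for your help.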

Answer 1

I may be oversimplifying, but I think you just misplaced the closing brace of the loop? This works for me:

sim_study <- function (i, j, n.sims){
  for (sim in 1:n.sims) {
    if (sim %% 10 == 0 ) cat(".\n")  ## print progress
    fake_dat <- gen_fake(i, j)
    ## store the value of tryCatch(): the estimates, or NAs on error
    estimates <- tryCatch({
      lmer_sim <- lmer(y ~ x1 + x2 + (1|school), 
                       data = fake_dat)
      fixef(lmer_sim)
    }, error = function(e){
      return(rep(NA,3))  ## return vector of correct length
    })
    sim_results[sim,] <- estimates
  }
  return(sim_results)
}
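
To check the error branch without waiting for lmer() to fail, here is a minimal base-R sketch. It assumes a hypothetical stand-in function, fit_or_fail(), that fails on some iterations; it is not part of lme4 or of the original code. The point it illustrates is that the handler's return value is only used if the result of tryCatch() itself is assigned.

```r
# Hypothetical stand-in for a model fit: fails whenever `sim` is a multiple of 3
fit_or_fail <- function(sim) {
  if (sim %% 3 == 0) stop("simulated convergence failure")
  c(intercept = 0.1, est.x1 = 0.75, est.x2 = 0.25) + sim / 1000
}

n_sims <- 6
results <- data.frame(matrix(NA_real_, nrow = n_sims, ncol = 3,
                             dimnames = list(NULL,
                                             c("intercept", "est.x1", "est.x2"))))

for (sim in seq_len(n_sims)) {
  # The value of the whole tryCatch() call is what gets stored in the row
  results[sim, ] <- tryCatch(
    fit_or_fail(sim),
    error = function(e) rep(NA_real_, 3)  # same length as a successful result
  )
}

# Label the rows the way the desired table does
results <- cbind(simulation = paste0("sim_study", seq_len(n_sims)), results)
print(results)
```

Rows 3 and 6 should come out as NA while the other rows hold the fake estimates, which is the behaviour wanted for fits that fail.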

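A few more points:

I'm not sure whether the tryCatch() logic works, since I never hit any errors (but I think it should be modified to return an object of the correct length, as above).
You could replace part of your gen_fake (not the generation of the predictor variables, but the generation of the response) with the built-in ?simulate.merMod(), but I don't think it would actually work any better (or worse).
Speeding this up significantly would be a bit of work and somewhat hacky. There is a refit() function that re-fits quickly if only the response variable changes, but that isn't the case here. You could use the tricks specified here ...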
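
For reference, here is a rough sketch of the simulate()/refit() route for the case where the predictors are held fixed and only the response is regenerated. This is a different design from the question, which draws new predictors on every run. The data set, the seed, and the names dat, fit0, y_sim and est are made up for illustration, and the new responses are simulated from the fitted model rather than from the true parameters.

```r
library(lme4)

set.seed(101)
# Hypothetical data: 5 schools with 200 students each
n_school <- 5
n_per <- 200
dat <- data.frame(
  school = rep(seq_len(n_school), each = n_per),
  x1 = rnorm(n_school * n_per),
  x2 = rnorm(n_school * n_per)
)
a <- rnorm(n_school, 0.10, 0.10)  # school-level intercepts
dat$y <- rnorm(nrow(dat), a[dat$school] + 0.75 * dat$x1 + 0.25 * dat$x2, 0.40)

# Fit the model once
fit0 <- lmer(y ~ x1 + x2 + (1 | school), data = dat)

# Simulate new response vectors from the fitted model (predictors unchanged)
n_sims <- 20
y_sim <- simulate(fit0, nsim = n_sims)  # data frame with one column per simulation

# refit() re-estimates the same model for each new response vector
est <- t(sapply(y_sim, function(y_new) fixef(refit(fit0, newresp = y_new))))
est <- as.data.frame(est)
names(est) <- c("intercept", "est.x1", "est.x2")
head(est)
```

Each row of est holds the fixed-effect estimates from one refit. Whether this is actually faster than calling lmer() from scratch for a model this small would need to be timed.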
