Perform linear mixed model on several subsets of data in R

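Consider the sleepstudy data from the lme4 package, shown below. It contains 18 subjects whose reaction times were measured repeatedly on different days. Suppose there is one additional variable, Days2, defined as follows: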

library("lme4")
sleepstudy$Days2<- rep(10:18, rep(20,9)) 
sleepstudy
  Reaction Days Subject Age Days2
1 249.5600    0     308  20    10
2 258.7047    1     308  20    10
3 250.8006    2     308  20    10
...
178 343.2199    7     372  28    18
179 369.1417    8     372  28    18
180 364.1236    9     372  28    18
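To make the construction of Days2 concrete, here is a small base R sketch of what the `rep()` call produces. The variable name `days2` is only illustrative. Because each subject contributes 10 rows, every Days2 value ends up covering two consecutive subjects.

```r
# rep(x, times) with a vector of times repeats each element of x
# the corresponding number of times: 10 twenty times, then 11 twenty times, ...
days2 <- rep(10:18, rep(20, 9))

length(days2)  # one value per row of sleepstudy
table(days2)   # how many rows carry each Days2 value

# The same vector can be written with the `each` argument.
identical(days2, rep(10:18, each = 20))
```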

The data have 9 distinct values of Days2, namely 10, 11, ..., 18. I create 9 subsets of the data as follows:

tt <- sort(unique(sleepstudy$Days2))
Subset1 <- sleepstudy[sleepstudy$Days2>=tt[1], ]
Subset2<-sleepstudy[sleepstudy$Days2>=tt[2], ]
...
Subset9<-sleepstudy[sleepstudy$Days2>=tt[length(tt)], ] 
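The nine subsets can also be held in a single list instead of nine separate objects, which is usually the first step toward automating the rest. This is a minimal sketch, assuming lme4 is installed; the names `sleep_df`, `subsets` and `threshold` are illustrative and not part of the original code.

```r
library(lme4)  # provides the sleepstudy data

# Work on a copy so the packaged data set is left untouched.
sleep_df <- sleepstudy
sleep_df$Days2 <- rep(10:18, rep(20, 9))

tt <- sort(unique(sleep_df$Days2))

# One data frame per threshold; element i plays the role of Subset<i>.
subsets <- lapply(tt, function(threshold) {
  sleep_df[sleep_df$Days2 >= threshold, ]
})
names(subsets) <- paste0("Subset", seq_along(tt))

# Number of rows in each subset.
sapply(subsets, nrow)
```

Each element can then be accessed as `subsets[[i]]` or `subsets$Subset3`, so later steps can iterate over the list.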

I then fit a separate random-intercept linear mixed model to each of the 9 subsets and predict at Days = tt[i], as follows:

fit1 = lmer(Reaction ~ Days + (1 | Subject), data = Subset1)
newSubset1 <- data.frame(   Days = tt[1], Subject = unique(Subset1$Subject))
newSubset1$Predicted_Response <- predict(fit1, newdata = newSubset1)

fit2 = lmer(Reaction ~ Days + (1 | Subject), data = Subset2)
newSubset2 <- data.frame(   Days = tt[2], Subject = unique(Subset2$Subject))
newSubset2$Predicted_Response <- predict(fit2, newdata = newSubset2)
...
fit9 = lmer(Reaction ~ Days + (1 | Subject), data = Subset9)
newSubset9 <- data.frame(  Days = tt[9], Subject = unique(Subset9$Subject) )
newSubset9$Predicted_Response <- predict(fit9, newdata = newSubset9)
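As a rough check on what `predict()` returns in these calls: with its default settings, lme4's `predict()` method conditions on the random effects, so each prediction should be the fixed intercept plus the Days slope times the new Days value plus that subject's random intercept. The sketch below verifies this on the full data; `new_data`, `pred` and `manual` are illustrative names.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# One row per subject, all evaluated at Days = 10.
new_data <- data.frame(Days = 10, Subject = unique(sleepstudy$Subject))
pred <- predict(fit, newdata = new_data)

# Rebuild the same predictions by hand from the model components.
b <- fixef(fit)                                   # fixed effects
u <- ranef(fit)$Subject[as.character(new_data$Subject), "(Intercept)"]
manual <- b[["(Intercept)"]] + b[["Days"]] * new_data$Days + u

all.equal(unname(pred), manual)
```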

The final step is to combine the outputs into one data set:

FinalOutput<-rbind( newSubset1,newSubset2,...,newSubset9 )
FinalOutput
  Days Subject Predicted_Response
   10     308           396.8617
   10     309           278.2284
   10     310           292.9694
   ...
   11     372           383.3525
   ...
   18     371           434.8685
   18     372           454.5697

The steps above are manual. How can I generalize them in R to obtain the final output? Perhaps something like this:

for(i in length(tt)){
Subset[i]<-sleepstudy[sleepstudy$Days2>=tt[i], ] 

fit[i]= lmer(Reaction ~ Days + (1 | Subject), data = Subset[i])
newSubset[i] <- data.frame(  Days = tt[i], Subject = unique(Subset[i]$Subject) )
newSubset[i]$Predicted_Response <- predict(fit[i], newdata = newSubset[i])
...

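purrr::map_dfr() (a) takes a function that operates on a single subset, (b) applies that function to each subset, and (c) combines all the resulting subsets into one data.frame.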

library("lme4")
ds_sample <-
  lme4::sleepstudy |> 
  dplyr::mutate(
    Days2 = rep(10:18, rep(20,9)) 
  )

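# Fit the model on rows with Days2 >= .days_2, then predict at Days = .days_2.
predict_reaction <- function(.days_2) {
  # Keep only the rows at or above the current threshold.
  ds_subset <- ds_sample[.days_2 <= ds_sample$Days2, ]

  # Random-intercept model fitted to this subset only.
  fit <- lmer(Reaction ~ Days + (1 | Subject), data = ds_subset)

  # One prediction row per subject remaining in the subset.
  ds_new <- data.frame(Days = .days_2, Subject = unique(ds_subset$Subject))
  ds_new$Predicted_Response <- predict(fit, newdata = ds_new)
  ds_new
}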

sort(unique(ds_sample$Days2)) |> 
  purrr::map_dfr(predict_reaction)

Output:

   Days Subject Predicted_Response
1    10     308           396.8617
2    10     309           278.2284
3    10     310           292.9694
4    10     330           360.4844
5    10     331           366.2942
6    10     332           364.2992
7    10     333           372.5785
8    10     334           353.0810
9    10     335           310.7958
...
87   17     371           453.4433
88   17     372           468.7271
89   18     371           434.8685
90   18     372           454.5697
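For readers who prefer to avoid the purrr dependency, the same pattern can be sketched in base R with `lapply()` and `do.call(rbind, ...)`. This is an illustrative alternative rather than part of the original answer, and the names `sleep_df`, `predict_at` and `final_output` are assumptions made for the example. It may also be worth knowing that purrr 1.0.0 and later document `map_dfr()` as superseded in favour of `map()` followed by `list_rbind()`.

```r
library(lme4)

sleep_df <- sleepstudy
sleep_df$Days2 <- rep(10:18, rep(20, 9))
tt <- sort(unique(sleep_df$Days2))

# Fit on rows with Days2 >= threshold and predict at Days = threshold.
predict_at <- function(threshold) {
  subset_i <- sleep_df[sleep_df$Days2 >= threshold, ]
  fit_i <- lmer(Reaction ~ Days + (1 | Subject), data = subset_i)
  new_i <- data.frame(Days = threshold, Subject = unique(subset_i$Subject))
  new_i$Predicted_Response <- predict(fit_i, newdata = new_i)
  new_i
}

# lapply() returns a list of data frames; rbind stacks them row-wise.
final_output <- do.call(rbind, lapply(tt, predict_at))
rownames(final_output) <- NULL

head(final_output)
```

Writing the per-subset work as a function of the threshold avoids indexed objects such as `Subset[i]` or `fit[i]`, which is where the loop attempt in the question runs into trouble.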