R: How to loop through models dropping one observation at a time?

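I am having trouble looping through a regression model, dropping one observation at a time, to estimate the effect of influential observations.

I want to run the model several times, each time removing the i-th observation, extracting the relevant coefficient estimate and storing it in a vector. I think this can be done easily with a fairly straightforward loop, but I am stuck on the details.

I want to end up with a vector containing the n coefficient estimates from n iterations of the same model. Any help is appreciated!

Below I provide some dummy data and example code.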

#Dummy data:
library(dplyr)  # provides %>% and mutate()

set.seed(489)
patientn <- 1:400
gender <- rbinom(400, 1, 0.5)
productid <- rep(c("Product A","Product B"), times=200)
country <- rep(c("USA","UK","Canada","Mexico"), each=50)
baselarea <- rnorm(400,400,60) #baseline area
baselarea2 <- rnorm(400,400,65) #baseline area2

sfactor  <- c(
  rep(c(0.3,0.9), times = 25),
  rep(c(0.4,0.5), times = 25),
  rep(c(0.2,0.4), times = 25),
  rep(c(0.3,0.7), times = 25)
)

rashdummy2a <- data.frame(patientn,gender,productid,country,baselarea,baselarea2,sfactor)

Data <- rashdummy2a %>% mutate(rashleft = baselarea2*sfactor/baselarea*100)


## Example of how this can be done manually: 

# model
m1 <- lm(rashleft ~ gender + baselarea + sfactor, data = Data)

# extracting relevant coefficient estimates, each time dropping a different "patient" ("patientn")

betas <- c(lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, patientn !=1)$coefficients[2],
           lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, patientn !=2)$coefficients[2],
           lm(rashleft ~ gender + baselarea + sfactor, data = rashdummy2b, patientn !=3)$coefficients[2])

# the betas vector now stores the relevant coefficient estimates (coefficient no. 2, for gender) for three different variations of the model.

We can use a for loop. In your question you used an object, rashdummy2b, that is never defined. Here I used the data frame Data created by your dummy-data code, but you can replace it with whatever object you like.

#create list to bind results to
result <- list()

#loop through patients and extract betas
for(i in unique(Data$patientn)){

  #fit the linear model without patient i
  lm.model <- lm(rashleft ~ gender + baselarea + sfactor, data = subset(Data, patientn != i))
  
  #create data.frame containing the patient left out and the gender coefficient
  result.dt <- data.frame(beta = lm.model$coefficients[[2]],
                          patient_left_out = i)
  
  #store in the list (patient ids are 1..400, so they double as list positions)
  result[[i]] <- result.dt
}

#bind the one-row data frames into a single data.frame
result <- do.call(rbind, result)

Result:

head(result) 
      beta patient_left_out
1 1.381248                1
2 1.345188                2
3 1.427784                3
4 1.361674                4
5 1.420417                5
6 1.454196                6
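If, as the question asks, you want a plain numeric vector of estimates rather than a data frame, the same leave-one-out refit can be written with `vapply()`. This is a minimal, generic sketch: the helper name `loo_coef` is hypothetical, and it is demonstrated on R's built-in `mtcars` data so that it runs on its own.

```r
# Hypothetical helper: refit `formula` once per row of `df`, dropping row i
# each time, and return the estimate of coefficient `term` from every fit.
loo_coef <- function(formula, df, term) {
  vapply(seq_len(nrow(df)), function(i) {
    fit <- lm(formula, data = df[-i, , drop = FALSE])  # negative index drops row i
    coef(fit)[[term]]                                   # pick the coefficient by name
  }, numeric(1))                                        # guarantees a numeric vector
}

# Usage on a built-in dataset: leave-one-out estimates of the `wt` slope
betas <- loo_coef(mpg ~ wt + hp, mtcars, "wt")
length(betas)  # one estimate per dropped row (mtcars has 32 rows)
summary(betas)
```

With the question's data the call would be `loo_coef(rashleft ~ gender + baselarea + sfactor, Data, "gender")`. Selecting the coefficient by name rather than by position 2 avoids silently picking up a different term if the formula is later reordered.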

You can use negative indexing to drop specific rows (or columns). In your case, you could do it as follows:

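# Data is the data frame built by the question's dummy-data code
betas <- numeric(nrow(Data))  # memory preallocation
for (i in seq_len(nrow(Data))) {
  # refit without row i and keep coefficient no. 2 (gender)
  betas[i] <- lm(rashleft ~ gender + baselarea + sfactor, data = Data[-i, ])$coefficients[2]
}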
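For an ordinary linear model, refitting n times is not strictly necessary. Base R's `dfbeta()` (stats package) returns, from a single fit, one row per observation holding the change in each coefficient when that observation is deleted, i.e. the full-fit coefficients minus the coefficients refitted without that row. This is a swapped-in shortcut, not what either answer above does; the sketch below uses `mtcars` as an assumed example and checks the shortcut against brute-force refits so you can confirm the sign convention yourself.

```r
# Fit the full model once
fit <- lm(mpg ~ wt + hp, data = mtcars)

# dfbeta(): one row per observation, one column per coefficient;
# row i is coef(fit) minus the coefficients refitted without row i
db <- dfbeta(fit)

# Leave-one-out estimates of the `wt` coefficient, with no refitting
loo_wt <- coef(fit)[["wt"]] - db[, "wt"]

# Brute-force check: refit n times, dropping one row each time
loo_wt_refit <- vapply(seq_len(nrow(mtcars)), function(i) {
  coef(lm(mpg ~ wt + hp, data = mtcars[-i, ]))[["wt"]]
}, numeric(1))

# Compare the two approaches (names removed so only the values are compared)
all.equal(unname(loo_wt), loo_wt_refit)
```

For the question's model this would be `m <- lm(rashleft ~ gender + baselarea + sfactor, data = Data)` followed by `coef(m)[["gender"]] - dfbeta(m)[, "gender"]`. The exact identity holds for `lm` fits; for other model classes a refit loop like the ones above is the safer choice.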