For loop not storing the results in for-purpose data frame columns

Question

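I created the for loop below to handle the prediction process for panel data. Every step applied to the data works well, but storing the predictions (the last step) fails. I am fairly new to for loops, and the loop does not replace the NAs in the data frame columns I created for storage with the numeric predictions. What am I doing wrong?

There are 17 prefectures, each with 61 observations, so I get 61 predictions for each prefecture.

Dataset used: https://www.dropbox.com/scl/fi/v2xk34ac58h2kk7uxunat/dt1.xlsx?dl=0&rlkey=gf2e15z4gtuu83lxalzn91rai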

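#Packages used below (MASS for glm.nb; tidyr for drop_na and the %>% pipe)
library(MASS)
library(tidyr)

#Data prep for modeling and predictions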
mydata$...1 <- NULL #remove useless column

mydata$month_year <- as.factor(mydata$month_year) #time fixed-effects

mydata$ncve_relax_lag <- as.numeric(mydata$ncve_relax_lag) #make numeric
mydata$ncve_strict_lag <- as.numeric(mydata$ncve_strict_lag)

mydata <- mydata %>% drop_na()

mydata$population <- mydata$population/10000 #scaling

mydata$area <- mydata$area/10000 #scaling
    
mydata$no_troops <- mydata$no_troops/1000 #scaling

#Create data frame columns to store predictions
mydata$nbpred.core <- NA
mydata$nbpred.lit <- NA
mydata$nbpred.base <- NA

#Model fitting and predictions
runPredictions <- function(){
  for(i in unique(mydata$prefecture)){
    print(i)
    
    #Define training and test sets
    sptllearningSet <- mydata[mydata$prefecture != i,]
    sptltestSet <- mydata[mydata$prefecture == i,]
    
    #Train model
    sptlnb_base <- glm.nb(ncve_relax ~ population + 
                           capdist +
                           month_year,
                           data = sptllearningSet,
                           control = glm.control(maxit = 3000))
    
    
    sptlnb_lit <- glm.nb(ncve_relax ~ population + 
                          capdist + 
                          multidim.poverty +
                          eth_frc_t13 +
                          eth_plr_t13 +
                          sp_lag_relax +
                          ncve_relax_lag +
                          month_year,
                          data = sptllearningSet,
                          control = glm.control(maxit = 3000))
    
    
    sptlnb_core <- glm.nb(ncve_relax ~ population + 
                           capdist + 
                           multidim.poverty +
                           eth_frc_t13 +
                           eth_plr_t13 +
                           sp_lag_relax +
                           ncve_relax_lag +
                           no_troops +
                           unpol.dummy +
                           area +
                           ruggedness +
                           month_year, 
                           data = sptllearningSet,
                           control = glm.control(maxit = 3000))
    
    #Use coefficients to predict on test
    mydata$nbpred.core[mydata$prefecture == i] = as.numeric(predict(sptlnb_core, newdata = mydata[mydata$prefecture == i,], type='response'))
    mydata$nbpred.lit[mydata$prefecture == i] = as.numeric(predict(sptlnb_lit, newdata = mydata[mydata$prefecture == i,], type='response'))
    mydata$nbpred.base[mydata$prefecture == i] = as.numeric(predict(sptlnb_base, newdata = mydata[mydata$prefecture == i,], type='response'))
  }
}

Thanks for your help!

Edit: I added the initial part of the code to make it fully reproducible.

Answer 1

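You are dealing with a scoping issue. When you run the for loop inside a function, the predictions are assigned to a copy of mydata that is local to the function. They do not affect the global environment, where the data frame you inspect afterwards lives.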
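The effect can be reproduced in a few lines. This is only a minimal sketch: the toy data frame df and the function fill_pred are hypothetical stand-ins for mydata and runPredictions, not part of the question's code.

```r
# Toy data frame standing in for mydata (hypothetical)
df <- data.frame(group = c("a", "a", "b"), pred = NA_real_)

fill_pred <- function() {
  for (g in unique(df$group)) {
    # The first assignment creates a copy of df local to the function;
    # later iterations keep modifying that local copy
    df$pred[df$group == g] <- 1
  }
  # Nothing is returned explicitly, so the local copy is discarded on exit
}

fill_pred()
all(is.na(df$pred))  # TRUE: the global df still holds only NA
```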

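The most straightforward fix is to pull the for loop out of the function: remove the runPredictions <- function(){ } wrapper and it should work fine. Alternatively, you can force the function to assign to the global environment, or apply a function across the prefectures (for example with pmap).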
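If you would rather keep the function, one common pattern (not spelled out above, so treat it as a suggestion) is to pass the data frame in as an argument and return the modified copy. The sketch below uses a hypothetical toy data frame and replaces the glm.nb fit and predict call with a simple mean, purely to keep the example self-contained.

```r
# Toy data frame standing in for mydata (hypothetical)
df <- data.frame(group = c("a", "a", "b"), x = c(1, 2, 3), pred = NA_real_)

fill_pred <- function(data) {
  for (g in unique(data$group)) {
    # Leave-one-group-out split, mirroring the loop in the question
    train <- data[data$group != g, ]
    test_rows <- data$group == g
    # Stand-in for model fitting and predict(): mean of x in the training rows
    data$pred[test_rows] <- mean(train$x)
  }
  data  # return the modified copy so the caller can keep it
}

# Assign the result back, otherwise the changes are lost again
df <- fill_pred(df)
df$pred  # 3.0 3.0 1.5
```

Applied to the question, this would mean defining runPredictions <- function(mydata) { ... }, ending the function body with mydata, and calling mydata <- runPredictions(mydata). Replacing = with <<- in the three assignment lines should also write to the global mydata, though returning the data frame is usually easier to reason about.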

Tags: for-loop, r, predict, dataframe