Error with svyglm function in survey package in R: "all variables must be in design=argument"

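New to Stack Overflow. I'm working on a project using NHIS data, and I can't get the svyglm function to work, even for a simple, unadjusted logistic regression with a binary predictor and a binary outcome (eventually I want to use multiple categorical predictors, but one step at a time).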

El_under_glm<-svyglm(ElUnder~SO2, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)

Error in eval(extras, data, env) : object '.survey.prob.weights' not found

I recoded the variables to 0 and 1:

Under_narm$SO2REG<-ifelse(Under_narm$SO2=="Heterosexual", 0, 1)
Under_narm$ElUnderREG<-ifelse(Under_narm$ElUnder=="No", 0, 1)

But then I ran into a different problem:

El_under_glm<-svyglm(ElUnderREG~SO2REG, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)

Error in svyglm.survey.design(ElUnderREG ~ SO2REG, design = SAMPdesign, : all variables must be in design= argument

Here is the design I use to account for the weights; I'm pretty sure it's correct:

SAMPdesign=svydesign(data=Under_narm, id= ~NHISPID, weight= ~SAMPWEIGHT)

Thanks for any and all help! I have a good grasp of the statistics, but I'm slow at coding. Let me know if I can provide any other information.

Using some made-up example data, I was able to get your model to run by setting rescale = TRUE. The documentation for that argument says:

Rescaling of weights, to improve numerical stability. The default rescales weights to sum to the sample size. Use FALSE to not rescale weights.

So one solution may simply be to set rescale = TRUE (which, per the documentation quoted above, is the default).

library(survey)

# sample data
# (SO2 has 2000 values, so data.frame() recycles the other columns:
#  the result has 2000 rows and each NHISPID appears twice)
Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),
                         ElUnder = sample(0:1, 1000, replace = TRUE),
                         NHISPID = paste0("id", 1:1000),
                         SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))

# with 'rescale' = TRUE
SAMPdesign <- svydesign(ids = ~NHISPID,
                        data = Under_narm,
                        weights = ~SAMPWEIGHT)

El_under_glm <- svyglm(formula = ElUnder ~ SO2,
                       design = SAMPdesign,
                       family = quasibinomial(), # this family avoids warnings
                       rescale = TRUE) # weights rescaled to sum to the sample size

summary(El_under_glm, correlation = TRUE) # use correlation with summary()
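
On the second error in the question ("all variables must be in design= argument"): svyglm looks up the formula variables in the data stored inside the design object, and svydesign() copies the data at the moment the design is created. One plausible cause, which is an assumption because the question does not show the order in which the steps were run, is that SO2REG and ElUnderREG were added to Under_narm after SAMPdesign had already been built. A minimal sketch with made-up data (the values and the "Other" category are hypothetical), adding the recoded columns to an existing design with update():

```r
library(survey)

set.seed(1)
# Made-up stand-in for the NHIS extract; values are hypothetical
Under_narm <- data.frame(
  SO2        = sample(c("Heterosexual", "Other"), 1000, replace = TRUE),
  ElUnder    = sample(c("No", "Yes"), 1000, replace = TRUE),
  NHISPID    = paste0("id", 1:1000),
  SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE)
)

# Design built before the 0/1 columns exist
SAMPdesign <- svydesign(ids = ~NHISPID, data = Under_narm, weights = ~SAMPWEIGHT)

# Add the recoded columns to the design itself rather than to Under_narm
SAMPdesign <- update(SAMPdesign,
                     SO2REG     = ifelse(SO2 == "Heterosexual", 0, 1),
                     ElUnderREG = ifelse(ElUnder == "No", 0, 1))

# rescale is left at its default (TRUE)
fit <- svyglm(ElUnderREG ~ SO2REG, design = SAMPdesign, family = quasibinomial())
summary(fit)
```

Recoding Under_narm first and then calling svydesign() again should have the same effect; the point is only that the design has to contain every variable used in the formula.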
  

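If you do need rescale = FALSE: looking at the code for this method with 'survey:::svyglm.survey.design', it seems there may be a bug. I may be wrong, but from my reading, .survey.prob.weights does not appear to be assigned when 'rescale' is FALSE.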

    if (is.null(g$weights)) 
      g$weights <- quote(.survey.prob.weights)
    else g$weights <- bquote(.survey.prob.weights * .(g$weights)) # bug?
    g$data <- quote(data)
    g[[1]] <- quote(glm)
    if (rescale) 
      data$.survey.prob.weights <- (1/design$prob)/mean(1/design$prob)

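A possible workaround is to assign a numeric vector to .survey.prob.weights in the global environment. I don't know what the values should be, but if you do the following, your error goes away. (.survey.prob.weights needs one value per row of the data; for the example data that is 2000, because rep(1:2, 1000) makes data.frame() recycle the other columns to 2000 rows.)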

SAMPdesign=svydesign(ids = ~NHISPID,
                     data=Under_narm,
                     weights = ~SAMPWEIGHT)

.survey.prob.weights <- rep(1, 2000) # placeholder values, one per row of the data

El_under_glm<-svyglm(formula = ElUnder~SO2, 
                     design=SAMPdesign,
                     family=quasibinomial(), 
                     rescale=FALSE)

summary(El_under_glm, correlation = TRUE)
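
A vector of ones only makes the error go away; it does not carry the sampling weights. As a hedged variant of the workaround above, the unrescaled weights can be taken from the design with weights(), which returns 1/prob for a survey.design object. That these are the values rescale = FALSE is meant to use is my assumption from reading the method's code, not something confirmed by the documentation, and on a version of survey where .survey.prob.weights is assigned internally the global variable would simply go unused.

```r
library(survey)

set.seed(1)
# Made-up data with 1000 rows (rep(1:2, 500) avoids the recycling to 2000 rows)
Under_narm <- data.frame(SO2 = factor(rep(1:2, 500)),
                         ElUnder = sample(0:1, 1000, replace = TRUE),
                         NHISPID = paste0("id", 1:1000),
                         SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))

SAMPdesign <- svydesign(ids = ~NHISPID,
                        data = Under_narm,
                        weights = ~SAMPWEIGHT)

# Assumption: use the design's own unrescaled sampling weights, one per row
.survey.prob.weights <- weights(SAMPdesign)

El_under_glm <- svyglm(formula = ElUnder ~ SO2,
                       design = SAMPdesign,
                       family = quasibinomial(),
                       rescale = FALSE)

summary(El_under_glm, correlation = TRUE)
```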