broom::augment omits columns from data

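broom::augment only outputs the data columns that are used in the formula. This is problematic, because it is sometimes very helpful to be able to look up things like a respondent ID. Using the newdata argument can serve as a workaround, but it still does not provide a fix when working with nested data.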

See the additional comments inline:

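#packages used below: pipe/mutate, nest, map/map2, augment
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

#simulated glm data
glmdata = data.frame(ID=1:100, A=rnorm(100), B=rnorm(100)) %>% mutate(response=rbinom(length(ID),1,1/(1+exp(-2*A-3*B))  ))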

#fit model, not including the ID variable
glmfit = glm(response~A+B, glmdata,family='binomial')

#ID variable is contained in glm$data
str(glmfit$data)

#works!
head(glmfit$data$ID)


#use broom::augment
augmented = glmfit %>% augment

#does not work, wth broom?!
augmented$ID


#ok ... I could use the newdata argument
augmented = glmfit %>% augment(newdata=glmdata)
augmented$ID


#however, that is a hacky workaround ....

#... and it does not fix the following scenario:

#Let's say I want to use nest


#simulated glm data
glmdata1 = data.frame(segm=1,ID=1:100, A=rnorm(100), B=rnorm(100)) %>% mutate(response=rbinom(length(ID),1,1/(1+exp(-2*A-3*B))  ))
glmdata2 = data.frame(segm=2,ID=1:100, A=rnorm(100), B=rnorm(100)) %>% mutate(response=rbinom(length(ID),1,1/(1+exp(-3*A-2*B))  ))

glmdata_nest = rbind(glmdata1,glmdata2) %>% group_by(segm) %>% nest


#fit the two models via map
glmfit_nest= glmdata_nest %>% mutate(model=map(data, glm, formula=response~A+B, family='binomial') )

#run augment via map
glmfit_nest_augmented = glmfit_nest %>% mutate(augmented = map(model,augment))

#ID is not here ...
glmfit_nest_augmented$augmented$ID


#ok, so then we have to use map2 ....
glmfit_nest_augmented = glmfit_nest %>% mutate(augmented = map2(model,data,augment,newdata=.y))

#but even this doesn't work

#also, trying to recycle glm$data does not work
glmfit_nest_augmented = glmfit_nest %>% mutate(augmented = map(model,augment,newdata=.$data))

Update: the broom developers deliberately chose this inconsistent behavior https://github.com/tidymodels/broom/issues/753

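Here, .x and .y have to be used together with an anonymous function call written with ~. Without the ~, .y is looked up as an ordinary object rather than as the second map2 argument, so it is not found.

glmfit_nest_augmented <-  glmfit_nest %>% 
         mutate(augmented = map2(model,data,~ augment(.x, newdata=.y)))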
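As a possible alternative to newdata, the augment methods for lm and glm fits also take a data argument, which is meant for passing the original data the model was fit on. The sketch below is a minimal, self-contained version of the nested example under that assumption. The helper simulate_segment, the object names nested and with_data, and the fixed seed are all hypothetical and only there for illustration. Passing data should keep the extra columns such as ID while still adding the fitted values.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# hypothetical helper: simulate one segment of logistic-regression data
simulate_segment <- function(segm, beta_a, beta_b, n_obs = 100) {
  data.frame(segm = segm, ID = seq_len(n_obs),
             A = rnorm(n_obs), B = rnorm(n_obs)) %>%
    mutate(response = rbinom(n_obs, 1, 1 / (1 + exp(-beta_a * A - beta_b * B))))
}

# one nested row per segment, as in the question
nested <- bind_rows(simulate_segment(1, 2, 3), simulate_segment(2, 3, 2)) %>%
  group_by(segm) %>%
  nest()

with_data <- nested %>%
  mutate(
    # fit one model per segment; ID is not part of the formula
    model = map(data, ~ glm(response ~ A + B, data = .x, family = binomial)),
    # hand the original data back to augment via `data` instead of `newdata`
    augmented = map2(model, data, ~ augment(.x, data = .y))
  )

# ID is carried through for each segment
head(with_data$augmented[[1]]$ID)
```

Whether data or newdata is the better choice depends on which output columns are needed, so it is worth checking names(with_data$augmented[[1]]) against the newdata version before relying on either.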