How to combine multiply imputed data with mice?
I split a dataset into males and females and then imputed each part separately with the mice package.
#Generate predictormatrix
pred_gender_0<-quickpred(data_gender_0, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)
pred_gender_1<-quickpred(data_gender_1, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)
#impute the data with mice
imp_pred_gen0 <- mice(data_gender_0,
pred=pred_gender_0,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000) # I had to set this to 3000 because of a problematic unordered categorical variable
imp_pred_gen1 <- mice(data_gender_1,
pred=pred_gender_1,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000)
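Now I have two objects, each containing 10 imputed datasets: one for men and one for women.
My question is: how do I combine them?
Normally, I would use:
comp_imp<-complete(imp,"long")
Should I:
- combine the male and female data with rbind.mids() and then convert to long format, or
- first convert each object to long format and then use rbind.mids() or rbind()?
Thanks for any tips! =)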
--------------------------------------------------------------------------
Update - reproducible example
library("dplyr")
library("mice")
# We use nhanes-dataset from the mice-package as example
# first: combine age-category 2 and 3 to get two groups (as example)
nhanes$age[nhanes$age == 3] <- "2"
nhanes$age<-as.numeric(nhanes$age)
nhanes$hyp<-as.factor(nhanes$hyp)
#split data into two groups
nhanes_age_1<-nhanes %>% filter(age==1)
nhanes_age_2<-nhanes %>% filter(age==2)
#generate predictormatrix
pred1<-quickpred(nhanes_age_1, mincor=0.1, inc=c('age','bmi'), exc='chl')
pred2<-quickpred(nhanes_age_2, mincor=0.1, inc=c('age','bmi'), exc='chl')
# impute each group separately
set.seed(121012)
imp_gen1 <- mice(nhanes_age_1,
pred=pred1,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000)
imp_gen2 <- mice(nhanes_age_2,
pred=pred2,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000)
#------ ALTERNATIVE 1:
#combine imputed data
combined_imp<-rbind.mids(imp_gen1,imp_gen2)
complete_imp<-complete(combined_imp,"long")
#output
> combined_imp<-rbind.mids(imp_gen1,imp_gen2)
Warning messages:
1: In rbind.mids(imp_gen1, imp_gen2) :
Predictormatrix is not equal in x and y; y$predictorMatrix is ignored
.
2: In x$visitSequence == y$visitSequence :
longer object length is not a multiple of shorter object length
3: In rbind.mids(imp_gen1, imp_gen2) :
Visitsequence is not equal in x and y; y$visitSequence is ignored
.
> complete_imp<-complete(combined_imp,"long")
Error in inherits(x, "mids") : object 'combined_imp' not found
#------ ALTERNATIVE 2:
complete_imp1<-complete(imp_gen1,"long")
complete_imp2<-complete(imp_gen2,"long")
combined_imp<-rbind.mids(complete_imp1,complete_imp2)
#Output
> complete_imp1<-complete(imp_gen1,"long")
> complete_imp2<-complete(imp_gen2,"long")
> combined_imp<-rbind.mids(complete_imp1,complete_imp2)
Error in if (ncol(y) != ncol(x$data)) stop("The two datasets do not have the same number of columns\n") :
argument is of length zero
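To be honest, I know nothing about the mice package and have only a vague idea of what imputation is.
I don't know what kind of analysis you want to run, but you said you would normally do comp_imp<-complete(imp,"long"), so I will try to answer accordingly.
For me, the first approach returns a data.frame, but without any missing values. That is strange, because hyp still has missing data in complete(imp_gen1,"long"). I don't know what rbind.mids is doing there.
So I would go with your second approach.
The result of complete(., "long") is an ordinary data.frame, so there is no need to bind it with rbind.mids. rbind.mids expects a mids object as its first argument, which is why your second alternative fails with the error shown above.
I would change your second approach to: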
library(dplyr)
complete_imp1 <- complete(imp_gen1, "long")
complete_imp2 <- complete(imp_gen2, "long")
combined_imp <- bind_rows(complete_imp1, complete_imp2)
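complete_imp1 <- complete(imp_gen1, "long") already returns all 10 imputed data frames (one per imputation, as set by the m argument) stacked on top of each other. You can check this: the number of rows of complete_imp1 equals the number of rows of the original data multiplied by m.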
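Below is a minimal sketch of the long-format route. It assumes the mice and dplyr packages and uses the nhanes data, split into the same two age groups as in the reproducible example. The names grp1, grp2, imp1, imp2, combined and the model chl ~ bmi are illustrative only.

The sketch checks the row counts described above and then analyses each imputation separately. The stacked data frame is no longer a mids object, so pool() cannot be applied to it directly. For that reason the pooling step is written out by hand using Rubin's rules.

```r
library(mice)   # mice(), complete(), is.mids() and the nhanes example data
library(dplyr)  # bind_rows()

# Illustrative split, mirroring the reproducible example: age group 1 vs. 2 and 3
grp1 <- nhanes[nhanes$age == 1, ]
grp2 <- nhanes[nhanes$age != 1, ]

m <- 5  # both groups use the same number of imputations
imp1 <- mice(grp1, m = m, maxit = 5, seed = 1, printFlag = FALSE)
imp2 <- mice(grp2, m = m, maxit = 5, seed = 2, printFlag = FALSE)

# complete(..., "long") returns an ordinary data frame with .imp and .id columns
long1 <- complete(imp1, "long")
long2 <- complete(imp2, "long")
stopifnot(is.data.frame(long1), !is.mids(long1))

# Each long data frame holds nrow(original data) * m rows
stopifnot(nrow(long1) == nrow(grp1) * m, nrow(long2) == nrow(grp2) * m)

# Stack the two groups; the .imp column still identifies each imputation
combined <- bind_rows(long1, long2)

# Fit the analysis model once per imputation (both groups together)
fits <- lapply(split(combined, combined$.imp),
               function(d) lm(chl ~ bmi, data = d))

# Pool the bmi slope by hand with Rubin's rules
q     <- sapply(fits, function(f) coef(f)[["bmi"]])       # per-imputation estimates
u     <- sapply(fits, function(f) vcov(f)["bmi", "bmi"])  # per-imputation variances
q_bar <- mean(q)                  # pooled estimate
u_bar <- mean(u)                  # average within-imputation variance
b     <- var(q)                   # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b  # total variance
print(c(estimate = q_bar, std.error = sqrt(t_var)))
```

Fitting a single model to all rows of combined would treat the m stacked copies as independent observations and understate the standard errors. That is why the model is fitted separately for each .imp value before pooling.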
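You can use the following to create a new mids object that contains 10 imputed datasets covering both males and females:
comp_imp <- rbind(imp_pred_gen0, imp_pred_gen1)
Because both arguments are mids objects, this calls rbind.mids rather than R's regular rbind function. The returned object can then be analyzed in the usual way, for example by using with.mids to fit your model to each imputed dataset.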
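The rbind() route can be sketched the same way. Again, this assumes the nhanes split used in the reproducible example; the object names and the model chl ~ bmi are illustrative. Both objects must have the same m. After combining, with(), pool() and complete() are applied to the combined mids object as usual.

Depending on the mice version, rbind() may warn that the predictor matrices or methods of the two objects differ, as in the question's output. In that case the settings of the first object are kept.

```r
library(mice)  # mice(), rbind.mids(), with(), pool(), complete() and the nhanes data

# Illustrative split, mirroring the reproducible example: age group 1 vs. 2 and 3
grp1 <- nhanes[nhanes$age == 1, ]
grp2 <- nhanes[nhanes$age != 1, ]

# Impute each group separately; both objects must use the same number of imputations
imp1 <- mice(grp1, m = 5, maxit = 5, seed = 1, printFlag = FALSE)
imp2 <- mice(grp2, m = 5, maxit = 5, seed = 2, printFlag = FALSE)

# Both arguments are mids objects, so rbind() dispatches to rbind.mids()
imp_all <- rbind(imp1, imp2)

# Fit the analysis model in each completed dataset and pool the results
fit    <- with(imp_all, lm(chl ~ bmi))
pooled <- summary(pool(fit))
print(pooled)

# A long data frame covering both groups can still be extracted afterwards
long_all <- complete(imp_all, "long")
```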