When I run mice::mice imputations, I would like to group my data by state_id and species. I have already grouped it by state_id, and the results look much better than without grouping.

mice.impute.bygroup: Groupwise Imputation Function


# Modify df name and method
init <- mice::mice(data, method = "pmm", maxit = 0) 
meth <- init$meth
pred <- init$pred

# Impute variables by group (state_id)
imputationFunction <- list("decimalLatitude" = meth["decimalLatitude"],
                           "decimalLongitude" = meth["decimalLongitude"])

meth[c("decimalLatitude", "decimalLongitude")] <- "bygroup"

group <- list("decimalLatitude" = "state_id", 
              "decimalLongitude" = "state_id")

# Remove variables as predictors but they can still be imputed.
pred[, c("coordinateUncertaintyInMeters", "geoprivacy_id")] <- 0

imp <- mice::mice(data, meth = meth, pred = pred, m = 1, 
                  group = group, imputationFunction = imputationFunction)
imp <- complete(imp)


imp <- mice(data, m = 1, maxit = 3, method = 'norm.predict', seed = 500)
imp <- complete(imp, 1)


  1. Can I group by more than one variable?

When I replace the variable state_id with species_id, I run into an error:

Error in lm.fit(x = x, y = y) : 0 (non-NA) cases


group <- list("decimalLatitude" = "species_id", 
              "decimalLongitude" = "species_id")

You should not call mice.impute.bygroup directly. It is the function that mice calls when you specify method["x"] <- "bygroup", in the same way that "norm.predict" calls mice.impute.norm.predict (see ?mice.impute.norm.predict).
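The naming convention can be checked from the console. The following is a minimal sketch, assuming the mice package is installed; the variable names `method_name` and `fn_name` are illustrative only.

```r
library(mice)

# mice looks up an imputation function by prefixing the method string
# with "mice.impute.".
method_name <- "norm.predict"
fn_name <- paste0("mice.impute.", method_name)

# The constructed name, and whether a function of that name is visible
# on the search path.
print(fn_name)
print(exists(fn_name, mode = "function"))
```

The same lookup applies to "bygroup", which is why the package that defines mice.impute.bygroup has to be attached before mice is called.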

Below are some examples of how to use bygroup.



library(miceadds)  # provides mice.impute.bygroup; it must be attached so that mice can find it

data <- iris
# 'data.frame': 150 obs. of  5 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

data[, -5] <- mice::ampute(data[, -5])$amp

init <- mice::mice(data, maxit = 0)

Impute one variable (Petal.Width) by group (Species)

meth <- init$meth
pred <- init$pred

# Keep the original method for Petal.Width; bygroup applies it within each group.
imputationFunction <- list("Petal.Width" = meth["Petal.Width"])
meth["Petal.Width"] <- "bygroup"
group <- list("Petal.Width" = "Species")

# Species defines the groups, so do not use it as a predictor as well.
pred[, "Species"] <- 0

imp <- mice::mice(data, meth = meth, pred = pred, m = 1,
                  group = group, imputationFunction = imputationFunction)


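Impute all variables by group (Species)

meth <- init$meth
pred <- init$pred

# Apply bygroup to every variable that has an imputation method assigned.
imputationFunction <- as.list(meth[meth != ""])
meth[meth != ""] <- "bygroup"
# Same names as imputationFunction, with every entry set to the grouping variable.
group <- imputationFunction
group[] <- "Species"

pred[, "Species"] <- 0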

imp <- mice::mice(data, meth = meth, pred = pred, m = 1, 
                  group = group, imputationFunction = imputationFunction)


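The bygroup method does not let you group by more than one variable. Instead, you can create a new variable that simply covers all of these groups. Internally, all bygroup does is split the data into the separate groups, so this is not a problem.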
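As a rough sketch of that idea, the base R function interaction() can combine two grouping columns into one. The data frame below is a made-up toy example. Its column names mirror the question, and `state_species` is a hypothetical name for the combined variable. The sketch also counts observed values per combined group, since an empty group is one plausible cause of the "0 (non-NA) cases" error.

```r
# Hypothetical toy data; column names follow the question, values are invented.
df <- data.frame(
  state_id        = c(1, 1, 2, 2, 2, 3),
  species_id      = c("a", "b", "a", "a", "b", "b"),
  decimalLatitude = c(10.1, NA, 20.5, 21.0, NA, NA)
)

# One grouping variable covering every state/species combination.
# drop = TRUE discards combinations that never occur in the data.
df$state_species <- interaction(df$state_id, df$species_id, drop = TRUE)

# Number of observed (non-missing) target values in each combined group.
observed <- tapply(!is.na(df$decimalLatitude), df$state_species, sum)
print(observed)

# Groups without any observed value give a groupwise model nothing to fit.
empty_groups <- names(observed)[observed == 0]
print(empty_groups)
```

The combined column would then be named in the group list, for example group <- list("decimalLatitude" = "state_species"), and set to 0 in the predictor matrix as in the examples above. The finer the grouping, the more likely it is that some groups end up with too few observed values.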

However, at some point you have to ask whether this is the right way of doing things. It may be more worthwhile to consider multi-level imputation.