Impute missing values by condition with dplyr

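I want to replace missing values with the mean of the same sex.

For example, if 'patient A - male' has a missing value for pain, that value should be replaced with the mean pain of the male patients.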

rawdata <- rawdata %>%
  mutate(replace_pain = ifelse(is.na(pain) & sex == "male",
                               rawdata %>% 
                                 filter(sex == "male") %>% 
                                 mean(pain, na.rm = TRUE),
                               ifelse(is.na(pain) & sex == "female",
                                      rawdata %>% 
                                        filter(sex == "female") %>% 
                                        mean(pain, na.rm = TRUE),
                                      pain)))

It has two problems.

1) The code is a bit messy.

2) It does not work and produces the message below. The %>% mean(...) part seems to be the problem.

Warning message:
In mean.default(., pain, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

Is there a better way to impute missing values by condition?

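Your code does not work because you need summarise(mean(pain, na.rm = TRUE)) rather than just mean(pain, na.rm = TRUE). You cannot call mean() on a data frame: the pipe passes the filtered data frame as the first argument to mean(), which is why the warning reads mean.default(., pain, na.rm = TRUE).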

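rawdata %>%
  mutate(replace_pain = ifelse(is.na(pain) & sex == "male",
                               # summarise() returns a one-row data frame; pull() extracts the number,
                               # otherwise replace_pain ends up as a list column
                               rawdata %>% filter(sex == "male") %>% summarise(mean(pain, na.rm = TRUE)) %>% pull(),
                               ifelse(is.na(pain) & sex == "female",
                                      rawdata %>% filter(sex == "female") %>% summarise(mean(pain, na.rm = TRUE)) %>% pull(),
                                      pain)))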

The code is still rather messy; it would probably be cleaner to define avg_pain_female and avg_pain_male variables first.
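As a rough sketch of that suggestion, the two means can be computed once and reused. The toy rawdata below is made up for illustration (the real data is not shown in the question), and the result names by_variable and by_group are hypothetical. The second option is an alternative not mentioned above: group_by() lets mean() run within each sex, so no per-sex branching is needed.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's rawdata
rawdata <- tibble(
  sex  = c("male", "male", "male", "female", "female", "female"),
  pain = c(2, 4, NA, 5, NA, 7)
)

# Option 1: compute each group mean once, then reuse the scalars
avg_pain_male   <- mean(rawdata$pain[rawdata$sex == "male"],   na.rm = TRUE)
avg_pain_female <- mean(rawdata$pain[rawdata$sex == "female"], na.rm = TRUE)

by_variable <- rawdata %>%
  mutate(replace_pain = case_when(
    is.na(pain) & sex == "male"   ~ avg_pain_male,
    is.na(pain) & sex == "female" ~ avg_pain_female,
    TRUE                          ~ pain   # non-missing values are kept as they are
  ))

# Option 2: group by sex so mean() is evaluated within each group
by_group <- rawdata %>%
  group_by(sex) %>%
  mutate(replace_pain = ifelse(is.na(pain), mean(pain, na.rm = TRUE), pain)) %>%
  ungroup()
```

With this toy data both options fill the missing male value with 3 and the missing female value with 6. Option 2 also extends to more than two groups without extra branches, which may be the simpler choice if the grouping variable gains more levels.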