How to impute means into specific observations in a column?

Question

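I currently have an assignment involving tabular data with observations of animal species measured on different occasions. The 'weight' column has some missing values, which I am supposed to replace with the mean weight of the species the animal belongs to. So, for the two cases where no weight was recorded for an "albigula" animal, I want the species' mean weight of 148 in place of NA, so that I have a complete dataset. I then need to repeat this process for about 10 other species.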

I can't think of any approach other than the following:

    albigula <- filter(surveys_combined_year, surveys_combined_year$species == "albigula")
    albigula$weight %>% mean(na.rm= TRUE)

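However, this obviously doesn't work, because it only computes the mean; I can't assign that mean to the specific positions in "surveys_combined_year$weight" where the values are missing.

Sorry for what is probably a super beginner question. I have searched all the resources we were given in class, but I still can't figure out what I'm missing.

Please help!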

Answer 1

We can do a grouped replace: group by 'species', then use replace_na to replace the NA elements in 'weight' with the mean of 'weight' within each group.

    library(dplyr)
    library(tidyr)
    out <- surveys_combined_year %>%
             group_by(species) %>%
             mutate(weight = replace_na(weight, mean(weight, na.rm = TRUE)))

Edit - changed replace to replace_na (per @BenBolker's comment)
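To see the grouped replacement on something small, here is a minimal sketch on made-up data. The data frame `surveys_toy` and its values are hypothetical stand-ins for `surveys_combined_year`, and the "merriami" rows are invented purely for illustration. The weights are chosen so that the "albigula" mean works out to 148, as in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for surveys_combined_year
surveys_toy <- tibble(
  species = c("albigula", "albigula", "albigula",
              "merriami", "merriami", "merriami"),
  weight  = c(140, NA, 156, 40, 44, NA)
)

out <- surveys_toy %>%
  group_by(species) %>%
  # mean() is evaluated once per species, so each NA gets its own group's mean
  mutate(weight = replace_na(weight, mean(weight, na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps work on the whole table

out$weight
# 140 148 156 40 44 42
```

Because the mean is computed inside `group_by(species)`, all species are handled in one pass, so there should be no need to repeat the step for each of the other species.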

Tags: r, na, imputation