NA 使用 group_by 或聚合函数 [aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) 时出错:没有要聚合的行]

Error for NA using group_by or aggregate function [aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate]

我最近学习了 R 编程,并一直在浏览此处发布的一些 group_by/aggregate 问题,以帮助我更好地学习。今天早些时候我想到了一个问题,关于 group_by/aggregate 如何合并 NA 数据而不是 0.

鉴于 table 和下面的代码(感谢 max_lim 允许我使用他的数据集),如果 NA 字段存在(这经常发生)会发生什么?

Farms = c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year = rep(c(2020,2020,2019,2019,2018,2018),3)
Cow = c(22,NA,16,12,8,NA,31,NA,3,20,39,34,27,50,NA,NA,NA,NA)
Duck = c(12,12,6,NA,NA,NA,28,13,31,50,33,20,NA,9,19,2,NA,7)
Chicken = c(100,120,80,50,NA,10,27,31,NA,43,NA,28,37,NA,NA,NA,5,43)
Sheep = c(30,20,10,NA,16,13,10,20,20,17,48,12,30,NA,20,NA,27,49)
Horse = c(25,20,16,11,NA,12,14,NA,43,42,10,12,42,NA,16,7,NA,42)
Data = data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)
Farm Year Cow Duck Chicken Sheep Horse
Farm 1 2020 22 12 100 30 25
Farm 1 2020 NA 12 120 20 20
Farm 1 2019 16 6 80 10 16
Farm 1 2019 12 NA 50 NA 11
Farm 1 2018 8 NA NA 16 NA
Farm 1 2018 NA NA 10 13 12
Farm 2 2020 31 28 27 10 14
Farm 2 2020 NA 13 31 20 NA
Farm 2 2019 3 31 NA 20 43
Farm 2 2019 20 50 43 17 42
Farm 2 2018 39 33 NA 48 10
Farm 2 2018 34 20 28 12 12
Farm 3 2020 27 NA 37 30 42
Farm 3 2020 50 9 NA NA NA
Farm 3 2019 NA 19 NA 20 16
Farm 3 2019 NA 2 NA NA 7
Farm 3 2018 NA NA 5 27 NA
Farm 3 2018 NA 7 43 49 42

如果我在这里使用 aggregate(.~Farms + Year, Data, mean),我会在 aggregate.data.frame(lhs, mf[- 1L], FUN = FUN, ...) :没有要聚合的行 我认为这是因为 mean 函数无法解释 NA.

有谁知道我们如何修改 aggregate/group_by 函数以通过仅使用没有 NA 数据的年份计算平均值来说明 NA? IE。 2020: 10, 2019: NA, 2018:20, 2017:NA, 2016:15 -> 平均值(扣除 NA 年 2019 和 2017 后)将为 (10 + 20 + 15) / (3 ) = 15.

理想的输出如下:

Farm Year Cow Duck Chicken Sheep Horse
Farm 1 2020 22 (avg = 22/1 as one entry is NA) 12 110 25 22.5
Farm 1 2019 14 6 65 10 13.5
Farm 1 2018 8 N.A. (as it's all NA) 10 14.5 12
Farm 2 2020 31 20.5 29 15 14
Farm 2 2019 11.5 40.5 43 18.5 42.5
Farm 2 2018 36.5 26.5 28 30 11
Farm 3 2020 ... ... ... ... ...
Farm 3 2019 ... ... ... ... ...
Farm 3 2018 ... ... ... ... ...

这里是创建想要的 data.frame 的方法。我认为您的解决方案在第 2 行(绵羊)中有一个错误,其中 mean(NA, 10) 等于 5 而不是 10。

library(dplyr)

使用聚合

 Data %>% 
  aggregate(.~Year+Farms,., FUN=mean, na.rm=T, na.action=NULL) %>% 
  arrange(Farms, desc(Year)) %>% 
  as.data.frame() %>%  
  mutate_at(names(.), ~replace(., is.nan(.), NA))

使用总结

Data %>% 
  group_by(Year, Farms) %>% 
  summarize(MeanCow = mean(Cow, na.rm=T),
            MeanDuck =  mean(Duck, na.rm=T),
            MeanChicken = mean(Chicken, na.rm=T),
            MeanSheep = mean(Sheep, na.rm=T),
            MeanHorse = mean(Horse, na.rm=T)) %>% 
  arrange(Farms, desc(Year)) %>% 
  as.data.frame() %>% 
  mutate_at(names(.), ~replace(., is.nan(.), NA))

两者的解决方案

      Year  Farms  Cow Duck Chicken Sheep Horse
1 2020 Farm 1 22.0 12.0     110  25.0  22.5
2 2019 Farm 1 14.0  6.0      65  10.0  13.5
3 2018 Farm 1  8.0   NA      10  14.5  12.0
4 2020 Farm 2 31.0 20.5      29  15.0  14.0
5 2019 Farm 2 11.5 40.5      43  18.5  42.5
6 2018 Farm 2 36.5 26.5      28  30.0  11.0
7 2020 Farm 3 38.5  9.0      37  30.0  42.0
8 2019 Farm 3   NA 10.5      NA  20.0  11.5
9 2018 Farm 3   NA  7.0      24  38.0  42.0