Error with NA values using group_by or aggregate: "Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate"
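I recently started learning R and have been working through some of the group_by/aggregate questions posted here to help me learn. Earlier today a question occurred to me: how do group_by/aggregate handle data that contains NA rather than 0?
Given the table and code below (thanks to max_lim for letting me use his dataset), what happens when NA fields are present (which happens often)?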
Farms = c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year = rep(c(2020,2020,2019,2019,2018,2018),3)
Cow = c(22,NA,16,12,8,NA,31,NA,3,20,39,34,27,50,NA,NA,NA,NA)
Duck = c(12,12,6,NA,NA,NA,28,13,31,50,33,20,NA,9,19,2,NA,7)
Chicken = c(100,120,80,50,NA,10,27,31,NA,43,NA,28,37,NA,NA,NA,5,43)
Sheep = c(30,20,10,NA,16,13,10,20,20,17,48,12,30,NA,20,NA,27,49)
Horse = c(25,20,16,11,NA,12,14,NA,43,42,10,12,42,NA,16,7,NA,42)
Data = data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)
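Farm | Year | Cow | Duck | Chicken | Sheep | Horse |
---|---|---|---|---|---|---|
Farm 1 | 2020 | 22 | 12 | 100 | 30 | 25 |
Farm 1 | 2020 | NA | 12 | 120 | 20 | 20 |
Farm 1 | 2019 | 16 | 6 | 80 | 10 | 16 |
Farm 1 | 2019 | 12 | NA | 50 | NA | 11 |
Farm 1 | 2018 | 8 | NA | NA | 16 | NA |
Farm 1 | 2018 | NA | NA | 10 | 13 | 12 |
Farm 2 | 2020 | 31 | 28 | 27 | 10 | 14 |
Farm 2 | 2020 | NA | 13 | 31 | 20 | NA |
Farm 2 | 2019 | 3 | 31 | NA | 20 | 43 |
Farm 2 | 2019 | 20 | 50 | 43 | 17 | 42 |
Farm 2 | 2018 | 39 | 33 | NA | 48 | 10 |
Farm 2 | 2018 | 34 | 20 | 28 | 12 | 12 |
Farm 3 | 2020 | 27 | NA | 37 | 30 | 42 |
Farm 3 | 2020 | 50 | 9 | NA | NA | NA |
Farm 3 | 2019 | NA | 19 | NA | 20 | 16 |
Farm 3 | 2019 | NA | 2 | NA | NA | 7 |
Farm 3 | 2018 | NA | NA | 5 | 27 | NA |
Farm 3 | 2018 | NA | 7 | 43 | 49 | 42 |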
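If I use aggregate(. ~ Farms + Year, Data, mean) here, I get "Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate". I think this is because the mean function cannot account for NA.
Does anyone know how to modify the aggregate/group_by call so that it accounts for NA by computing the mean from only the years that have data? For example:
2020: 10, 2019: NA, 2018: 20, 2017: NA, 2016: 15 -> the mean (after excluding the NA years 2019 and 2017) would be (10 + 20 + 15) / 3 = 15.
The ideal output is as follows: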
Farm | Year | Cow | Duck | Chicken | Sheep | Horse |
---|---|---|---|---|---|---|
Farm 1 | 2020 | 22 (avg = 22/1 as one entry is NA) | 12 | 110 | 25 | 22.5 |
Farm 1 | 2019 | 14 | 6 | 65 | 10 | 13.5 |
Farm 1 | 2018 | 8 | N.A. (as it's all NA) | 10 | 14.5 | 12 |
Farm 2 | 2020 | 31 | 20.5 | 29 | 15 | 14 |
Farm 2 | 2019 | 11.5 | 40.5 | 43 | 18.5 | 42.5 |
Farm 2 | 2018 | 36.5 | 26.5 | 28 | 30 | 11 |
Farm 3 | 2020 | ... | ... | ... | ... | ... |
Farm 3 | 2019 | ... | ... | ... | ... | ... |
Farm 3 | 2018 | ... | ... | ... | ... | ... |
Here is a way to create the desired data.frame. I think your solution has one error in row 2 (Sheep), where the mean of NA and 10 is given as 5 rather than 10.
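As a quick check of that arithmetic, here is a minimal base R sketch. The vector name `yearly` is only illustrative and does not appear in the question; the values are the ones from the question's own example.

```r
# Yearly values from the question's example; NA marks years without data.
yearly <- c(`2020` = 10, `2019` = NA, `2018` = 20, `2017` = NA, `2016` = 15)

# Without na.rm, a single NA makes the whole mean NA.
print(mean(yearly))

# With na.rm = TRUE the NA years are dropped first: (10 + 20 + 15) / 3.
print(mean(yearly, na.rm = TRUE))

# Row 2 (Sheep) of the expected output: the mean of NA and 10, ignoring NA.
print(mean(c(NA, 10), na.rm = TRUE))

# If every value is NA, nothing is left to average and mean() returns NaN.
print(mean(c(NA_real_, NA_real_), na.rm = TRUE))
```

The last case is why the pipelines in this answer end with a step that turns NaN back into NA.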
library(dplyr)
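Using aggregate:
Data %>%
  # na.action = NULL keeps rows that contain NA; na.rm = TRUE is passed on to mean()
  aggregate(. ~ Year + Farms, ., FUN = mean, na.rm = TRUE, na.action = NULL) %>%
  arrange(Farms, desc(Year)) %>%
  as.data.frame() %>%
  # groups that are entirely NA produce NaN; convert those back to NA
  mutate(across(everything(), ~ replace(.x, is.nan(.x), NA)))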
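The `na.action = NULL` argument matters because the formula method of `aggregate()` defaults to `na.action = na.omit`, which removes every row containing an NA in any variable before `FUN` is called. The sketch below illustrates this on a hypothetical toy data frame (`toy`, `default_result` and `kept` are made-up names, not from the question) in which no row is complete, so the default call has nothing left to aggregate.

```r
# Hypothetical toy data: every row has an NA in one of the measured columns.
toy <- data.frame(
  Farms = c("Farm 1", "Farm 1", "Farm 2", "Farm 2"),
  Cow   = c(22, NA, 31, NA),
  Duck  = c(NA, 12, NA, 13)
)

# Default na.action = na.omit drops all incomplete rows first,
# so aggregate() stops with an error; capture its message instead of failing.
default_result <- tryCatch(
  aggregate(. ~ Farms, toy, FUN = mean),
  error = function(e) conditionMessage(e)
)
print(default_result)

# na.action = NULL keeps the rows; na.rm = TRUE is forwarded to mean(),
# so each column is averaged over its own non-NA values.
kept <- aggregate(. ~ Farms, toy, FUN = mean, na.rm = TRUE, na.action = NULL)
print(kept)
```

When some rows are complete, the default does not raise an error; it silently computes the means from the complete rows only, which is arguably harder to notice.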
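Using summarize:
Data %>%
  group_by(Year, Farms) %>%
  # na.rm = TRUE averages only the non-NA values within each group
  summarize(MeanCow = mean(Cow, na.rm = TRUE),
            MeanDuck = mean(Duck, na.rm = TRUE),
            MeanChicken = mean(Chicken, na.rm = TRUE),
            MeanSheep = mean(Sheep, na.rm = TRUE),
            MeanHorse = mean(Horse, na.rm = TRUE)) %>%
  arrange(Farms, desc(Year)) %>%
  as.data.frame() %>%
  # groups that are entirely NA produce NaN; convert those back to NA
  mutate(across(everything(), ~ replace(.x, is.nan(.x), NA)))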
Output of both (the summarize version names the value columns MeanCow, MeanDuck, and so on):
Year Farms Cow Duck Chicken Sheep Horse
1 2020 Farm 1 22.0 12.0 110 25.0 22.5
2 2019 Farm 1 14.0 6.0 65 10.0 13.5
3 2018 Farm 1 8.0 NA 10 14.5 12.0
4 2020 Farm 2 31.0 20.5 29 15.0 14.0
5 2019 Farm 2 11.5 40.5 43 18.5 42.5
6 2018 Farm 2 36.5 26.5 28 30.0 11.0
7 2020 Farm 3 38.5 9.0 37 30.0 42.0
8 2019 Farm 3 NA 10.5 NA 20.0 11.5
9 2018 Farm 3 NA 7.0 24 38.0 42.0
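For comparison, the same table can be sketched in base R alone by using the data-frame interface of `aggregate()` (the `by =` form), which does not drop rows with NA in the value columns, so `na.action` is not needed. This is only an alternative sketch, not part of the original solution; `animals` and `res` are illustrative names.

```r
# Rebuild the dataset from the question.
Farms   <- c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year    <- rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3)
Cow     <- c(22, NA, 16, 12, 8, NA, 31, NA, 3, 20, 39, 34, 27, 50, NA, NA, NA, NA)
Duck    <- c(12, 12, 6, NA, NA, NA, 28, 13, 31, 50, 33, 20, NA, 9, 19, 2, NA, 7)
Chicken <- c(100, 120, 80, 50, NA, 10, 27, 31, NA, 43, NA, 28, 37, NA, NA, NA, 5, 43)
Sheep   <- c(30, 20, 10, NA, 16, 13, 10, 20, 20, 17, 48, 12, 30, NA, 20, NA, 27, 49)
Horse   <- c(25, 20, 16, 11, NA, 12, 14, NA, 43, 42, 10, 12, 42, NA, 16, 7, NA, 42)
Data    <- data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)

animals <- c("Cow", "Duck", "Chicken", "Sheep", "Horse")

# Group by Year and Farms; na.rm = TRUE is forwarded to mean() for each column.
res <- aggregate(Data[animals], by = Data[c("Year", "Farms")],
                 FUN = mean, na.rm = TRUE)

# All-NA groups come back as NaN; turn them into NA.
res[animals] <- lapply(res[animals], function(col) replace(col, is.nan(col), NA))

# Same ordering as the output above: by farm, then most recent year first.
res <- res[order(res$Farms, -res$Year), ]
rownames(res) <- NULL
print(res)
```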