Aggregating by average on 2 conditions while accounting for NA
I have the following table:
Data = structure(list(Countries = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("China", "India", "Vietnam"), class = "factor"), Year = c(2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2018L, 2018L, 2017L, 2018L, 2018L, 2018L,2017L, 2017L, 2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2019L, 2018L, 2017L, 2017L), Food = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bread","Rice"), class = "factor"), Price = c(2.8, 2.8, 2.7, NA, 2.6, 2.58, 2.53, 2.5, NA, NA, 2.395, 2.9, 2.8, 2.75, 2.66, 2.5, 11.5,11.3, 11.2, 11, NA, 10.7, 10.7, NA, NA, 10.3, 10.1, 10)), class = "data.frame", row.names = c(NA, -28L))
The table looks like this:
Countries | Year | Food | Price |
---|---|---|---|
China | 2019 | Bread | 2.8 |
China | 2018 | Bread | 2.8 |
China | 2018 | Bread | 2.7 |
China | 2018 | Bread | NA |
China | 2017 | Bread | 2.6 |
China | 2017 | Bread | 2.58 |
India | 2019 | Bread | 2.53 |
India | 2019 | Bread | 2.5 |
India | 2018 | Bread | NA |
India | 2018 | Bread | NA |
India | 2017 | Bread | 2.395 |
Vietnam | 2018 | Bread | 2.9 |
Vietnam | 2018 | Bread | 2.8 |
Vietnam | 2018 | Bread | 2.75 |
Vietnam | 2017 | Bread | 2.66 |
Vietnam | 2017 | Bread | 2.5 |
China | 2019 | Rice | 11.5 |
China | 2018 | Rice | 11.3 |
China | 2018 | Rice | 11.2 |
China | 2018 | Rice | 11.0 |
China | 2017 | Rice | NA |
China | 2017 | Rice | 10.7 |
Vietnam | 2019 | Rice | 10.7 |
Vietnam | 2019 | Rice | NA |
Vietnam | 2019 | Rice | NA |
Vietnam | 2018 | Rice | 10.3 |
Vietnam | 2017 | Rice | 10.1 |
Vietnam | 2017 | Rice | 10.0 |
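Does anyone know how to aggregate the price of each food by country and year with dplyr and/or the tidyverse (the real dataset has more countries, years and foods, but the same format) while accounting for NA? That is:
- If the bread prices for 2018 are 2.8, 2.7 and NA, the average should be (2.8 + 2.7)/2, not (2.8 + 2.7 + 0)/3.
- If the bread price is NA for the whole year, we can drop it; it doesn't even need to appear in the output table.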
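Both rules map directly onto how R's `mean()` treats missing values. A minimal base R sketch using the China / 2018 / Bread prices from the question (the names `p` and `all_na` are just illustrative):

```r
# China / 2018 / Bread prices from the question
p <- c(2.8, 2.7, NA)

# By default a single NA makes the whole mean NA
mean(p)                           # NA
# na.rm = TRUE drops the NA first, i.e. (2.8 + 2.7) / 2
mean(p, na.rm = TRUE)             # 2.75
# Treating NA as 0 instead would divide by 3: (2.8 + 2.7 + 0) / 3
sum(p, na.rm = TRUE) / length(p)  # 1.833333

# A group where every price is NA (e.g. India / 2018 / Bread):
# after removing the NAs nothing is left, and the mean of an empty
# vector is NaN, so such groups can be filtered out afterwards
all_na <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
is.nan(all_na)                    # TRUE
```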
Expected output table:
Countries | Year | Food | Price |
---|---|---|---|
China | 2019 | Bread | 2.8 |
China | 2018 | Bread | 2.8 |
China | 2017 | Bread | 2.6 |
India | 2019 | Bread | 2.5 |
India | 2017 | Bread | 2.4 |
Vietnam | 2018 | Bread | 2.8 |
Vietnam | 2017 | Bread | 2.6 |
China | 2019 | Rice | 11.5 |
China | 2018 | Rice | 11.2 |
China | 2017 | Rice | 10.7 |
Vietnam | 2019 | Rice | 10.7 |
Vietnam | 2018 | Rice | 10.3 |
Vietnam | 2017 | Rice | 10.1 |
Also, out of genuine curiosity: can this be done in base R at all?
With dplyr:
library(dplyr)
Data %>%
group_by(Countries, Year, Food) %>%
summarise(Price = mean(Price, na.rm = TRUE), .groups = 'drop') %>%
filter(!is.na(Price)) %>%
arrange(Food, Countries, desc(Year))
#> # A tibble: 13 × 4
#> Countries Year Food Price
#> <fct> <int> <fct> <dbl>
#> 1 China 2019 Bread 2.8
#> 2 China 2018 Bread 2.75
#> 3 China 2017 Bread 2.59
#> 4 India 2019 Bread 2.51
#> 5 India 2017 Bread 2.40
#> 6 Vietnam 2018 Bread 2.82
#> 7 Vietnam 2017 Bread 2.58
#> 8 China 2019 Rice 11.5
#> 9 China 2018 Rice 11.2
#> 10 China 2017 Rice 10.7
#> 11 Vietnam 2019 Rice 10.7
#> 12 Vietnam 2018 Rice 10.3
#> 13 Vietnam 2017 Rice 10.0
Try this:
library(tidyverse)
Data <- structure(list(Countries = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("China", "India", "Vietnam"), class = "factor"), Year = c(2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2018L, 2018L, 2017L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2019L, 2018L, 2017L, 2017L), Food = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bread", "Rice"), class = "factor"), Price = c(2.8, 2.8, 2.7, NA, 2.6, 2.58, 2.53, 2.5, NA, NA, 2.395, 2.9, 2.8, 2.75, 2.66, 2.5, 11.5, 11.3, 11.2, 11, NA, 10.7, 10.7, NA, NA, 10.3, 10.1, 10)), class = "data.frame", row.names = c(NA, -28L))
Data |>
group_by(Countries, Year, Food) |>
summarise(Price = round(mean(Price, na.rm = TRUE), 1)) |>
arrange(Food, Countries, desc(Year)) |>
filter(!is.nan(Price))
#> # A tibble: 13 × 4
#> # Groups: Countries, Year [8]
#> Countries Year Food Price
#> <fct> <int> <fct> <dbl>
#> 1 China 2019 Bread 2.8
#> 2 China 2018 Bread 2.8
#> 3 China 2017 Bread 2.6
#> 4 India 2019 Bread 2.5
#> 5 India 2017 Bread 2.4
#> 6 Vietnam 2018 Bread 2.8
#> 7 Vietnam 2017 Bread 2.6
#> 8 China 2019 Rice 11.5
#> 9 China 2018 Rice 11.2
#> 10 China 2017 Rice 10.7
#> 11 Vietnam 2019 Rice 10.7
#> 12 Vietnam 2018 Rice 10.3
#> 13 Vietnam 2017 Rice 10.1
Created on 2022-05-11 by the reprex package (v2.0.1)
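This answer filters with `!is.nan()`, while the earlier dplyr answer uses `!is.na()`; both work here. A small base R sketch of why (the vectors `x` and `group_means` are illustrative):

```r
# is.na() is TRUE for both NA and NaN; is.nan() only for NaN
x <- c(1.5, NA, NaN)
is.na(x)   # FALSE  TRUE  TRUE
is.nan(x)  # FALSE FALSE  TRUE

# After mean(..., na.rm = TRUE) the only missing value a group can produce
# is NaN (from an all-NA group), so the two filters keep the same rows
group_means <- c(mean(c(2.8, 2.7, NA), na.rm = TRUE),
                 mean(c(NA_real_, NA_real_), na.rm = TRUE))
identical(!is.na(group_means), !is.nan(group_means))  # TRUE
```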
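With data.table:
library(data.table)
# setDT() converts Data to a data.table in place (by reference).
# Rows with a missing Price are dropped in i before grouping, so na.rm is not needed
# and groups whose prices are all NA never appear in the result.
setDT(Data)[!is.na(Price), .(Price = mean(Price)), by = .(Countries, Year, Food)]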
Output:
Countries Year Food Price
1: China 2019 Bread 2.800000
2: China 2018 Bread 2.750000
3: China 2017 Bread 2.590000
4: India 2019 Bread 2.515000
5: India 2017 Bread 2.395000
6: Vietnam 2018 Bread 2.816667
7: Vietnam 2017 Bread 2.580000
8: China 2019 Rice 11.500000
9: China 2018 Rice 11.166667
10: China 2017 Rice 10.700000
11: Vietnam 2019 Rice 10.700000
12: Vietnam 2018 Rice 10.300000
13: Vietnam 2017 Rice 10.050000
For a base R answer, `aggregate` drops missing values automatically (you can change this behaviour with the `na.action` argument if you need to). So:
aggregate(Price ~ Food + Year + Countries, data = Data, FUN = mean)
gives you:
Food Year Countries Price
1 Bread 2017 China 2.590000
2 Rice 2017 China 10.700000
3 Bread 2018 China 2.750000
4 Rice 2018 China 11.166667
5 Bread 2019 China 2.800000
6 Rice 2019 China 11.500000
7 Bread 2017 India 2.395000
8 Bread 2019 India 2.515000
9 Bread 2017 Vietnam 2.580000
10 Rice 2017 Vietnam 10.050000
11 Bread 2018 Vietnam 2.816667
12 Rice 2018 Vietnam 10.300000
13 Rice 2019 Vietnam 10.700000
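A sketch of the `na.action` remark, assuming the question's `Data` (rebuilt compactly here so the block runs on its own; `dropped` and `kept` are illustrative names). Passing `na.action = na.pass` keeps the NA rows, and `na.rm = TRUE` is forwarded to `mean()` through `...`, so the all-NA group India / 2018 / Bread is kept with a `NaN` price instead of being dropped.

```r
# Same 28 rows as the dput() in the question
Data <- data.frame(
  Countries = factor(rep(c("China", "India", "Vietnam", "China", "Vietnam"), c(6, 5, 5, 6, 6))),
  Year = c(2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2018L, 2018L, 2017L,
           2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2018L, 2018L, 2018L, 2017L, 2017L,
           2019L, 2019L, 2019L, 2018L, 2017L, 2017L),
  Food = factor(rep(c("Bread", "Rice"), c(16, 12))),
  Price = c(2.8, 2.8, 2.7, NA, 2.6, 2.58, 2.53, 2.5, NA, NA, 2.395, 2.9, 2.8, 2.75,
            2.66, 2.5, 11.5, 11.3, 11.2, 11, NA, 10.7, 10.7, NA, NA, 10.3, 10.1, 10)
)

# Default na.action = na.omit: NA rows are removed before grouping,
# so the all-NA group (India, 2018, Bread) disappears
dropped <- aggregate(Price ~ Food + Year + Countries, data = Data, FUN = mean)

# na.action = na.pass keeps the NA rows; na.rm = TRUE is passed on to mean(),
# so partly-NA groups still average only their non-missing prices and
# the all-NA group is kept with a NaN price
kept <- aggregate(Price ~ Food + Year + Countries, data = Data, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)

nrow(dropped)  # 13
nrow(kept)     # 14
kept[is.nan(kept$Price), ]
```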
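If you want the rows in a different order, just rearrange the right-hand side of the formula (the first term varies fastest in the output).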
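To get all the way to the layout in the question, one base R option is to sort the aggregated result with `order()` (Food, then Countries, then Year descending) and round to one decimal as in the expected output table. This is a sketch, not the answerer's code: the data frame is rebuilt compactly so the block runs on its own, and `res` is an illustrative name. Halfway values such as 2.75 or 10.05 are rounded according to their binary floating-point representation, so the last digit of those may not match hand rounding on every platform.

```r
# Same 28 rows as the dput() in the question
Data <- data.frame(
  Countries = factor(rep(c("China", "India", "Vietnam", "China", "Vietnam"), c(6, 5, 5, 6, 6))),
  Year = c(2019L, 2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2019L, 2018L, 2018L, 2017L,
           2018L, 2018L, 2018L, 2017L, 2017L, 2019L, 2018L, 2018L, 2018L, 2017L, 2017L,
           2019L, 2019L, 2019L, 2018L, 2017L, 2017L),
  Food = factor(rep(c("Bread", "Rice"), c(16, 12))),
  Price = c(2.8, 2.8, 2.7, NA, 2.6, 2.58, 2.53, 2.5, NA, NA, 2.395, 2.9, 2.8, 2.75,
            2.66, 2.5, 11.5, 11.3, 11.2, 11, NA, 10.7, 10.7, NA, NA, 10.3, 10.1, 10)
)

# Mean price per Countries/Year/Food; NA rows are dropped by the default na.action,
# so all-NA groups do not appear at all
res <- aggregate(Price ~ Countries + Year + Food, data = Data, FUN = mean)

# Round to one decimal, as in the expected output table
res$Price <- round(res$Price, 1)

# Food first, then Countries (factor level order), then Year from newest to oldest
res <- res[order(res$Food, res$Countries, -res$Year), ]
rownames(res) <- NULL
res
```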