How could I get the mean and mode summary at the same time for a dataframe?
I have a data frame with 10 numeric columns and 3 character columns. As an example, I prepared this data frame:
df <- data.frame(
name = c("ANCON","ANCON","ANCON", "LUNA", "MAGOLLO", "MANCHAY", "MANCHAY","PATILLA","PATILLA"),
destiny = c("sea","reuse","sea","sea", "reuse","sea","sea","sea","sea"),
year = c("2022","2015","2022","2022", "2015","2016","2016","2018","2018"),
QQ = c(10,11,3,4,13,11,12,23,7),
Temp = c(14,16,16,15,16,20,19,14,18))
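I need to group it by the "name" column and get the mean of the "QQ" and "Temp" columns together with the mode of the "destiny" and "year" columns. I can get the mean summary, but I can't work out how to include the mode: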
library(dplyr)
df_mean <- df %>%
group_by(name) %>%
summarise_all(mean, na.rm = TRUE)
name destiny year QQ Temp
<chr> <dbl> <dbl> <dbl> <dbl>
1 ANCON NA NA 8 15.3
2 LUNA NA NA 4 15
3 MAGOLLO NA NA 13 16
4 MANCHAY NA NA 11.5 19.5
5 PATILLA NA NA 15 16
The expected median output looks like this:
name destiny year QQ Temp
1 ANCON sea 2022 8.0 15.3
2 LUNA sea 2022 4.0 15.0
3 MAGOLLO reuse 2015 13.0 16.0
4 MANCHAY sea 2016 11.5 19.5
5 PATILLA sea 2018 15.0 16.0
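How can I do this? Please help.
Use across and cur_column. Note, however, that the median only applies to ordinal data; for categorical data (such as your character columns), use the mode: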
mode <- function(x) {
x_unique <- unique(x)
x_unique[which.max(tabulate(match(x, x_unique)))]
}
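To see how this helper behaves at the edges, here is a small base R sketch. The name stat_mode and the na.rm argument are my own additions for illustration; the function above is called mode, which masks base R's mode() (the one that reports an object's storage mode), so a different name may be safer in a longer script. Ties go to whichever value appears first, and NA is counted like any other value unless it is removed.

```r
# Hypothetical variant of the helper above; `stat_mode` and `na.rm` are
# illustrative names, not part of the original answer.
stat_mode <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  x_unique <- unique(x)
  # match() maps each value to its position in x_unique,
  # tabulate() counts those positions, which.max() picks the first maximum
  x_unique[which.max(tabulate(match(x, x_unique)))]
}

stat_mode(c("sea", "reuse", "sea"))        # most frequent value
stat_mode(c("2015", "2022"))               # tie: the first value seen wins
stat_mode(c(NA, NA, "sea"))                # NA is counted like any value
stat_mode(c(NA, NA, "sea"), na.rm = TRUE)  # NAs dropped before counting
```

If ties should be handled differently (for example, returning all tied values), the which.max step is the part to change.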
Then:
mode_columns <- c('destiny', 'year')
df %>%
group_by(name) %>%
summarise(
across(
everything(),
~ if (cur_column() %in% mode_columns) mode(.x) else mean(.x)
)
)
# A tibble: 5 × 5
name destiny year QQ Temp
<chr> <chr> <chr> <dbl> <dbl>
1 ANCON sea 2022 8 15.3
2 LUNA sea 2022 4 15
3 MAGOLLO reuse 2015 13 16
4 MANCHAY sea 2016 11.5 19.5
5 PATILLA sea 2018 15 16
UPD: Alternatively, you can write the two summaries separately:
df %>%
group_by(name) %>%
summarise(
across(all_of(mode_columns), mode),
across(!all_of(mode_columns), mean)
)
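Since the real data has 10 numeric and 3 character columns, listing the mode columns by hand may get tedious. A possible variation, sketched here under the assumption that every character column should get the mode and every numeric column the mean, is to select by type with where(). The helper is renamed stat_mode (my own name) to avoid masking base::mode(), and na.rm = TRUE is carried over from the question.

```r
library(dplyr)

# Same logic as the mode helper in the answer, under a hypothetical name
stat_mode <- function(x) {
  x_unique <- unique(x)
  x_unique[which.max(tabulate(match(x, x_unique)))]
}

df <- data.frame(
  name = c("ANCON", "ANCON", "ANCON", "LUNA", "MAGOLLO",
           "MANCHAY", "MANCHAY", "PATILLA", "PATILLA"),
  destiny = c("sea", "reuse", "sea", "sea", "reuse", "sea", "sea", "sea", "sea"),
  year = c("2022", "2015", "2022", "2022", "2015", "2016", "2016", "2018", "2018"),
  QQ = c(10, 11, 3, 4, 13, 11, 12, 23, 7),
  Temp = c(14, 16, 16, 15, 16, 20, 19, 14, 18)
)

result <- df %>%
  group_by(name) %>%
  summarise(
    # the grouping column `name` is excluded from across() automatically
    across(where(is.character), stat_mode),
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE))
  )

result
```

This only works as intended if column type really does decide the summary; if some numeric column should get the mode instead (a year stored as a number, say), the explicit mode_columns vector is the safer choice.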