Summarise with mean, median, range and quantiles in R
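I am currently working with the palmerpenguins dataset in R and want to summarise the data grouped by sex, with the mean, median, range and quantiles.
My current solution computes the quantiles separately from the other summary statistics. Is there a way to do it all in one step? If not, how can I combine the two data sets? The quantile result is currently in long format and I don't know how to combine it.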
# Mean, median and range of bill length per sex
group_summary <- penguins %>%
  group_by(sex) %>%
  summarize(mean = mean(bill_length_mm, na.rm = TRUE),
            median = median(bill_length_mm, na.rm = TRUE),
            range = max(bill_length_mm, na.rm = TRUE) - min(bill_length_mm, na.rm = TRUE))
# Deciles per sex: one row per quantile, i.e. long format
group_quant <- penguins %>%
  group_by(sex) %>%
  summarize(quantile(bill_length_mm, probs = seq(.1, 1, by = .1), na.rm = TRUE),
            .groups = 'drop')
I also have the following solution, but it drops the NA values of sex and I am not sure why.
group_summary <- do.call(data.frame, aggregate(bill_length_mm ~ sex, penguins,
  function(x) c(mean = mean(x, na.rm = TRUE),
                median = median(x, na.rm = TRUE),
                range = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
                quantile(x, probs = seq(.1, 1, by = .1), na.rm = TRUE))))
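You can store the quantiles in a list column and then use unnest_wider() to create new columns from it. To calculate the range I use diff(range(...)) instead of max(...) - min(...). Both work fine; I include it to show an alternative.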
library(palmerpenguins)
library(dplyr)
library(tidyr)

penguins %>%
  group_by(sex) %>%
  summarize(mean = mean(bill_length_mm, na.rm = TRUE),
            median = median(bill_length_mm, na.rm = TRUE),
            range = diff(range(bill_length_mm, na.rm = TRUE)),
            # wrap the named quantile vector in a list so each group stays one row
            quantile = list(quantile(bill_length_mm, probs = seq(.1, 1, by = .1), na.rm = TRUE))) %>%
  # spread the list column into one column per quantile
  unnest_wider(quantile)
#   sex     mean median range `10%` `20%` `30%` `40%` `50%` `60%` `70%` `80%` `90%` `100%`
#   <fct>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
# 1 female  42.1   42.8  25.9  35.8  36.7  38.2  40    42.8  45.1  45.7  46.5  47.5   58
# 2 male    45.9   46.8  25    38.8  40.5  41.3  43.2  46.8  49.0  50.0  50.8  51.9   59.6
# 3 NA      41.3   42    13.2  36.8  37.7  37.8  38.6  42    44    44.5  45.2  46.4   47.3
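
For intuition on why the list column unnests into one column per decile: quantile() returns a named numeric vector, and those names ("10%", "20%", ...) are what unnest_wider() uses as column names. A minimal base R sketch, assuming a made-up vector x (hypothetical values, not taken from penguins), which also checks that diff(range(...)) and max(...) - min(...) agree:

```r
# Made-up bill lengths (hypothetical values, not taken from penguins)
x <- c(39.1, NA, 40.3, 36.7, 46.5, 50.0, 42.8)

# quantile() returns a named numeric vector; the names later become column names
q <- quantile(x, probs = seq(.1, 1, by = .1), na.rm = TRUE)
decile_names <- names(q)
print(decile_names)

# Two equivalent ways to get the spread between the smallest and largest value
spread_diff <- diff(range(x, na.rm = TRUE))
spread_maxmin <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
print(all.equal(spread_diff, spread_maxmin))
```

Wrapping the vector in list() is what keeps summarize() at one row per group; without it, each quantile would become its own row.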
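
On why the aggregate() version loses the NA group: as far as I can tell, the formula method defaults to na.action = na.omit, so rows with a missing sex are removed before grouping, and aggregate() also omits rows whose grouping value is missing. One possible workaround is to turn NA into an explicit factor level with addNA(). The sketch below assumes a made-up data frame toy and a helper column sex_na (both hypothetical names, not part of the original code):

```r
# Made-up data (hypothetical values): two rows have a missing sex
toy <- data.frame(
  sex = factor(c("female", "male", NA, "female", "male", NA)),
  bill_length_mm = c(39.5, 46.1, 42.0, 36.7, NA, 44.5)
)

# Default behaviour: rows with NA in sex or bill_length_mm are dropped first
res_default <- aggregate(bill_length_mm ~ sex, data = toy, FUN = mean)

# addNA() makes NA a regular factor level, so it forms its own group;
# na.pass keeps rows with a missing bill length and leaves them to na.rm
toy$sex_na <- addNA(toy$sex)
res_keep <- aggregate(bill_length_mm ~ sex_na, data = toy,
                      FUN = function(x) mean(x, na.rm = TRUE),
                      na.action = na.pass)

print(res_default)
print(res_keep)
```

The same idea should carry over to the multi-statistic function in the question by swapping it in for the FUN argument.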