R dplyr: summarise mean and standard deviation using group_by
I have a data frame that looks like this:
df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
                 "Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
                 "Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
                 "Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))
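I am trying to compute the mean and standard deviation of Frequency for each combination of Experiment and Type. I first tried running this line:
df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency))
When I run this, I get a tibble that looks like this: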
Experiment Type mean sd
Exp1 alpha 10 5
Exp1 beta 102. 3.54
Exp1 gamma 1000 NA
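But I want R to treat every Type (alpha, beta, gamma) as present for each combination of Experiment and Replicate, so that if a Type has no Frequency value, R uses 0 instead of leaving the value out.
In other words, what I want should be computed like this: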
Experiment Type mean sd
Exp1 alpha mean(10,15,5) sd(10,15,5)
Exp1 beta mean(100,0,105) sd(100,0,105)
Exp1 gamma mean(1000,0,0) sd(1000,0,0)
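For example, for Exp1 beta, the summarise call I used above computes mean(100,105) and sd(100,105), because there is no beta row for Exp1 Replicate B in my df. But I want R to compute mean(100,0,105) and sd(100,0,105) instead. Can anyone give me some ideas on how to do this?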
You need to complete your data frame first, filling the missing combinations with 0, and then pipe the completed data frame into your summarise call.
library(tidyverse)
df %>%
  complete(Experiment, Type, Replicate, fill = list(Frequency = 0)) %>%
  group_by(Experiment, Type) %>%
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")
# A tibble: 9 × 4
Experiment Type mean sd
<chr> <chr> <dbl> <dbl>
1 Exp1 alpha 10 5
2 Exp1 beta 68.3 59.2
3 Exp1 gamma 333. 577.
4 Exp2 alpha 3.33 5.77
5 Exp2 beta 66.7 58.0
6 Exp2 gamma 677. 586.
7 Exp3 alpha 8.33 7.64
8 Exp3 beta 33.3 57.7
9 Exp3 gamma 330 572.
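To see what complete() contributes, it can help to look at the intermediate table before summarising. The sketch below rebuilds the question's df and checks that the completed data has one row per Experiment/Type/Replicate combination (3 x 3 x 3 = 27), with the absent combinations filled with 0. The names completed and exp1_beta are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
           "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

# Add a row for every combination that is absent, with Frequency set to 0
completed <- df %>%
  complete(Experiment, Type, Replicate, fill = list(Frequency = 0))

nrow(df)         # 15 rows in the original data
nrow(completed)  # 27 rows after completing

# The Exp1/beta group now has three values, including the 0 for Replicate B
exp1_beta <- completed %>%
  filter(Experiment == "Exp1", Type == "beta") %>%
  arrange(Replicate)
exp1_beta$Frequency  # 100 0 105
```

Because the zeros are real rows after this step, mean() and sd() in the later summarise() see three values per group without any special handling.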
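Alternatively, include Replicate in the group_by call and pivot the output into a wider tibble. The NA values in the numeric columns can then be replaced with 0. Finally, pasting the mean and sd columns together gives the desired output. Note that each group then holds a single value, so sd() returns NA, which becomes 0 after the replacement.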
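df %>%
  # one group per Experiment/Type/Replicate, so each group holds a single value
  group_by(Experiment, Type, Replicate) %>%
  summarise(mean = mean(Frequency), sd = sd(Frequency)) %>%
  # spread the replicates into columns: mean_A, mean_B, mean_C, sd_A, sd_B, sd_C
  pivot_wider(names_from = Replicate, values_from = c(mean, sd)) %>%
  # replace NA (missing replicates, and sd of a single value) with 0
  mutate(across(where(is.double), ~ replace_na(., 0))) %>%
  # build the label strings
  mutate(mean = paste0("mean(", mean_A, ",", mean_B, ",", mean_C, ")"),
         sd = paste0("sd(", sd_A, ",", sd_B, ",", sd_C, ")")) %>%
  select(Experiment, Type, mean, sd)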
The output is:
# A tibble: 9 x 4
# Groups: Experiment, Type [9]
Experiment Type mean sd
<chr> <chr> <chr> <chr>
1 Exp1 alpha mean(10,15,5) sd(0,0,0)
2 Exp1 beta mean(100,0,105) sd(0,0,0)
3 Exp1 gamma mean(1000,0,0) sd(0,0,0)
4 Exp2 alpha mean(10,0,0) sd(0,0,0)
5 Exp2 beta mean(0,95,105) sd(0,0,0)
6 Exp2 gamma mean(1010,1020,0) sd(0,0,0)
7 Exp3 alpha mean(15,10,0) sd(0,0,0)
8 Exp3 beta mean(0,0,100) sd(0,0,0)
9 Exp3 gamma mean(0,990,0) sd(0,0,0)
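The wide layout can also yield the numeric values rather than the label strings. The following is a sketch and not part of the answer above: it assumes the values_fill argument of tidyr's pivot_wider() and dplyr's rowwise() and c_across() (dplyr 1.0.0 or later), and the names wide and result are illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
           "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

# One row per Experiment/Type, one column per Replicate; absent cells become 0
wide <- df %>%
  pivot_wider(names_from = Replicate, values_from = Frequency, values_fill = 0)

# Compute mean and sd across the three replicate columns, row by row
result <- wide %>%
  rowwise() %>%
  mutate(mean = mean(c_across(c(A, B, C))),
         sd = sd(c_across(c(A, B, C)))) %>%
  ungroup() %>%
  select(Experiment, Type, mean, sd)

result
```

This should agree with the complete() approach, since both end up feeding the same three values per Experiment/Type into mean() and sd(); the row order may differ because pivot_wider() keeps rows in order of first appearance.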