Computing difference between averages by group and logical values using dplyr
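Does anyone know a way to use dplyr to compute the difference between the averages for some_var == TRUE and some_var == FALSE, grouped by a third variable?
For example, given the following example data frame: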
library('dplyr')
dat <- iris %>%
  mutate(wide = Sepal.Width > 3) %>%
  group_by(Species, wide) %>%
  summarize(mean_width = mean(Sepal.Width))
dat
# A tibble: 6 x 3
# Groups:   Species [?]
  Species    wide  mean_width
  <fctr>     <lgl>      <dbl>
1 setosa     FALSE   2.900000
2 setosa     TRUE    3.528571
3 versicolor FALSE   2.688095
4 versicolor TRUE    3.200000
5 virginica  FALSE   2.800000
6 virginica  TRUE    3.311765
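Does anyone know a way to derive a new data frame holding the difference between the wide == TRUE and wide == FALSE means for each Species?
This can be done with several statements: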
false_vals <- dat %>% filter(wide==FALSE)
true_vals <- dat %>% filter(wide==TRUE)
diff <- data.frame(Species=unique(dat$Species), diff=true_vals$mean_width - false_vals$mean_width)
> diff
Species diff
1 setosa 0.6285714
2 versicolor 0.5119048
3 virginica 0.5117647
However, it seems like this should be achievable directly with dplyr.
Any ideas?
Use spread() from the tidyr package:
library(tidyr)
iris %>%
  mutate(wide = Sepal.Width > 3) %>%
  group_by(Species, wide) %>%
  summarize(mean_width = mean(Sepal.Width)) %>%
  spread(wide, mean_width) %>%
  summarise(diff = `TRUE` - `FALSE`)
# Species diff
#1 setosa 0.6285714
#2 versicolor 0.5119048
#3 virginica 0.5117647
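With newer versions of the tidyr package (1.0.0 and later), it is now better to use pivot_wider() instead of spread(). It is more intuitive, and spread() may be deprecated in the future.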
library(tidyr)
iris %>%
  mutate(wide = Sepal.Width > 3) %>%
  group_by(Species, wide) %>%
  summarize(mean_width = mean(Sepal.Width)) %>%
  pivot_wider(names_from = wide, values_from = mean_width) %>%
  summarise(diff = `TRUE` - `FALSE`)
# Species diff
#1 setosa 0.6285714
#2 versicolor 0.5119048
#3 virginica 0.5117647
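If you would rather not reshape at all, one possible alternative is to group by Species only and subset Sepal.Width by the logical column inside summarise(). This is just a sketch, not something taken from the answers above: the name diff_by_species is made up for illustration, and it assumes every species has at least one row on each side of the threshold (otherwise mean() of an empty vector returns NaN).

```r
library(dplyr)

# Hypothetical name, used only for this illustration.
diff_by_species <- iris %>%
  # Flag rows whose sepal width exceeds 3, as in the question.
  mutate(wide = Sepal.Width > 3) %>%
  # Group by the third variable only; the logical split happens below.
  group_by(Species) %>%
  # Mean of the TRUE rows minus mean of the FALSE rows within each species.
  summarise(diff = mean(Sepal.Width[wide]) - mean(Sepal.Width[!wide]))

diff_by_species
```

On iris this should give one row per species with the same differences as the spread() and pivot_wider() versions, since it takes the same two means and just skips the intermediate wide table.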