Proportion of observations by group
I have a dataset with two columns, Species and Color:
Species Color
daisy white
daisy yellow
iris purple
iris purple
iris purple
tulip red
tulip red
…etc
Using dplyr's count(), I summarised the number of observations of each color for each species:
data %>%
  count(Species, Color)
Species Color n
daisy white 1
daisy yellow 1
iris purple 3
tulip red 2
tulip yellow 4
tulip pink 2
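count() is documented as roughly equivalent to grouping by the same columns and tallying rows with n(). A minimal sketch that checks this on the sample data from the question (via_count and via_summarise are illustrative names, not part of the original):

```r
library(dplyr)

# Sample data from the question's "Data" section
data <- data.frame(
  Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
  Color   = c("white", "yellow", "purple", "purple", "purple", "red", "red")
)

# Shorthand: one row per Species/Color pair with its row count in n
via_count <- data %>% count(Species, Color)

# Long form: group by both columns, count rows, then drop all grouping
via_summarise <- data %>%
  group_by(Species, Color) %>%
  summarise(n = n(), .groups = "drop")

print(via_count)
print(via_summarise)
```

Both give the same Species, Color and n columns; only the object class can differ (count() keeps a plain data.frame input as a data.frame, while summarise() returns a tibble).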
I want to add a column that shows, within each species, the proportion of each color (n per color / total n per species):
Species Color n proportion
daisy white 1 0.5
daisy yellow 1 0.5
iris purple 3 1
tulip red 2 0.25
tulip yellow 4 0.5
tulip pink 2 0.25
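The requested column divides each count by the total count of its species. Writing n_{s,c} for the number of observations of color c in species s (notation introduced here for illustration), the arithmetic behind the table above is:

```latex
\text{proportion}_{s,c} = \frac{n_{s,c}}{\sum_{c'} n_{s,c'}}

\text{tulip: } 2 + 4 + 2 = 8,\quad
\frac{2}{8} = 0.25 \text{ (red)},\quad
\frac{4}{8} = 0.5 \text{ (yellow)},\quad
\frac{2}{8} = 0.25 \text{ (pink)}

\text{daisy: } \frac{1}{1 + 1} = 0.5,\qquad
\text{iris: } \frac{3}{3} = 1
```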
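You can use the following code. Because summarise() drops the last grouping variable (Color), its result is still grouped by Species, so sum(n) inside mutate() is the total for each species:
library(dplyr)

data %>%
  group_by(Species, Color) %>%
  # summarise() drops Color from the grouping, leaving the result grouped by Species
  summarise(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore the total count within each species
  mutate(proportion = n / sum(n))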
Output:
# A tibble: 4 × 4
# Groups: Species [3]
Species Color n proportion
<chr> <chr> <int> <dbl>
1 daisy white 1 0.5
2 daisy yellow 1 0.5
3 iris purple 3 1
4 tulip red 2 1
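Since you already have the count() output, you can also start from it and group by Species alone before dividing; in a pipeline that would be data %>% count(Species, Color) %>% group_by(Species) %>% mutate(...). A minimal sketch, assuming the counts table is typed in by hand from the count() output in the question (counts and result are illustrative names):

```r
library(dplyr)

# Counts copied from the count() output shown in the question
counts <- tibble(
  Species = c("daisy", "daisy", "iris", "tulip", "tulip", "tulip"),
  Color   = c("white", "yellow", "purple", "red", "yellow", "pink"),
  n       = c(1L, 1L, 3L, 2L, 4L, 2L)
)

# Group by Species only, so sum(n) is the per-species total
result <- counts %>%
  group_by(Species) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

print(result)
```

This reproduces the desired proportion column (0.5, 0.5, 1, 0.25, 0.5, 0.25). With dplyr 1.1.0 or later, mutate(proportion = n / sum(n), .by = Species) gives the same result without leaving a grouping attached to the output.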
Data:
data <- data.frame(Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
                   Color = c("white", "yellow", "purple", "purple", "purple", "red", "red"))
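If dplyr is not available, the same numbers can be computed in base R. This is a swapped-in technique rather than the approach used above: table() cross-tabulates the counts and prop.table() with margin = 1 divides each row (species) by its total. A minimal sketch on the data above (tab and result_base are illustrative names):

```r
# Sample data from above
data <- data.frame(Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
                   Color = c("white", "yellow", "purple", "purple", "purple", "red", "red"))

# Species x Color contingency table of counts
tab <- table(Species = data$Species, Color = data$Color)

# Long format: one row per Species/Color combination, count in column n
result_base <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)

# Row-wise proportions; as.vector() uses the same element order as as.data.frame()
result_base$proportion <- as.vector(prop.table(tab, margin = 1))

# Keep only combinations that actually occur, sorted like the dplyr output
result_base <- result_base[result_base$n > 0, ]
result_base <- result_base[order(result_base$Species, result_base$Color), ]
rownames(result_base) <- NULL

print(result_base)
```

The filter on n > 0 matters because table() includes every Species/Color combination, including ones with zero observations, whereas count() only returns combinations that occur.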