Proportion of observations by group

I have a dataset with two columns, Species and Color:

Species Color
daisy   white
daisy   yellow
iris    purple
iris    purple
iris    purple
tulip   red
tulip   red
…etc

Using dplyr's count(), I summarized the number of observations of each color per species:

data %>%
  count(Species, Color)

Species Color   n
daisy   white   1
daisy   yellow  1
iris    purple  3
tulip   red     2
tulip   yellow  4
tulip   pink    2

I would like to add a column showing the proportion of each color within each species (n per color / total n per species):

Species Color   n   proportion
daisy   white   1   0.5
daisy   yellow  1   0.5
iris    purple  3   1
tulip   red     2   0.25
tulip   yellow  4   0.5
tulip   pink    2   0.25

You can use the following code:

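library(dplyr)

data %>%
  group_by(Species, Color) %>%
  # Count rows per Species/Color pair; summarise() then drops the last grouping level (Color)
  summarise(n = n()) %>%
  # The result is still grouped by Species, so sum(n) is the per-species total
  mutate(proportion = n / sum(n))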

Output:

# A tibble: 4 × 4
# Groups:   Species [3]
  Species Color      n proportion
  <chr>   <chr>  <int>      <dbl>
1 daisy   white      1        0.5
2 daisy   yellow     1        0.5
3 iris    purple     3        1  
4 tulip   red        2        1 

Data

data <- data.frame(Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
                   Color = c("white", "yellow", "purple", "purple", "purple", "red", "red"))
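
As a variation, the count() call from the question can be kept and the per-species grouping added explicitly afterwards. The following is a minimal sketch, not the only way to do it. The names flowers and result are hypothetical, and the rows are rebuilt from the count table in the question because the original data is elided there.

```r
library(dplyr)

# Hypothetical data frame, rebuilt from the count table in the question
# (daisy: 1 white, 1 yellow; iris: 3 purple; tulip: 2 red, 4 yellow, 2 pink)
flowers <- data.frame(
  Species = rep(c("daisy", "iris", "tulip"), times = c(2, 3, 8)),
  Color   = c("white", "yellow",
              rep("purple", 3),
              rep(c("red", "yellow", "pink"), times = c(2, 4, 2)))
)

result <- flowers %>%
  count(Species, Color) %>%            # one row per Species/Color pair, ungrouped
  group_by(Species) %>%                # make the per-species grouping explicit
  mutate(proportion = n / sum(n)) %>%  # n divided by the total n of that species
  ungroup()                            # drop the grouping so later steps are unaffected

print(result)
```

Stating group_by(Species) explicitly avoids relying on summarise() dropping the last grouping level, and the final ungroup() returns a plain tibble. Note that count() sorts its output by Species and Color, so the row order may differ from the table in the question.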