Find relative frequencies of summarized columns in R

I need to get the relative frequencies of a summarized column in R. I used dplyr's summarise() to find the total for each group, like this:

data %>%
  group_by(x) %>%
  summarise(total = sum(dollars))

     x                    total 
   <chr>                 <dbl>
 1 expense 1              3600 
 2 expense 2              2150 
 3 expense 3              2000 

But now I need to add a new column holding the relative frequency of each row's total, to get this result:

     x                   total     p
   <chr>                 <dbl>   <dbl>
 1 expense 1              3600   46.45%
 2 expense 2              2150   27.74%
 3 expense 3              2000   25.81%

I tried this:

data %>%
  group_by(x) %>%
  summarise(total = sum(dollars), p = scales::percent(total/sum(total)))

And this:

data %>%
  group_by(x) %>%
  summarise(total = sum(dollars), p = total/sum(total)*100)

But the result is always this:

     x                   total     p
   <chr>                 <dbl>   <dbl>
 1 expense 1              3600    100%
 2 expense 2              2150    100%
 3 expense 3              2000    100%

The problem seems to be that the summarized total column is affecting the result. Any ideas that could help me? Thanks.

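You get 100% because sum(total) is evaluated within each group, where total is a single value, so each total is divided by itself. You need to divide by an ungrouped quantity. Assuming you want to divide by the total number of entries (the row count, not the sum of dollars), just divide by nrow(data):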

data %>%
  group_by(x) %>%
  summarise(total = sum(dollars), p = total/nrow(data)*100)
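
If the goal is instead each group's share of the grand total of dollars, one possible variant is to take the denominator from the ungrouped data frame. The following is only a sketch: the sample rows and the names `grouped` and `fixed` are made up for illustration and do not come from the question.

```r
library(dplyr)

# Hypothetical sample data with more than one row per group
data <- tibble(
  x = c("expense 1", "expense 1", "expense 2", "expense 3"),
  dollars = c(3000, 600, 2150, 2000)
)

# Inside summarise(), `total` is a single value per group,
# so sum(total) equals total and p is always 100
grouped <- data %>%
  group_by(x) %>%
  summarise(total = sum(dollars), p = total / sum(total) * 100)

# Taking the denominator from the ungrouped data gives shares of the grand total
fixed <- data %>%
  group_by(x) %>%
  summarise(total = sum(dollars), p = total / sum(data$dollars) * 100)
```

Here `data$dollars` refers to the whole column of the original data frame, so it is not affected by the grouping.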

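You get 100% because of the grouping. However, after summarise() dplyr drops one level of grouping. That means that if you, for example, call mutate() afterwards, you get the result you need: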

library(dplyr)

data <- tibble(
  x = c("expense 1", "expense 2", "expense 3"),
  dollars = c(3600L, 2150L, 2000L)
)


data %>%
  group_by(x) %>%
  summarise(total = sum(dollars)) %>% 
  mutate(p = total/sum(total)*100)


# A tibble: 3 x 3
  x         total     p
  <chr>     <int> <dbl>
1 expense 1  3600  46.5
2 expense 2  2150  27.7
3 expense 3  2000  25.8
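
To get the percent-formatted strings shown in the question (for example 46.45%), the same pipeline can probably be combined with scales::percent() and its accuracy argument. This is a minimal sketch assuming the scales package is installed; the name `result` is just illustrative. Note that p becomes a character column, so it can no longer be used in arithmetic.

```r
library(dplyr)

data <- tibble(
  x = c("expense 1", "expense 2", "expense 3"),
  dollars = c(3600, 2150, 2000)
)

result <- data %>%
  group_by(x) %>%
  summarise(total = sum(dollars)) %>%
  # summarise() has dropped the grouping, so sum(total) is the grand total;
  # accuracy = 0.01 keeps two decimal places in the formatted string
  mutate(p = scales::percent(total / sum(total), accuracy = 0.01))

result$p
# "46.45%" "27.74%" "25.81%"
```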

After the first sum, ungroup and then create p with mutate():

iris %>%
  group_by(Species) %>%
  summarise(total = sum(Sepal.Length)) %>%
  ungroup() %>%
  mutate(p = total/sum(total)*100)
## A tibble: 3 x 3
#  Species    total     p
#  <fct>      <dbl> <dbl>
#1 setosa      250.  28.6
#2 versicolor  297.  33.9
#3 virginica   329.  37.6
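
For comparison, the same shares can be computed without dplyr. The sketch below uses base R's tapply() and prop.table(); the variable names `totals` and `p` are only illustrative.

```r
# Per-species totals as a named 1-d array
totals <- tapply(iris$Sepal.Length, iris$Species, sum)

# prop.table() divides each entry by the overall sum
p <- prop.table(totals) * 100

round(p, 1)
```

Because there is no grouping state in this version, there is nothing to ungroup before computing the proportions.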