Find relative frequencies of summarized columns in R
I need to get the relative frequencies of a summarized column in R. I used dplyr's summarise() to find the total for each group, like this:
data %>%
group_by(x) %>%
summarise(total = sum(dollars))
x total
<chr> <dbl>
1 expense 1 3600
2 expense 2 2150
3 expense 3 2000
But now I need to create a new column holding the relative frequency of each total, to get this result:
x total p
<chr> <dbl> <dbl>
1 expense 1 3600 46.45%
2 expense 2 2150 27.74%
3 expense 3 2000 25.81%
I tried this:
data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = scales::percent(total/sum(total)))
And this:
data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = total/sum(total)*100)
But the result is always this:
x total p
<chr> <dbl> <dbl>
1 expense 1 3600 100%
2 expense 2 2150 100%
3 expense 3 2000 100%
The problem seems to be that the summarized total column is affecting the result. Any ideas that could help me? Thanks
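You get 100% because sum(total) is calculated within that particular group only. You need to ungroup. Assuming you want to divide by the total number of entries, simply divide by nrow(df) (here, nrow(data)):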
data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = total/nrow(data)*100)
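To see why the grouped version reports 100% on every row, here is a minimal sketch. The `toy` tibble and the extra `denominator` column are hypothetical and only there for illustration; it assumes a dplyr version in which summarise() can refer to columns created earlier in the same call.

```r
library(dplyr)

# Hypothetical data: two rows for "expense 1", one row for "expense 2".
toy <- tibble(
  x = c("expense 1", "expense 1", "expense 2"),
  dollars = c(1000, 2600, 2150)
)

# Inside a grouped summarise(), sum(total) only sees the current group's
# single total, so the denominator equals the numerator in every group.
inside <- toy %>%
  group_by(x) %>%
  summarise(
    total = sum(dollars),
    denominator = sum(total),
    p = total / denominator * 100
  )

print(inside)
```

Each row's `denominator` matches its own `total`, which is why `p` cannot be anything other than 100.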
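You get 100% because of the grouping. However, after you summarise, dplyr drops one level of grouping. That means that if you call mutate() afterwards, for example, you get the result you need: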
library(dplyr)
data <- tibble(
x = c("expense 1", "expense 2", "expense 3"),
dollars = c(3600L, 2150L, 2000L)
)
data %>%
group_by(x) %>%
summarise(total = sum(dollars)) %>%
mutate(p = total/sum(total)*100)
# A tibble: 3 x 3
x total p
<chr> <int> <dbl>
1 expense 1 3600 46.5
2 expense 2 2150 27.7
3 expense 3 2000 25.8
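If the percentages should be displayed as text with a percent sign, as in the desired output in the question, one possible sketch is to format the ratio inside the same mutate(). It uses base R's sprintf(), which is a choice made here rather than something taken from the answer, and it turns `p` into a character column.

```r
library(dplyr)

data <- tibble(
  x = c("expense 1", "expense 2", "expense 3"),
  dollars = c(3600L, 2150L, 2000L)
)

result <- data %>%
  group_by(x) %>%
  summarise(total = sum(dollars)) %>%
  # "%.2f" rounds to two decimals; "%%" prints a literal percent sign.
  mutate(p = sprintf("%.2f%%", total / sum(total) * 100))

print(result)
```

Because `p` is now a string, keep a numeric copy if you still need to sort or calculate with it. scales::percent(total / sum(total), accuracy = 0.01) should produce similar labels if the scales package is preferred.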
After the first sum, ungroup and create p with mutate.
iris %>%
group_by(Species) %>%
summarise(total = sum(Sepal.Length)) %>%
ungroup() %>%
mutate(p = total/sum(total)*100)
## A tibble: 3 x 3
# Species total p
# <fct> <dbl> <dbl>
#1 setosa 250. 28.6
#2 versicolor 297. 33.9
#3 virginica 329. 37.6
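The explicit ungroup matters most when there is more than one grouping variable, because summarise() then leaves the result grouped by the remaining ones. As a sketch, assuming dplyr 1.0.0 or later, the `.groups = "drop"` argument of summarise() does the same job as a separate ungroup(). The mtcars columns are used purely as an illustration.

```r
library(dplyr)

shares <- mtcars %>%
  group_by(cyl, gear) %>%
  # Drop all grouping so the next step sees every row at once.
  summarise(total = sum(mpg), .groups = "drop") %>%
  # p is each (cyl, gear) combination's share of the overall total.
  mutate(p = total / sum(total) * 100)

print(shares)
```

Without `.groups = "drop"` (or ungroup()), the result would still be grouped by `cyl`, and `p` would add up to 100 within each `cyl` value instead of across the whole table.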