R - Aggregate denominator based on condition, for use in percentage calculation for all rows
I have data like this:
population <- c(101:110)
coverage <- c(91:100)
area <- c("Cambridge", "Cambridge","Cambridge", "Cambridge","Cambridge", "Oxford", "Oxford","Oxford", "Oxford","Oxford")
all <- data.frame(population,coverage,area)
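I then want a neat bit of R code to calculate the percentage of the population that is within mobile coverage. I know it's something like this (but not this):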
coverage <- population x (coverage/100) / (aggregate(population, by=area, FUN=sum))
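How do I calculate the population sum by area, to use as the denominator in the percentage calculation for all rows? Normally I would use aggregate to get the population by area and then merge it back into the data frame to use as the denominator, but that isn't very elegant at all. I'd like the data to end up looking like this: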
population <- c(101:110)
coverage <- c(91:100)
area <- c("Cambridge", "Cambridge","Cambridge", "Cambridge","Cambridge", "Oxford", "Oxford","Oxford", "Oxford","Oxford")
percentage <- c(18, 18, 18, 18, 18, 19, 19, 19, 19, 19)
all <- data.frame(population,coverage,area, percentage)
Many thanks for your help.
You can use dplyr to group the calculation by area:
library(dplyr)
all %>% group_by(area) %>% mutate(percentage=population*(coverage/100)/sum(population))
##Source: local data frame [10 x 4]
##Groups: area [2]
##
## population coverage area percentage
## <int> <int> <fctr> <dbl>
##1 101 91 Cambridge 0.1784660
##2 102 92 Cambridge 0.1822136
##3 103 93 Cambridge 0.1860000
##4 104 94 Cambridge 0.1898252
##5 105 95 Cambridge 0.1936893
##6 106 96 Oxford 0.1884444
##7 107 97 Oxford 0.1922037
##8 108 98 Oxford 0.1960000
##9 109 99 Oxford 0.1998333
##10 110 100 Oxford 0.2037037
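For readers who would rather not load a package, the same per-row calculation can be sketched in base R. This is only an illustration, assuming the data frame from the question; `area_total` is a hypothetical helper name that does not appear in the original post. `ave()` returns a vector as long as its input, so the group total lines up with every row and no merge is needed.

```r
# Same data as in the question
population <- 101:110
coverage <- 91:100
area <- rep(c("Cambridge", "Oxford"), each = 5)
all <- data.frame(population, coverage, area)

# ave() repeats each area's population total on every row of that area
area_total <- ave(all$population, all$area, FUN = sum)

# Covered population in each row as a share of its area's total population
all$percentage <- all$population * (all$coverage / 100) / area_total
all
```

The values should agree with the dplyr output above, since the formula is the same and only the way the denominator is obtained differs.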
I think you want dplyr's summarise for this.
Does this do what you want?
library(dplyr)
all %>% group_by(area) %>% summarise(coveragePct=sum(coverage)/sum(population))
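One thing to keep in mind is that summarise() collapses each group to a single row, so it yields one figure per area rather than a value on all ten rows. The sketch below shows that shape. It assumes the area-level figure of interest is the population-weighted coverage, which is a different quantity from the sum(coverage)/sum(population) ratio above; the names `by_area` and `weighted_coverage` are hypothetical.

```r
library(dplyr)

# Same data as in the question
population <- 101:110
coverage <- 91:100
area <- rep(c("Cambridge", "Oxford"), each = 5)
all <- data.frame(population, coverage, area)

# One row per area: covered population divided by total population
by_area <- all %>%
  group_by(area) %>%
  summarise(weighted_coverage = sum(population * coverage / 100) / sum(population))

by_area
```

If a value is needed on every row, as in the desired output in the question, mutate() is the verb to reach for instead, because it keeps the original number of rows.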
You can do this with dplyr:
all.summary <- all %>%
group_by(area) %>%
mutate(percentage = population/sum(population)*(coverage/100))
all.summary
population coverage area percentage
<int> <int> <fctr> <dbl>
1 101 91 Cambridge 0.1784660
2 102 92 Cambridge 0.1822136
3 103 93 Cambridge 0.1860000
4 104 94 Cambridge 0.1898252
5 105 95 Cambridge 0.1936893
6 106 96 Oxford 0.1884444
7 107 97 Oxford 0.1922037
8 108 98 Oxford 0.1960000
9 109 99 Oxford 0.1998333
10 110 100 Oxford 0.2037037
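The values above are fractions, while the desired column in the question is on a 0 to 100 scale. A minimal sketch of the conversion follows, assuming that rounding to one decimal place is acceptable; `percentage_pct` is a hypothetical column name. The ungroup() call is optional and simply drops the grouping so later steps operate on the whole table.

```r
library(dplyr)

# Same data as in the question
population <- 101:110
coverage <- 91:100
area <- rep(c("Cambridge", "Oxford"), each = 5)
all <- data.frame(population, coverage, area)

all.summary <- all %>%
  group_by(area) %>%
  mutate(percentage = population / sum(population) * (coverage / 100)) %>%
  ungroup() %>%
  # Scale the fraction to percent and round for display
  mutate(percentage_pct = round(100 * percentage, 1))

all.summary
```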