Calculate percentage in a data frame according to groups in R

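I have the following data frame and want to calculate percentages by Stage and Category. Some of my other data has an additional variable, e.g. Year. I need the output as a data frame so that I can use it with ggplot2.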

Gender = rep(c("Female", "Male"), 6)
Stage = rep(c("Applied", "Appointed", "Interviewed"), each=2, times = 2)
Category = rep(c("Professional", "Research"), each = 6)
Count = as.integer(c("346", "251", "22", "15", "60", "52", "31", "230", "4", "17", "9", "52"))
df = data.frame(Gender, Stage, Category,Count )

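The (horrible) code I wrote works in some cases, but it breaks if the structure of the data changes, e.g. if there is no Female row (a count of 0) for some group, because rep(totals$x, each = 2) assumes every Stage/Category group has exactly two rows.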

totals = aggregate(df$Count, by = list(Stage = df$Stage, Category = df$Category),sum)
totals = rep( totals$x, each = 2)
df$Percentage = round(df$Count/totals, 2)
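To make the failure concrete, here is a small sketch that rebuilds the example data and removes one row (Female, Appointed, Research), as might happen when a zero count is simply absent from the data. The removed row and the name df_missing are illustrative assumptions, not part of the original data. The totals still come out as 12 values while the data frame now has 11 rows, so the division recycles with a warning and the assignment fails.

```r
# Example data from the question (integer literals instead of as.integer() on strings)
Gender   <- rep(c("Female", "Male"), 6)
Stage    <- rep(c("Applied", "Appointed", "Interviewed"), each = 2, times = 2)
Category <- rep(c("Professional", "Research"), each = 6)
Count    <- c(346L, 251L, 22L, 15L, 60L, 52L, 31L, 230L, 4L, 17L, 9L, 52L)
df <- data.frame(Gender, Stage, Category, Count)

# Hypothetical change in structure: the Female row for Appointed/Research
# is missing (e.g. because its count was 0)
df_missing <- df[!(df$Gender == "Female" & df$Stage == "Appointed" &
                     df$Category == "Research"), ]

# The approach from the question: one total per Stage x Category group,
# then rep(..., each = 2), which assumes every group has exactly two rows
totals <- aggregate(df_missing$Count,
                    by = list(Stage = df_missing$Stage,
                              Category = df_missing$Category),
                    sum)
totals <- rep(totals$x, each = 2)

length(totals)     # still 12 totals (6 groups x 2) ...
nrow(df_missing)   # ... but only 11 rows, so they no longer line up

# Dividing recycles the shorter vector (with a warning) and yields 12 values,
# so assigning them back to the 11-row data frame raises an error
result <- tryCatch(
  {
    df_missing$Percentage <- round(df_missing$Count / totals, 2)
    "assigned"
  },
  error = function(e) e
)
inherits(result, "error")
```

The group-aware answers below avoid this because they compute each row's total from its own group instead of relying on row order and a fixed group size.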

This is the output I want:

   Gender       Stage     Category Count Percentage
1  Female     Applied Professional   346       0.58
2    Male     Applied Professional   251       0.42
3  Female   Appointed Professional    22       0.59
4    Male   Appointed Professional    15       0.41
5  Female Interviewed Professional    60       0.54
6    Male Interviewed Professional    52       0.46
7  Female     Applied     Research    31       0.12
8    Male     Applied     Research   230       0.88
9  Female   Appointed     Research     4       0.19
10   Male   Appointed     Research    17       0.81
11 Female Interviewed     Research     9       0.15
12   Male Interviewed     Research    52       0.85

Thanks for your help!

I suggest using the data.table package. With it you can write something like this:

library(data.table)
dt <- as.data.table(df)  # convert the data frame from the question (or use setDT(df))
dt[, Percentage := round(Count / sum(Count), 2), by = c("Stage", "Category")]

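I recommend data.table because it is one of the fastest packages for working with data frames. Base R data frame operations are comparatively slow, particularly for grouped computations on large data.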

Compared with dplyr, data.table is faster, but it has no transparent interface to SQL databases.

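Much of data.table's speed comes from avoiding copies when transforming data: := adds or updates columns by reference, in place, instead of copying the whole table.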
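A minimal sketch of what modifying by reference looks like, assuming the data.table package is installed: data.table's address() reports the table's memory address before and after a column is added with :=. The dt object is rebuilt here from the question's data for illustration.

```r
library(data.table)

# Example data from the question as a data.table
dt <- data.table(
  Gender   = rep(c("Female", "Male"), 6),
  Stage    = rep(c("Applied", "Appointed", "Interviewed"), each = 2, times = 2),
  Category = rep(c("Professional", "Research"), each = 6),
  Count    = c(346L, 251L, 22L, 15L, 60L, 52L, 31L, 230L, 4L, 17L, 9L, 52L)
)

before <- address(dt)  # memory address of the table before the update

# := adds the Percentage column by reference, group by group
dt[, Percentage := round(Count / sum(Count), 2), by = c("Stage", "Category")]

after <- address(dt)   # memory address after the update

# Comparing the two addresses: equal addresses mean no copy of the table was made
identical(before, after)
```

If the addresses match, the new column was added to the existing table rather than to a copy of it.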

See the data.table manual for details.

We can use dplyr:

library(dplyr)
df <- df %>%
  group_by(Stage, Category) %>%
  mutate(Percentage = round(Count / sum(Count), 2)) %>%
  ungroup()  # drop the grouping so later steps (e.g. ggplot2) see a plain table

We can use the ave function:

df$Percentage <- df$Count / ave(df$Count, df$Stage, df$Category, FUN = sum)

   Gender       Stage     Category Count Percentage
1  Female     Applied Professional   346  0.5795645
2    Male     Applied Professional   251  0.4204355
3  Female   Appointed Professional    22  0.5945946
4    Male   Appointed Professional    15  0.4054054
5  Female Interviewed Professional    60  0.5357143
6    Male Interviewed Professional    52  0.4642857
7  Female     Applied     Research    31  0.1187739
8    Male     Applied     Research   230  0.8812261
9  Female   Appointed     Research     4  0.1904762
10   Male   Appointed     Research    17  0.8095238
11 Female Interviewed     Research     9  0.1475410
12   Male Interviewed     Research    52  0.8524590
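The question mentions that some data has an extra variable such as Year. With ave(), that is just one more grouping argument. The sketch below uses a hypothetical Year column (the question's counts repeated for two made-up years, with one row dropped in the second year) purely to show the mechanics; these numbers are not real data.

```r
# Example data from the question
Gender   <- rep(c("Female", "Male"), 6)
Stage    <- rep(c("Applied", "Appointed", "Interviewed"), each = 2, times = 2)
Category <- rep(c("Professional", "Research"), each = 6)
Count    <- c(346L, 251L, 22L, 15L, 60L, 52L, 31L, 230L, 4L, 17L, 9L, 52L)
df <- data.frame(Gender, Stage, Category, Count)

# Hypothetical Year column: the same rows for two illustrative years
df2 <- rbind(transform(df, Year = 2019), transform(df, Year = 2020))

# Hypothetically drop the Female/Appointed/Research row in 2020
df2 <- df2[!(df2$Year == 2020 & df2$Gender == "Female" &
               df2$Stage == "Appointed" & df2$Category == "Research"), ]

# ave() returns each row's group total, so rows stay aligned even when a
# group has only one row; Year is simply an extra grouping factor
shares <- df2$Count / ave(df2$Count, df2$Stage, df2$Category, df2$Year, FUN = sum)
df2$Percentage <- round(shares, 2)

# Sanity check: the unrounded shares in every Stage x Category x Year group sum to 1
group_sums <- tapply(shares, list(df2$Stage, df2$Category, df2$Year), sum)
all(abs(group_sums - 1) < 1e-12)
```

The same idea carries over to the other answers: add Year to group_by() in dplyr or to by = in data.table.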