Summarize Percentage by Group in R

Question

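I would like to summarize my dataset by the percentage of females and males in each industry. I'm still learning R and haven't been able to figure this out.

My data: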

Industry	Male	Female
Art/Entertainment	100	500
Banking	600	100
Healthcare	53	65
Education	20	766
Military	47	96
Medicine	500	400
Law	500	500
Computer	200	144
Sales	420	69

Goal:

Industry	Male	Female
Art/Entertainment	100	500
Banking	600	100
Healthcare	53	65
Education	20	766
Military	47	96
Medicine	500	400
Law	500	500
Computer	200	144
Sales	420	69

Answer 1

If your data is named df, you can create columns for the female and male shares (as proportions between 0 and 1) like this:

df$Fpct <- df$Female / (df$Male + df$Female)
df$Mpct <- df$Male / (df$Male + df$Female)

Note: it is best to avoid the % symbol in variable names.
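
The columns created above hold proportions between 0 and 1. If values on a 0 to 100 scale are wanted, a minimal sketch could look like the following. It uses only the first three rows of the question's data, and the column names pct_male and pct_female are hypothetical names chosen for illustration.

```r
# First three rows of the data from the question
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100, 600, 53),
  Female   = c(500, 100, 65)
)

# Row totals, computed once and reused for both columns
total <- df$Male + df$Female

# Percentages on a 0-100 scale, rounded to one decimal place
# (pct_male / pct_female are illustrative names, not from the answer)
df$pct_male   <- round(100 * df$Male / total, 1)
df$pct_female <- round(100 * df$Female / total, 1)

print(df)
```

Because the arithmetic is vectorized, each row is handled independently and no explicit grouping step is needed.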

Answer 2

You can use group_by and then create two new columns for F% and M%.

Maybe you can use this:

library(dplyr)
df %>%
  group_by(Industry) %>%
  mutate(F_prec = Female / (Male + Female),
         M_prec = Male / (Male + Female))

Answer 3

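1) proportions If your input is df1 (shown reproducibly in the Note at the end), rename the columns to the desired names and convert them to a matrix m. Then use proportions with a margin of 1 for row proportions (2 would give column proportions). Note that we convert to a matrix in the first line because proportions requires one.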

m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
cbind(df1, 100 * proportions(m, 1))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...
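
To make the margin argument concrete, here is a small sketch on a two-row matrix built from the Art/Entertainment and Banking rows of the question. The variable names row_pct and col_pct are illustrative only. As far as I know, proportions was added in R 4.0.1 as the newer name for prop.table, so on older installations prop.table should be a drop-in substitute.

```r
# Two rows of the question's data as a numeric matrix (filled column-wise)
m <- matrix(c(100, 600, 500, 100), nrow = 2,
            dimnames = list(c("Art/Entertainment", "Banking"),
                            c("Male", "Female")))

# margin = 1: divide each cell by its row total
row_pct <- 100 * proportions(m, 1)

# margin = 2: divide each cell by its column total
col_pct <- 100 * proportions(m, 2)

print(rowSums(row_pct))  # every row adds up to 100
print(colSums(col_pct))  # every column adds up to 100
```

For the question as asked (share of males and females within each industry), the row version is the relevant one.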

2) rowSums Another approach is to divide df1[-1] by its rowSums, which gives the same result.

cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

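3) dplyr Use across to duplicate the columns under the indicated names, multiply them by 100, and divide by the sum across columns computed with c_across. After group_by(Industry) each group is a single row, so that sum is the row total.

library(dplyr)

df1 %>%
  group_by(Industry) %>%
  mutate(100 * across(.names = "%{.col}") / sum(c_across())) %>%
  ungroup()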
## # A tibble: 9 x 5
##   Industry           Male Female `%Male` `%Female`
##   <chr>             <int>  <int>   <dbl>     <dbl>
## 1 Art/Entertainment   100    500   16.7       83.3
## 2 Banking             600    100   85.7       14.3
## 3 Healthcare           53     65   44.9       55.1
## ...snip...

4) transform This is close to another answer, but it does not overwrite the input:

transform(df1, 
  "%M" = 100 * Male / (Male + Female), 
  "%F" = 100 * Female / (Male + Female),  
  check.names = FALSE)
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...
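
The following sketch illustrates the two points of the transform approach on a two-row subset of the data: the input data frame is left untouched, and check.names = FALSE lets the new columns keep the non-syntactic names "%M" and "%F". The object name out is a hypothetical name used only here.

```r
# Two rows of the question's data
df1 <- data.frame(Industry = c("Banking", "Law"),
                  Male = c(600L, 500L),
                  Female = c(100L, 500L))

# transform returns a new data frame; df1 itself is not modified
out <- transform(df1,
  "%M" = 100 * Male / (Male + Female),
  "%F" = 100 * Female / (Male + Female),
  check.names = FALSE)

print(names(out))  # includes the new "%M" and "%F" columns
print(names(df1))  # still only Industry, Male, Female

# Non-syntactic names must be quoted with backticks (or accessed via [[ ]])
print(out$`%M`)
```

The need for backticks on every access is one practical reason to prefer plain names such as pct_male when the columns will be used in further computation.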

Note

The input in reproducible form:

df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare", 
"Education", "Military", "Medicine", "Law", "Computer", "Sales"
), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
)), class = "data.frame", row.names = c(NA, -9L))

Answer 4

There is a simple solution using the janitor package, which is designed for cross-tabulation:

library(janitor)

data %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 0) %>% 
  adorn_ns(position = "front")

          Industry       Male     Female       Total
 Art/Entertainment  100 (17%)  500 (83%)  600 (100%)
           Banking  600 (86%)  100 (14%)  700 (100%)
        Healthcare   53 (45%)   65 (55%)  118 (100%)
         Education   20  (3%)  766 (97%)  786 (100%)
          Military   47 (33%)   96 (67%)  143 (100%)
          Medicine  500 (56%)  400 (44%)  900 (100%)
               Law  500 (50%)  500 (50%) 1000 (100%)
          Computer  200 (58%)  144 (42%)  344 (100%)
             Sales  420 (86%)   69 (14%)  489 (100%)
             Total 2440 (48%) 2640 (52%) 5080 (100%)

#OR

data %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns(position = "front")

          Industry         Male       Female
 Art/Entertainment 100 (16.67%) 500 (83.33%)
           Banking 600 (85.71%) 100 (14.29%)
        Healthcare  53 (44.92%)  65 (55.08%)
         Education  20  (2.54%) 766 (97.46%)
          Military  47 (32.87%)  96 (67.13%)
          Medicine 500 (55.56%) 400 (44.44%)
               Law 500 (50.00%) 500 (50.00%)
          Computer 200 (58.14%) 144 (41.86%)
             Sales 420 (85.89%)  69 (14.11%)

Data used

> data
           Industry Male Female
1 Art/Entertainment  100    500
2           Banking  600    100
3        Healthcare   53     65
4         Education   20    766
5          Military   47     96
6          Medicine  500    400
7               Law  500    500
8          Computer  200    144
9             Sales  420     69
