Summarize Percentage by Group in R

I want to summarize my dataset by the percentage of women and men in each industry. I am still learning R and cannot work out how to do this.

My data:

Industry Male Female
Art/Entertainment 100 500
Banking 600 100
Healthcare 53 65
Education 20 766
Military 47 96
Medicine 500 400
Law 500 500
Computer 200 144
Sales 420 69

Goal:

Industry Male Female F% M%
Art/Entertainment 100 500
Banking 600 100
Healthcare 53 65
Education 20 766
Military 47 96
Medicine 500 400
Law 500 500
Computer 200 144
Sales 420 69

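If your data frame is named df, you can create columns for the female and male shares like this (the results are proportions between 0 and 1; multiply by 100 for percentages):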

df$Fpct <- df$Female / (df$Male + df$Female)
df$Mpct <- df$Male / (df$Male + df$Female)

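Note: avoid the % symbol in variable names. Such names are non-syntactic in R and must be wrapped in backticks every time you refer to them.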
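
A minimal sketch of both points, assuming the question's data is rebuilt by hand as df (only the first three rows are used to keep it short). It scales the proportions to percentages and shows why a name such as F% is awkward. The column names Fpct and Mpct follow the answer above; rounding to one decimal place is an arbitrary choice made for this illustration.

```r
# First three rows of the question's data, rebuilt by hand for illustration.
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100, 600, 53),
  Female   = c(500, 100, 65)
)

total <- df$Male + df$Female

# Percentages (0-100) rather than proportions (0-1), rounded to one decimal.
df$Fpct <- round(100 * df$Female / total, 1)
df$Mpct <- round(100 * df$Male / total, 1)

# A non-syntactic name works, but it needs backticks whenever it is accessed.
df$`F%` <- df$Fpct
df$`F%`
```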

You can use group_by and then create two new columns for F% and M%.

Maybe you can use this:

library(dplyr)

# Female and male shares per industry, as proportions between 0 and 1
df %>%
  group_by(Industry) %>%
  mutate(F_prec = Female / (Male + Female),
         M_prec = Male / (Male + Female))
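
Because each industry occupies exactly one row in this data, the arithmetic inside mutate is already element-wise, so the group_by step is optional. The sketch below compares the two versions under that assumption; df is rebuilt by hand from the first three rows of the question.

```r
library(dplyr)

# First three rows of the question's data, rebuilt by hand for illustration.
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100, 600, 53),
  Female   = c(500, 100, 65)
)

# Grouped version, as in the answer above.
grouped <- df %>%
  group_by(Industry) %>%
  mutate(F_prec = Female / (Male + Female),
         M_prec = Male / (Male + Female)) %>%
  ungroup()

# Ungrouped version: same numbers, since the division is element-wise.
plain <- df %>%
  mutate(F_prec = Female / (Male + Female),
         M_prec = Male / (Male + Female))

all.equal(grouped$F_prec, plain$F_prec)
```

If an industry could appear on several rows, the counts would first need to be summed per industry, and that is where group_by becomes necessary.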

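1) proportions. Assuming the input is df1 (shown reproducibly in the Note at the end), change the column names to the desired ones and convert the result to a matrix m. Then call proportions with a margin of 1 for row proportions (a margin of 2 would give column proportions). The conversion to a matrix in the first line is there because proportions requires it. proportions is the newer name for prop.table, which can be used in the same way.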

m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
cbind(df1, 100 * proportions(m, 1))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

2) rowSums. An alternative is to divide df1[-1] by its rowSums, which gives the same result.

cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...
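
To see that approaches 1 and 2 really agree, the following sketch computes both on a hand-built three-row subset of df1 and compares them. The variable names counts, by_proportions and by_rowsums are made up for this illustration.

```r
# First three rows of df1, rebuilt by hand for illustration.
df1 <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100L, 600L, 53L),
  Female   = c(500L, 100L, 65L)
)

counts <- as.matrix(df1[-1])  # numeric columns only, as a matrix

by_proportions <- 100 * proportions(counts, 1)    # approach 1
by_rowsums     <- 100 * counts / rowSums(counts)  # approach 2

# Both approaches should give the same matrix of row percentages.
all.equal(by_proportions, by_rowsums)

# Each row of percentages should add up to 100.
rowSums(by_proportions)
```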

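3) dplyr. Use across to copy the columns under the specified names, multiply them by 100, and divide by the total across columns computed with c_across. Grouping by Industry makes each group a single row, so that total is the row total.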

library(dplyr)

df1 %>%
  group_by(Industry) %>%
  mutate(100 * across(everything(), .names = "%{.col}") / sum(c_across(everything()))) %>%
  ungroup()
## # A tibble: 9 x 5
##   Industry           Male Female `%Male` `%Female`
##   <chr>             <int>  <int>   <dbl>     <dbl>
## 1 Art/Entertainment   100    500   16.7       83.3
## 2 Banking             600    100   85.7       14.3
## 3 Healthcare           53     65   44.9       55.1
## ...snip...

4) transform. This is close to another answer, but it does not overwrite the input:

transform(df1, 
  "%M" = 100 * Male / (Male + Female), 
  "%F" = 100 * Female / (Male + Female),  
  check.names = FALSE)
##            Industry Male Female        %M       %F
## 1 Art/Entertainment  100    500 16.666667 83.33333
## 2           Banking  600    100 85.714286 14.28571
## 3        Healthcare   53     65 44.915254 55.08475
## ...snip...

Note

The input in reproducible form:

df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare", 
"Education", "Military", "Medicine", "Law", "Computer", "Sales"
), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
)), class = "data.frame", row.names = c(NA, -9L))

There is a simple solution using the janitor package, which is designed for cross-tabulation:

library(janitor)

data %>% 
  adorn_totals(where = c("row","col")) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 0) %>% 
  adorn_ns(position = "front")

          Industry       Male     Female       Total
 Art/Entertainment  100 (17%)  500 (83%)  600 (100%)
           Banking  600 (86%)  100 (14%)  700 (100%)
        Healthcare   53 (45%)   65 (55%)  118 (100%)
         Education   20  (3%)  766 (97%)  786 (100%)
          Military   47 (33%)   96 (67%)  143 (100%)
          Medicine  500 (56%)  400 (44%)  900 (100%)
               Law  500 (50%)  500 (50%) 1000 (100%)
          Computer  200 (58%)  144 (42%)  344 (100%)
             Sales  420 (86%)   69 (14%)  489 (100%)
             Total 2440 (48%) 2640 (52%) 5080 (100%)

#OR

data %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns(position = "front")

          Industry         Male       Female
 Art/Entertainment 100 (16.67%) 500 (83.33%)
           Banking 600 (85.71%) 100 (14.29%)
        Healthcare  53 (44.92%)  65 (55.08%)
         Education  20  (2.54%) 766 (97.46%)
          Military  47 (32.87%)  96 (67.13%)
          Medicine 500 (55.56%) 400 (44.44%)
               Law 500 (50.00%) 500 (50.00%)
          Computer 200 (58.14%) 144 (41.86%)
             Sales 420 (85.89%)  69 (14.11%)
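
If janitor is not available, a similar "count (percent)" display can be approximated in base R with sprintf. This is only a sketch, not how janitor does it, and the column alignment will differ. The helper fmt and the result name out are hypothetical, and only the first three rows of the data are used.

```r
# First three rows of the data, rebuilt by hand for illustration.
data <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100, 600, 53),
  Female   = c(500, 100, 65)
)

total <- data$Male + data$Female

# Format each count as "n (p%)", where p is the row percentage to two decimals.
fmt <- function(n) sprintf("%d (%.2f%%)", as.integer(n), 100 * n / total)

out <- data.frame(
  Industry = data$Industry,
  Male     = fmt(data$Male),
  Female   = fmt(data$Female)
)
out
```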

Data used

> data
           Industry Male Female
1 Art/Entertainment  100    500
2           Banking  600    100
3        Healthcare   53     65
4         Education   20    766
5          Military   47     96
6          Medicine  500    400
7               Law  500    500
8          Computer  200    144
9             Sales  420     69