Summarize Percentage by Group in R
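I would like to summarize my dataset by the percentage of females and males in each industry. I am still learning R and have not been able to work this out.
My data: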
Industry | Male | Female |
---|---|---|
Art/Entertainment | 100 | 500 |
Banking | 600 | 100 |
Healthcare | 53 | 65 |
Education | 20 | 766 |
Military | 47 | 96 |
Medicine | 500 | 400 |
Law | 500 | 500 |
Computer | 200 | 144 |
Sales | 420 | 69 |
Goal:
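Industry | Male | Female | F% | M% |
---|---|---|---|---|
Art/Entertainment | 100 | 500 | | |
Banking | 600 | 100 | | |
Healthcare | 53 | 65 | | |
Education | 20 | 766 | | |
Military | 47 | 96 | | |
Medicine | 500 | 400 | | |
Law | 500 | 500 | | |
Computer | 200 | 144 | | |
Sales | 420 | 69 | | |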
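If your data is named df, we can create columns for the female and male shares like this (the results are proportions between 0 and 1; multiply by 100 for percentages):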
df$Fpct <- df$Female / (df$Male + df$Female)
df$Mpct <- df$Male / (df$Male + df$Female)
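Note: avoid the % symbol in variable names. It is not valid in a syntactic R name, so such a column would have to be quoted with backticks every time it is used.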
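As a small sketch of the same idea, the shares can be scaled to percentages and rounded for display. The data frame below is a three-row subset of the question's data, the column names Fpct and Mpct follow the answer above, and rounding to one decimal place is only an assumption about the desired presentation.

```r
# Three-row subset of the question's data (for illustration only)
df <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100, 600, 53),
  Female   = c(500, 100, 65)
)

# Row total, computed once and reused for both columns
total <- df$Male + df$Female

# Percentages (0-100), rounded to one decimal place
df$Fpct <- round(100 * df$Female / total, 1)
df$Mpct <- round(100 * df$Male / total, 1)

df
```

Computing the total once keeps the two columns consistent if more categories are added to the denominator later.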
You can use group_by and then create two new columns for F% and M%.
Perhaps you can use this:
library(dplyr)
df %>% group_by(Industry) %>% mutate(F_prec=Female/(Male+Female), M_prec=Male/(Male+Female))
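1) proportions If your input is df1 (shown reproducibly in the Note at the end), change the column names to the desired ones and convert the result to a matrix m. Then use proportions with margin 1 to get row proportions (margin 2 would give column proportions). Note that the first line converts to a matrix because proportions requires one.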
m <- as.matrix(setNames(df1[-1], c("%M", "%F")))
cbind(df1, 100 * proportions(m, 1))
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
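To make the margin argument concrete, here is a minimal sketch on a two-row matrix taken from the question's data. It assumes R 4.0.0 or later, where proportions is available (prop.table is the older name for the same function); the object names row_pct and col_pct are made up for this example.

```r
# Two industries from the question, as a numeric matrix (filled column by column)
m <- matrix(
  c(100, 600, 500, 100),
  nrow = 2,
  dimnames = list(c("Art/Entertainment", "Banking"), c("Male", "Female"))
)

# margin = 1: each value is divided by its row total
row_pct <- 100 * proportions(m, 1)

# margin = 2: each value is divided by its column total
col_pct <- 100 * proportions(m, 2)

rowSums(row_pct)  # every row adds up to 100
colSums(col_pct)  # every column adds up to 100
```

For the question as asked (male and female share within each industry), margin 1 is the one that applies.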
2) rowSums Another approach is to divide df1[-1] by its rowSums, which gives the same result.
cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
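Column names such as %M are not syntactic R names, so any later code has to quote them. The following sketch, on a two-row subset of df1, shows one way to read such columns back using backticks or [[ ]]; the object name out is made up for this example.

```r
# Two-row subset of df1 (for illustration only)
df1 <- data.frame(
  Industry = c("Banking", "Law"),
  Male     = c(600L, 500L),
  Female   = c(100L, 500L)
)

# Same rowSums approach as above; cbind keeps the "%M" and "%F" names as given
out <- cbind(df1, setNames(100 * df1[-1] / rowSums(df1[-1]), c("%M", "%F")))

# Non-syntactic names need backticks with $ ...
out$`%M`

# ... or a quoted string with [[ ]]
out[["%F"]]
```

If the columns will be used in further calculations, names like M_pct and F_pct avoid the quoting altogether.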
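3) dplyr Use across to make copies of the columns under the specified names, multiply them by 100, and divide by the sum across the columns, computed with c_across.
library(dplyr)
# everything() selects the non-grouping columns (Male, Female);
# recent dplyr versions warn when across()/c_across() are called without a selection
df1 %>%
  group_by(Industry) %>%
  mutate(100 * across(everything(), .names = "%{.col}") / sum(c_across(everything()))) %>%
  ungroup()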
## # A tibble: 9 x 5
## Industry Male Female `%Male` `%Female`
## <chr> <int> <int> <dbl> <dbl>
## 1 Art/Entertainment 100 500 16.7 83.3
## 2 Banking 600 100 85.7 14.3
## 3 Healthcare 53 65 44.9 55.1
## ...snip...
4) transform This one is close to another answer, but it does not overwrite the input:
transform(df1,
"%M" = 100 * Male / (Male + Female),
"%F" = 100 * Female / (Male + Female),
check.names = FALSE)
## Industry Male Female %M %F
## 1 Art/Entertainment 100 500 16.666667 83.33333
## 2 Banking 600 100 85.714286 14.28571
## 3 Healthcare 53 65 44.915254 55.08475
## ...snip...
Note
The input in reproducible form:
df1 <- structure(list(Industry = c("Art/Entertainment", "Banking", "Healthcare",
"Education", "Military", "Medicine", "Law", "Computer", "Sales"
), Male = c(100L, 600L, 53L, 20L, 47L, 500L, 500L, 200L, 420L
), Female = c(500L, 100L, 65L, 766L, 96L, 400L, 500L, 144L, 69L
)), class = "data.frame", row.names = c(NA, -9L))
There is a simple solution using the janitor library, which is designed for cross-tabulation:
library(janitor)
data %>%
adorn_totals(where = c("row","col")) %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 0) %>%
adorn_ns(position = "front")
Industry Male Female Total
Art/Entertainment 100 (17%) 500 (83%) 600 (100%)
Banking 600 (86%) 100 (14%) 700 (100%)
Healthcare 53 (45%) 65 (55%) 118 (100%)
Education 20 (3%) 766 (97%) 786 (100%)
Military 47 (33%) 96 (67%) 143 (100%)
Medicine 500 (56%) 400 (44%) 900 (100%)
Law 500 (50%) 500 (50%) 1000 (100%)
Computer 200 (58%) 144 (42%) 344 (100%)
Sales 420 (86%) 69 (14%) 489 (100%)
Total 2440 (48%) 2640 (52%) 5080 (100%)
#OR
data %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns(position = "front")
Industry Male Female
Art/Entertainment 100 (16.67%) 500 (83.33%)
Banking 600 (85.71%) 100 (14.29%)
Healthcare 53 (44.92%) 65 (55.08%)
Education 20 (2.54%) 766 (97.46%)
Military 47 (32.87%) 96 (67.13%)
Medicine 500 (55.56%) 400 (44.44%)
Law 500 (50.00%) 500 (50.00%)
Computer 200 (58.14%) 144 (41.86%)
Sales 420 (85.89%) 69 (14.11%)
Data used
> data
Industry Male Female
1 Art/Entertainment 100 500
2 Banking 600 100
3 Healthcare 53 65
4 Education 20 766
5 Military 47 96
6 Medicine 500 400
7 Law 500 500
8 Computer 200 144
9 Sales 420 69
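For readers who do not have janitor installed, a roughly similar "count (percent)" display can be assembled in base R with sprintf. This is not how janitor does it internally, only a sketch on a three-row subset of the data; the object names counts, pct and labelled are made up for this example.

```r
# Three-row subset of the data (for illustration only)
data <- data.frame(
  Industry = c("Art/Entertainment", "Banking", "Healthcare"),
  Male     = c(100L, 600L, 53L),
  Female   = c(500L, 100L, 65L)
)

# Row percentages: each count divided by its row total
counts <- as.matrix(data[-1])
pct <- 100 * counts / rowSums(counts)

# Combine the count and the percentage into one label per cell;
# "%%" prints a literal percent sign
labelled <- data.frame(
  Industry = data$Industry,
  Male     = sprintf("%d (%.2f%%)", data$Male, pct[, "Male"]),
  Female   = sprintf("%d (%.2f%%)", data$Female, pct[, "Female"])
)

labelled
```

The resulting columns are character strings meant for display, so keep the numeric columns around if further calculations are needed.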