How to distill 15 unique variables into 4 chosen variables in R?
I have a df:
Weight Age Race
56 10 WHITE - RUSSIAN
190 54 HISPANIC/LATINO - CUBAN
99 14 SOUTH AMERICAN
80 9 BLACK/AFRICAN
200 19 ASIAN - CHINESE
201 20 ASIAN
180 90 WHITE
17 2 UNKNOWN/NOT SPECIFIED
100 10 BLACK/CAPE VERDEAN
110 11
109 9 AMERICAN INDIAN/ALASKA NATIVE
The Race column has 15 unique values, as shown by the output of unique(df$Race):
[1] WHITE
[2] WHITE - RUSSIAN
[3] ASIAN
[4] BLACK/AFRICAN AMERICAN
[5] OTHER
[6] UNKNOWN/NOT SPECIFIED
[7] BLACK/AFRICAN
[8] HISPANIC/LATINO - CUBAN
[9] WHITE - OTHER EUROPEAN
[10] AMERICAN INDIAN/ALASKA NATIVE
[11] SOUTH AMERICAN
[12] ASIAN - CHINESE
[13] BLACK/CAPE VERDEAN
[14] HISPANIC/LATINO - PUERTO RICAN
[15]
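I would like to change these into five buckets: "White" for [1,2,9], "Black" for [4,7,13], "Hispanic" for [8,11,14], "Asian" for [3,12], and "Other" for [5,6,10]. If the value is blank, I would like it to stay blank.
I would like the output to be: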
Weight Age Race
56 10 White
190 54 Hispanic
99 14 Hispanic
80 9 Black
200 19 Asian
201 20 Asian
180 90 White
17 2 Other
100 10 Black
110 11
109 9 Other
You can use case_when to assign the categories:
library(dplyr)
df %>%
  mutate(Race = case_when(
    grepl('WHITE', Race) ~ 'White',
    grepl('BLACK', Race) ~ 'Black',
    grepl('ASIAN', Race) ~ 'Asian',
    Race %in% c('HISPANIC/LATINO - CUBAN', 'SOUTH AMERICAN', 'HISPANIC/LATINO - PUERTO RICAN') ~ 'Hispanic',
    Race == '' ~ '',
    TRUE ~ 'Other'))
# Weight Age Race
#1 56 10 White
#2 190 54 Hispanic
#3 99 14 Hispanic
#4 80 9 Black
#5 200 19 Asian
#6 201 20 Asian
#7 180 90 White
#8 17 2 Other
#9 100 10 Black
#10 110 11
#11 109 9 Other
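If Race contains the text 'WHITE', we change Race to 'White'; the same goes for 'Black' and 'Asian'. For the other categories, we can list the Race values individually to combine them. A blank value stays blank, and anything not matched by an earlier condition falls through to 'Other'.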
You can also use fct_collapse from forcats and list every value individually.
df %>%
  mutate(Race = forcats::fct_collapse(Race,
    White = c('WHITE', 'WHITE - RUSSIAN', 'WHITE - OTHER EUROPEAN'),
    Black = c('BLACK/AFRICAN AMERICAN', 'BLACK/AFRICAN', 'BLACK/CAPE VERDEAN'),
    Hispanic = c('HISPANIC/LATINO - CUBAN', 'SOUTH AMERICAN', 'HISPANIC/LATINO - PUERTO RICAN'),
    Asian = c('ASIAN', 'ASIAN - CHINESE'),
    Other = c('OTHER', 'UNKNOWN/NOT SPECIFIED', 'AMERICAN INDIAN/ALASKA NATIVE')))
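If you would rather avoid extra packages, the same "list every value" idea can be sketched in base R with match(). This is an alternative technique, not what the answer above does: the names buckets, map_from, and map_to are hypothetical, and the df is rebuilt from the sample rows in the question with Race assumed to be a character column.

```r
# Sample data from the question; Race is kept as character, not factor.
df <- data.frame(
  Weight = c(56, 190, 99, 80, 200, 201, 180, 17, 100, 110, 109),
  Age    = c(10, 54, 14, 9, 19, 20, 90, 2, 10, 11, 9),
  Race   = c('WHITE - RUSSIAN', 'HISPANIC/LATINO - CUBAN', 'SOUTH AMERICAN',
             'BLACK/AFRICAN', 'ASIAN - CHINESE', 'ASIAN', 'WHITE',
             'UNKNOWN/NOT SPECIFIED', 'BLACK/CAPE VERDEAN', '',
             'AMERICAN INDIAN/ALASKA NATIVE'),
  stringsAsFactors = FALSE
)

# One entry per bucket, listing the original values it should absorb.
buckets <- list(
  White    = c('WHITE', 'WHITE - RUSSIAN', 'WHITE - OTHER EUROPEAN'),
  Black    = c('BLACK/AFRICAN AMERICAN', 'BLACK/AFRICAN', 'BLACK/CAPE VERDEAN'),
  Hispanic = c('HISPANIC/LATINO - CUBAN', 'SOUTH AMERICAN', 'HISPANIC/LATINO - PUERTO RICAN'),
  Asian    = c('ASIAN', 'ASIAN - CHINESE'),
  Other    = c('OTHER', 'UNKNOWN/NOT SPECIFIED', 'AMERICAN INDIAN/ALASKA NATIVE')
)

# Flatten into two parallel vectors; the blank value maps to itself.
map_from <- c(unlist(buckets, use.names = FALSE), '')
map_to   <- c(rep(names(buckets), lengths(buckets)), '')

# Look up each Race value; values missing from the map become NA.
df$Race <- map_to[match(df$Race, map_from)]
print(df)
```

Unlike the case_when version, a value that is missing from the map ends up as NA rather than 'Other', which makes unexpected categories easy to spot.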