Recoding cells with multiple race category responses into a "multiracial" response?
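I'm cleaning some survey data that appears to have allowed respondents to select multiple race categories. I'd like to know how to recode these into a "multiracial" response for analysis.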
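So far I've been doing fairly laborious hand-coding, without success. Here is my attempt at using recode to turn each response with multiple entries into a number, which I could then handle with case_when: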
library(dplyr)
# each multi-selection string is mapped to its own code by hand
rawdat$race <- recode(rawdat$race, "White, non-Hispanic,Asian" = 1,
"White, non-Hispanic,American Indian or Alaska Native" = 2,
"White, non-Hispanic,Black or African American,Asian" = 3,
"Black or African American,American Indian or Alaska Native" = 4,
"White, non-Hispanic,Hispanic" = 5,
"Asian,Native Hawaiian or Pacific Islander" = 6,
"White, non-Hispanic,Black or African American" = 7,
"Black or African American,American Indian or Alaska Native,Asian,Hispanic" = 8,
"White, non-Hispanic,Black or African American,American Indian or Alaska Native,Asian,Native Hawaiian or Pacific Islander,Hispanic" = 9,
"Black or African American,Hispanic" = 10,
"Black or African American,Asian" = 11,
"White, non-Hispanic,Native Hawaiian or Pacific Islander" = 12,
"White, non-Hispanic,Black or African American,American Indian or Alaska Native,Asian,Hispanic" = 13,
"American Indian or Alaska Native,Hispanic" = 14)
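This approach has a lot of problems (I only tried it because I thought it might work as a brute-force short-term fix, but it didn't). I would rather initialize a vector containing every possible value presented to respondents for this question, and then recode any cell that contains more than one of those values as "Multiracial". As far as I can tell, though, recode() won't accept a vector like that as an argument. Any ideas on how to accomplish the latter approach?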
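A minimal base R sketch of that latter approach: instead of going through recode(), count how many of the known answer options appear in each cell and recode the cell when the count is above one. It assumes the six labels in `categories` are exactly the options shown to respondents (taken from the strings in the recode() attempt above), and the `rawdat` data frame and the `count_categories()` helper are hypothetical. Longer labels are checked and removed first so that "Hispanic" is not counted a second time inside "White, non-Hispanic".

```r
# Answer options shown to respondents (assumed from the labels in the recode() call)
categories <- c("White, non-Hispanic",
                "Black or African American",
                "American Indian or Alaska Native",
                "Asian",
                "Native Hawaiian or Pacific Islander",
                "Hispanic")

# Hypothetical helper: how many distinct options appear in each cell
count_categories <- function(x, categories) {
  # longest labels first, so "White, non-Hispanic" is removed before "Hispanic" is checked
  categories <- categories[order(nchar(categories), decreasing = TRUE)]
  vapply(x, function(cell) {
    n <- 0L
    for (lab in categories) {
      if (grepl(lab, cell, fixed = TRUE)) {
        n <- n + 1L
        # remove the matched label so its text cannot match a shorter label again
        cell <- gsub(lab, "", cell, fixed = TRUE)
      }
    }
    n
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical example data; NA stands in for a skipped question
rawdat <- data.frame(race = c("White, non-Hispanic",
                              "White, non-Hispanic,Asian",
                              "Hispanic",
                              "Black or African American,Hispanic",
                              NA),
                     stringsAsFactors = FALSE)

n <- count_categories(rawdat$race, categories)
# two or more options selected -> "Multiracial"; otherwise keep the original label
rawdat$race_recoded <- ifelse(n > 1, "Multiracial", rawdat$race)
```

Whether a response such as "White, non-Hispanic,Hispanic" should count as multiracial is an analysis decision; if Hispanic origin is meant to be a separate dimension, drop "Hispanic" from `categories` and code it in its own column.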
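It looks like the cells you want to recode as "multiracial" contain a comma, is that right? If so, you just need to identify the cells that contain a comma.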
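library(tidyverse)
race <- c("White", "White, Asian", "Black or African American", "White", "White, Black or African American")
df <- as.data.frame(race)
# flag any response containing a comma, i.e. more than one selection
df$multiracial <- ifelse(grepl(",", df$race), "Multiracial", "Not multiracial")
# overwrite flagged responses with "Multiracial"; single selections keep their label
df$race <- ifelse(df$multiracial == "Multiracial", "Multiracial", df$race)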
head(df$race)
#> [1] "White" "Multiracial"
#> [3] "Black or African American" "White"
#> [5] "Multiracial"
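Since the question mentions case_when, the same comma check can also be written as a single dplyr pipeline. This is a sketch of the equivalent logic, assuming dplyr and stringr are installed; it uses the same example vector as above and keeps the flag as a logical column rather than a text label.

```r
library(dplyr)
library(stringr)

df <- data.frame(race = c("White", "White, Asian", "Black or African American",
                          "White", "White, Black or African American"),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(
    # TRUE when the response contains a comma, i.e. more than one selection
    multiracial = str_detect(race, ","),
    # mutate() evaluates in order, so the flag above is already available here
    race = case_when(
      multiracial ~ "Multiracial",
      TRUE ~ race
    )
  )
```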
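Edit
I created a separate Hispanic/non-Hispanic column. This may not work correctly on your raw data, depending on whether the spacing and commas between selections are consistent: your data appears to separate selections with a bare comma (e.g. "White, non-Hispanic,Asian"), whereas this example uses a comma followed by a space.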
library(tidyverse)
library(stringr)
race <- c("White, non-Hispanic",
"White, Asian",
"Black or African American",
"White, Hispanic, Asian",
"White, Black or African American",
"White, non-Hispanic",
"Black or African American, Hispanic")
df <- as.data.frame(race)
# original
df$original <- df$race
# create separate Hispanic/non-Hispanic column
df$hispanic <- ifelse(grepl("non-Hispanic",df$race),"non-Hispanic",
ifelse(grepl("Hispanic",df$race),"Hispanic", "Unknown"))
# remove Hispanic/non-Hispanic
df$race <- str_remove(df$race, ", non-Hispanic")
df$race <- str_remove(df$race, ", Hispanic")
# recode as multiracial
df$multiracial <- ifelse(grepl(",", df$race), "Multiracial", "Not multiracial")
df$race <- ifelse(df$multiracial == "Multiracial", "Multiracial", df$race)
head(df)
#> race original hispanic
#> 1 White White, non-Hispanic non-Hispanic
#> 2 Multiracial White, Asian Unknown
#> 3 Black or African American Black or African American Unknown
#> 4 Multiracial White, Hispanic, Asian Hispanic
#> 5 Multiracial White, Black or African American Unknown
#> 6 White White, non-Hispanic non-Hispanic
#> multiracial
#> 1 Not multiracial
#> 2 Multiracial
#> 3 Not multiracial
#> 4 Multiracial
#> 5 Multiracial
#> 6 Not multiracial
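To address the spacing caveat, one option is to normalize the separators before looking for commas. The sketch below assumes, as in the question, that selections are separated by a bare comma and that "White, non-Hispanic" is the only label that itself contains a comma; `normalize_race()` is a hypothetical helper. It rewrites every separator as ", ", temporarily drops the comma from that one label, and counts selections with strsplit().

```r
# Hypothetical helper: standardize separators and protect the one label
# (assumed to be "White, non-Hispanic") that itself contains a comma
normalize_race <- function(x) {
  # turn ",", " ," or ",  " into a single ", "
  x <- gsub("\\s*,\\s*", ", ", x)
  # drop the comma inside the label so it is not mistaken for a separator
  gsub("White, non-Hispanic", "White non-Hispanic", x, fixed = TRUE)
}

race <- c("White, non-Hispanic",
          "White, non-Hispanic,Asian",
          "Black or African American,Hispanic",
          "Asian")

clean <- normalize_race(race)
# number of selections = number of pieces after splitting on the separator
n_selected <- lengths(strsplit(clean, ", ", fixed = TRUE))
# single selections keep their original label
race_recoded <- ifelse(n_selected > 1, "Multiracial", race)
```

After this normalization, the str_remove(", Hispanic") and str_remove(", non-Hispanic") steps in the edit also match responses that originally had no space after the comma.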