How to combine multiple ethnicity columns into one in R?

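My dataset includes a multiple-answer ethnicity question from a Qualtrics survey, and I want to combine the resulting columns into a single column.

My data looks like this: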

White/Caucasian | Black/African American | Hispanic | Pacific Islander/Native Hawaiian | American Indian/Alaskan Native
1               | -                      | 1        | -                                | -
1               | -                      | -        | -                                | -
-               | -                      | -        | -                                | 1
-               | -                      | -        | 1                                | -

I'm trying to get it to look like this:

Race
Multiple
White
American Indian/Alaskan Native
Pacific Islander/Native Hawaiian

Is there a way to do this in R? I've been working on this for hours!

We can write a custom function to do this:

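# Returns "Multiple" if more than one column is ticked in the row,
# otherwise the name of the ticked column.
# Note: the column names are looked up in the global `df`.
return_col <- function(x) {
  inds <- x == 1   # TRUE where the row has a "1"
  if (sum(inds) > 1) "Multiple" else names(df)[inds]
}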

This can be used in base R:

df$Race <- apply(df, 1, return_col)
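If some respondents ticked nothing, or the data frame has other columns besides the race ones, a vectorized variant may be more robust. The following is only a sketch: `combine_race` and `race_cols` are names made up for illustration, and returning `NA` for rows with no selection is an assumption about what you want.

```r
# Same sample data as in the "Data" section of this answer.
df <- data.frame(
  `White/Caucasian` = c("1", "1", "-", "-"),
  `Black/African American` = c("-", "-", "-", "-"),
  Hispanic = c("1", "-", "-", "-"),
  `Pacific Islander/Native Hawaiian` = c("-", "-", "-", "1"),
  `American Indian/Alaskan Native` = c("-", "-", "1", "-"),
  check.names = FALSE
)

# Hypothetical helper: the race columns are passed in explicitly, so it does
# not rely on a global `df` and ignores any other columns.
combine_race <- function(data, cols = names(data)) {
  picked <- as.matrix(data[cols]) == "1"   # logical matrix of selections
  n <- rowSums(picked)                     # number of selections per row
  # Name of the first ticked column in each row (TRUE/FALSE converted to 1/0).
  first <- cols[max.col(picked + 0L, ties.method = "first")]
  # Assumption: no selection -> NA, several selections -> "Multiple".
  ifelse(n == 0, NA_character_, ifelse(n > 1, "Multiple", first))
}

race_cols <- names(df)                     # take this before adding Race
df$Race <- combine_race(df, race_cols)
df$Race
```

Because the column names are captured in `race_cols` before `Race` is added, re-running the last lines does not pick up the new column by accident.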

Or in dplyr:

library(dplyr)

df <- df %>%
  rowwise() %>%
  # c_across(everything()) collects all columns of the current row
  mutate(Race = return_col(c_across(everything()))) %>%
  ungroup()

df %>% select(Race)

# A tibble: 4 × 1
#  Race                            
#  <chr>                           
#1 Multiple                        
#2 White/Caucasian                 
#3 American Indian/Alaskan Native  
#4 Pacific Islander/Native Hawaiian

Data

It is easier to help if you provide the data in a reproducible format.

df <- structure(list(`White/Caucasian` = c("1", "1", "-", "-"), `Black/African American` = c("-", 
"-", "-", "-"), Hispanic = c("1", "-", "-", "-"), `Pacific Islander/Native Hawaiian` = c("-", 
"-", "-", "1"), `American Indian/Alaskan Native` = c("-", "-", 
"1", "-")), row.names = c(NA, -4L), class = "data.frame")

Another tidyverse option:

library(tidyverse)

df %>%
  # Turn "-" into NA first, then add the row id,
  # so na_if() is never applied to the integer id column
  mutate(across(everything(), ~ na_if(.x, "-")),
         id = row_number()) %>%
  pivot_longer(-id, names_to = "Race", values_drop_na = TRUE) %>%
  group_by(id) %>%
  mutate(Race = ifelse(n() > 1, "Multiple", Race)) %>%
  distinct() %>%
  ungroup() %>%
  select(Race)

Output

  Race                            
  <chr>                           
1 Multiple                        
2 White/Caucasian                 
3 American Indian/Alaskan Native  
4 Pacific Islander/Native Hawaiian
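One caveat with the pivot approach: `values_drop_na = TRUE` drops respondents who ticked nothing, so they disappear from the result. If you would rather keep them as `NA`, something along these lines might work. This is a sketch, not a tested recipe for your real data; `df2` and its two columns are made up for illustration, and I use `summarise()` plus `tidyr::complete()` instead of `distinct()`.

```r
library(dplyr)
library(tidyr)

# Illustrative data (not from the question): respondent 3 ticked nothing.
df2 <- tibble(
  White    = c("1", "1", "-"),
  Hispanic = c("1", "-", "-")
)

res <- df2 %>%
  mutate(across(everything(), ~ na_if(.x, "-")),
         id = row_number()) %>%
  pivot_longer(-id, names_to = "Race", values_drop_na = TRUE) %>%
  group_by(id) %>%
  # One row per respondent that ticked at least one option
  summarise(Race = if (n() > 1) "Multiple" else first(Race)) %>%
  # Re-add the ids that were dropped; their Race is filled with NA
  complete(id = seq_len(nrow(df2)))

res
```

The result has one row per original respondent, in the original order, which makes it safe to bind back onto the source data.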

Data

df <- structure(list(`White/Caucasian` = c("1", "1", "-", "-"), `Black/African American` = c("-", 
"-", "-", "-"), Hispanic = c("1", "-", "-", "-"), `Pacific Islander/Native Hawaiian` = c("-", 
"-", "-", "1"), `American Indian/Alaskan Native` = c("-", "-", 
"1", "-")), row.names = c(NA, -4L), class = "data.frame")