How to combine multiple ethnicity columns into one in R?
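My dataset includes a multiple-answer ethnicity question from a Qualtrics survey, and I'd like to combine the resulting columns into a single column.
My data looks like this: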
White/Caucasian | Black/African American | Hispanic | Pacific Islander/Native Hawaiian | American Indian/Alaskan Native |
---|---|---|---|---|
1 | - | 1 | - | - |
1 | - | - | - | - |
- | - | - | - | 1 |
- | - | - | 1 | - |
I'm trying to make it look like this:
Race |
---|
Multiple |
White |
American Indian/Alaskan Native |
Pacific Islander/Native Hawaiian |
Is there a way to do this in R? I've been working on this for hours!
We can write a custom function to do this:
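# Return the marked column's name, or "Multiple" if more than one is marked.
# Note: the names are looked up in a data frame called `df` in the calling environment.
return_col <- function(x) {
  inds <- x == 1  # TRUE where the cell is "1"; "-" compares as FALSE
  if (sum(inds) > 1) "Multiple" else names(df)[inds]
}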
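To see what `return_col` does on a single row, the sketch below calls it by hand on a two-column toy data frame. The toy values are made up for illustration and are not the asker's data. The comparison `x == 1` works on character columns because R coerces the number 1 to the string "1" before comparing.

```r
# Toy data; the function looks up names(df), so the object must be called `df`.
df <- data.frame(
  `White/Caucasian` = c("1", "1"),
  Hispanic          = c("1", "-"),
  check.names = FALSE  # keep the "/" in the column name
)

return_col <- function(x) {
  inds <- x == 1
  if (sum(inds) > 1) "Multiple" else names(df)[inds]
}

row1 <- unlist(df[1, ])  # named character vector for the first respondent
row2 <- unlist(df[2, ])

row1 == 1         # element-wise comparison, TRUE for both columns
return_col(row1)  # "Multiple"
return_col(row2)  # "White/Caucasian"
```

Note that a row with no "1" at all would return a zero-length result, so it may be worth checking for such rows before applying the function.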
This can be used in base R:
df$Race <- apply(df, 1, return_col)
or with dplyr:
library(dplyr)
df <- df %>%
  rowwise() %>%
  # Spell out the column selection; recent dplyr versions expect it
  mutate(Race = return_col(c_across(everything()))) %>%
  ungroup()
df %>% select(Race)
# A tibble: 4 × 1
# Race
# <chr>
#1 Multiple
#2 White/Caucasian
#3 American Indian/Alaskan Native
#4 Pacific Islander/Native Hawaiian
Data
It would be easier to help if you provided the data in a reproducible format.
df <- structure(list(`White/Caucasian` = c("1", "1", "-", "-"), `Black/African American` = c("-",
"-", "-", "-"), Hispanic = c("1", "-", "-", "-"), `Pacific Islander/Native Hawaiian` = c("-",
"-", "-", "1"), `American Indian/Alaskan Native` = c("-", "-",
"1", "-")), row.names = c(NA, -4L), class = "data.frame")
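As an alternative to calling a function once per row, the same result can be sketched with vectorized base R. This is not the answerer's method. The helper name `combine_race` and its `marker` argument are hypothetical, and the sketch assumes every cell is either "1" or "-" with no missing values. Unlike the row-wise function, it returns `NA` for a respondent who selected nothing.

```r
df <- structure(list(
  `White/Caucasian` = c("1", "1", "-", "-"),
  `Black/African American` = c("-", "-", "-", "-"),
  Hispanic = c("1", "-", "-", "-"),
  `Pacific Islander/Native Hawaiian` = c("-", "-", "-", "1"),
  `American Indian/Alaskan Native` = c("-", "-", "1", "-")
), row.names = c(NA, -4L), class = "data.frame")

# Hypothetical helper: collapse one-hot style columns into a single vector.
combine_race <- function(data, marker = "1") {
  marked <- data == marker    # logical matrix, one column per category
  n_marked <- rowSums(marked) # how many categories each row selected
  # Name of the first marked column in each row
  out <- colnames(marked)[max.col(marked, ties.method = "first")]
  out[n_marked > 1] <- "Multiple"
  out[n_marked == 0] <- NA_character_ # nothing selected
  out
}

df$Race <- combine_race(df)
df["Race"]
```

Because the work is done on the whole matrix at once, this avoids a per-row function call and does not depend on a global object named `df` inside the helper.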
Another tidyverse option:
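library(tidyverse)
df %>%
  # Turn "-" into NA first, while every column is still character
  mutate(across(everything(), ~ na_if(.x, "-"))) %>%
  # Row id so each respondent can be tracked after reshaping
  mutate(id = row_number()) %>%
  # One row per (respondent, selected category); unselected cells are dropped
  pivot_longer(-id, names_to = "Race", values_drop_na = TRUE) %>%
  group_by(id) %>%
  # More than one remaining row means several categories were selected
  mutate(Race = ifelse(n() > 1, "Multiple", Race)) %>%
  distinct() %>%
  ungroup() %>%
  select(Race)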
Output
Race
<chr>
1 Multiple
2 White/Caucasian
3 American Indian/Alaskan Native
4 Pacific Islander/Native Hawaiian
Data
df <- structure(list(`White/Caucasian` = c("1", "1", "-", "-"), `Black/African American` = c("-",
"-", "-", "-"), Hispanic = c("1", "-", "-", "-"), `Pacific Islander/Native Hawaiian` = c("-",
"-", "-", "1"), `American Indian/Alaskan Native` = c("-", "-",
"1", "-")), row.names = c(NA, -4L), class = "data.frame")
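To make the reshape-and-count logic of the tidyverse option easier to follow, here is a rough base R walk-through of the same three steps: go long, count rows per respondent, then de-duplicate. It is only an illustrative sketch. The intermediate names `marked`, `long`, and `n_per_id` are my own and do not come from the answer.

```r
df <- structure(list(
  `White/Caucasian` = c("1", "1", "-", "-"),
  `Black/African American` = c("-", "-", "-", "-"),
  Hispanic = c("1", "-", "-", "-"),
  `Pacific Islander/Native Hawaiian` = c("-", "-", "-", "1"),
  `American Indian/Alaskan Native` = c("-", "-", "1", "-")
), row.names = c(NA, -4L), class = "data.frame")

marked <- as.matrix(df) == "1"

# Step 1 (like pivot_longer with values_drop_na): one row per selected cell
long <- data.frame(
  id   = row(marked)[marked],
  Race = colnames(marked)[col(marked)[marked]],
  stringsAsFactors = FALSE
)

# Step 2 (like group_by(id) + n()): count selections per respondent
n_per_id <- ave(long$id, long$id, FUN = length)
long$Race[n_per_id > 1] <- "Multiple"

# Step 3 (like distinct()): collapse repeated "Multiple" rows, restore row order
result <- unique(long)
result <- result[order(result$id), ]
result$Race
```

Printing `long` right after step 1 shows respondent 1 twice, once for each selected category, which is why a count above one marks that respondent as "Multiple".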