How to create a variable with other dataset variables as its levels

I have a dataset in which several variables are dichotomized as yes/no.

> df[1:20,]
# A tibble: 20 × 2
   black white
   <fct> <fct>
 1 No    Yes  
 2 No    Yes  
 3 No    Yes  
 4 No    Yes  
 5 No    Yes  
 6 No    Yes  
 7 No    Yes  
 8 No    Yes  
 9 No    Yes  
10 No    Yes  
11 No    Yes  
12 No    Yes  
13 No    Yes  
14 No    Yes  
15 No    Yes  
16 Yes   No   
17 No    Yes  
18 No    Yes  
19 No    Yes  
20 Yes   No 

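This produces a lot of variables (my real data has several race options) and looks untidy, since many of them are unnecessary. I would like to create a new variable (e.g. 'race') in which the current variables 'black', 'white', etc. become the levels of that variable, as in this example: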

> df2[1:20,]
# A tibble: 20 × 1
   race 
   <fct>
 1 White
 2 White
 3 White
 4 White
 5 White
 6 White
 7 White
 8 White
 9 White
10 White
11 White
12 White
13 White
14 White
15 White
16 Black
17 White
18 White
19 White
20 Black

How can I do this?

To account for multiple races, use apply over the rows (MARGIN = 1) and join the names of the columns that equal "Yes" with toString:
df <- structure(list(asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

data.frame(race = apply(df == "Yes", 1, \(x) toString(colnames(df)[which(x)])))

           race
1         white
2         white
3         white
4         white
5         white
6         white
7         white
8         white
9         white
10        white
11        white
12        white
13        white
14        white
15        white
16        black
17        white
18        white
19        white
20 asian, black
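
One thing the apply approach above leaves open is what happens when a row has no "Yes" at all: toString() of an empty vector returns "", so such rows end up as empty strings. The following is a minimal base R sketch, assuming you would rather record those rows as NA and store the result as a factor, as the question asks. The data frame `dat` and its values are hypothetical, made up only for illustration.

```r
# Hypothetical example data: three Yes/No indicator columns.
dat <- data.frame(
  asian = c("No", "Yes", "No"),
  black = c("No", "Yes", "No"),
  white = c("Yes", "No", "No")
)

# Logical matrix with one column per indicator.
is_yes <- dat == "Yes"

# vapply() guarantees a character vector with one string per row.
race <- vapply(
  seq_len(nrow(is_yes)),
  function(i) toString(colnames(is_yes)[is_yes[i, ]]),
  character(1)
)

# Rows without any "Yes" come back as ""; mark them as missing instead.
race[race == ""] <- NA

# Store the result as a factor; the levels are the observed combinations.
race <- factor(race)
race
```

Whether a combination such as "asian, black" should be its own level or be recoded (for example as "multiracial") is an analysis decision that this sketch does not make for you.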

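Using max.col (suitable only when each person has exactly one "Yes"; when several columns tie, max.col picks one of them at random by default):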

data.frame(race = colnames(df)[max.col(df == "Yes")])
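
Because max.col breaks ties at random by default, rows with several "Yes" values (or none) would get an arbitrary label. The sketch below is one way to make the result deterministic and to flag such rows, assuming that treating them as NA is acceptable for your analysis. The data frame `dat` is hypothetical example data.

```r
# Hypothetical example data: three Yes/No indicator columns.
dat <- data.frame(
  asian = c("No", "No", "Yes", "No"),
  black = c("No", "Yes", "Yes", "No"),
  white = c("Yes", "No", "No", "No")
)

is_yes <- dat == "Yes"      # logical matrix, one column per indicator
n_yes  <- rowSums(is_yes)   # number of "Yes" values in each row

# ties.method = "first" makes the choice reproducible;
# the default, "random", picks among tied columns at random.
idx <- max.col(is_yes, ties.method = "first")

race <- colnames(is_yes)[idx]
race[n_yes != 1] <- NA      # rows with zero or several "Yes" values

# Use the column names as the factor levels.
race <- factor(race, levels = colnames(is_yes))
race
```

Here the third row (two "Yes" values) and the fourth row (no "Yes") both become NA, while the levels stay asian, black, white.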

Using dplyr (assuming each person in your dataset can belong to only one race):

library(dplyr)

dat <- data.frame(id = 1:2,
                  black = c("No", "Yes"),
                  white = c("Yes", "No"))

dat |> mutate(
        race = case_when(black == "Yes" ~ "black",
                         white == "Yes" ~ "white")
)

Output:

#>   id black white  race
#> 1  1    No   Yes white
#> 2  2   Yes    No black

Here is a solution that also works for multiracial cases.

library(tidyverse)

# Sample data with multiracial case
df <- structure(list(respondent = 1:20, asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

df %>%
  select(asian:white) %>%
  `==`("Yes") %>%
  apply(1, 
        \(.row) colnames(.)[.row] %>%
          str_c(collapse = "-")) 
#>  [1] "white"       "white"       "white"       "white"       "white"      
#>  [6] "white"       "white"       "white"       "white"       "white"      
#> [11] "white"       "white"       "white"       "white"       "white"      
#> [16] "black"       "white"       "white"       "white"       "asian-black"

Created on 2022-04-04 by the reprex package (v2.0.1)