R: Convert binary categorical variables to long data format

mydata <- structure(list(id = 1:10, cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 
1), playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0), classroom = c(0, 
0, 0, 0, 0, 1, 1, 1, 1, 1), gender = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("Female", "Male"), class = "factor"), 
    job = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("Student", 
    "Teacher"), class = "factor")), .Names = c("id", "cafe", 
"playground", "classroom", "gender", "job"), row.names = c(NA, 
-10L), class = "data.frame")

> mydata
   id cafe playground classroom gender     job
1   1    0          1         0   Male Teacher
2   2    1          1         0   Male Student
3   3    0          1         0   Male Teacher
4   4    0          1         0   Male Student
5   5    1          1         0   Male Teacher
6   6    1          1         1   Male Teacher
7   7    0          0         1 Female Teacher
8   8    0          1         1   Male Teacher
9   9    1          1         1 Female Teacher
10 10    1          0         1   Male Student

The long-format dataset I want should look like this:

id      response    gender        job
1     playground      Male    Teacher
2           cafe      Male    Student
2     playground      Male    Student
3     playground      Male    Teacher
...

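Basically, the response column holds the name of each of the cafe, playground, and classroom columns whose value is 1 for that id. I looked at several examples here and here, but they do not deal with binary data columns.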

This can be done with the melt(data, ...) function from the reshape package.

library(reshape)

First, we store the names of the variables we want to keep as identifier columns.

id <- c("id", "gender", "job")

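Then, we reshape the data from wide to long format and keep only the rows whose value is 1. The filtered result has to be assigned back to df; otherwise the rows with 0 are still present in the later steps.

df <- melt(mydata, id=id)
df <- df[df[,5]==1,-5]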

Then, we sort the data by id.

df <- df[order(df[,"id"]),]

Finally, we rename the melted variable column to response and reorder the columns.

colnames(df)[4] <- "response"
df <- df[,c(1,4,2,3)]

## id   response  gender    job
## 1  playground   Male Teacher
## 2        cafe   Male Student
## 2  playground   Male Student
## 3  playground   Male Teacher
## ...
## ...
## 9   classroom Female Teacher
## 10       cafe   Male Student
## 10  classroom   Male Student
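
The same steps can also be sketched in base R without loading a package. This is only an illustration under stated assumptions: the names place_cols, hits, and long are made up for this example, and mydata is rebuilt inline so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained.
mydata <- data.frame(
  id = 1:10,
  cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1),
  playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0),
  classroom = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  gender = factor(c("Male", "Male", "Male", "Male", "Male", "Male",
                    "Female", "Male", "Female", "Male")),
  job = factor(c("Teacher", "Student", "Teacher", "Student", "Teacher",
                 "Teacher", "Teacher", "Teacher", "Teacher", "Student"))
)

# Hypothetical helper name: the binary indicator columns to collapse.
place_cols <- c("cafe", "playground", "classroom")

# Row/column positions of every cell equal to 1.
hits <- which(as.matrix(mydata[place_cols]) == 1, arr.ind = TRUE)

# One output row per hit: the column name becomes the response.
long <- data.frame(
  id = mydata$id[hits[, "row"]],
  response = place_cols[hits[, "col"]],
  gender = mydata$gender[hits[, "row"]],
  job = mydata$job[hits[, "row"]]
)

# which() walks the matrix column by column, so sort by id afterwards.
long <- long[order(long$id), ]
rownames(long) <- NULL
```

Because order() is stable, the responses within each id stay in the original column order (cafe, playground, classroom).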

Alternatively, we can use the tidyverse:

library(tidyverse)
mydata %>%
    gather(response, value, cafe:classroom) %>% 
    filter(value==1) %>%
    select(id, response, gender, job)
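
In newer versions of tidyr, gather() is superseded by pivot_longer(), so the pipeline above could also be written as in the sketch below. It assumes tidyr 1.0.0 or later and dplyr are installed; the result name long is made up for this example, and mydata is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data so the snippet is self-contained.
mydata <- data.frame(
  id = 1:10,
  cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1),
  playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0),
  classroom = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  gender = factor(c("Male", "Male", "Male", "Male", "Male", "Male",
                    "Female", "Male", "Female", "Male")),
  job = factor(c("Teacher", "Student", "Teacher", "Student", "Teacher",
                 "Teacher", "Teacher", "Teacher", "Teacher", "Student"))
)

long <- mydata %>%
  # Stack the three indicator columns into name/value pairs.
  pivot_longer(cafe:classroom, names_to = "response", values_to = "value") %>%
  # Keep only the places each person actually selected.
  filter(value == 1) %>%
  select(id, response, gender, job)
```

Unlike gather(), which stacks the data one column at a time, pivot_longer() keeps the rows of each id together, so the result already follows the id order shown in the question.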