R: Convert binary categorical variables to long data format
mydata <- structure(list(id = 1:10, cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1,
1), playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0), classroom = c(0,
0, 0, 0, 0, 1, 1, 1, 1, 1), gender = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("Female", "Male"), class = "factor"),
job = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("Student",
"Teacher"), class = "factor")), .Names = c("id", "cafe",
"playground", "classroom", "gender", "job"), row.names = c(NA,
-10L), class = "data.frame")
> mydata
   id cafe playground classroom gender     job
1   1    0          1         0   Male Teacher
2   2    1          1         0   Male Student
3   3    0          1         0   Male Teacher
4   4    0          1         0   Male Student
5   5    1          1         0   Male Teacher
6   6    1          1         1   Male Teacher
7   7    0          0         1 Female Teacher
8   8    0          1         1   Male Teacher
9   9    1          1         1 Female Teacher
10 10    1          0         1   Male Student
The long-format dataset I want should look like this:
id response gender job
1 playground Male Teacher
2 cafe Male Student
2 playground Male Student
3 playground Male Teacher
...
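Basically, the response column holds the name of each of the cafe, playground, and classroom columns whose value is 1 for that id. I looked at several examples here and here, but they do not deal with binary data columns.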
This can be done with the melt(data, ...) function from the reshape package.
library(reshape)
First, we list the variables that we want to keep as columns.
id <- c("id", "gender", "job")
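Then we convert the wide format to long format and keep only the rows whose value is 1.
df <- melt(mydata, id=id)
# Column 5 is `value`: keep the rows where it equals 1, then drop that column.
df <- df[df[,5]==1,-5]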
Next, we sort the data by id.
df <- df[order(df[,"id"]),]
Finally, we rename the melted column to response and reorder the columns.
colnames(df)[4] <- "response"
df <- df[,c(1,4,2,3)]
## id response gender job
## 1 playground Male Teacher
## 2 cafe Male Student
## 2 playground Male Student
## 3 playground Male Teacher
## ...
## ...
## 9 classroom Female Teacher
## 10 cafe Male Student
## 10 classroom Male Student
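For comparison, the same reshaping can be sketched in base R without any extra package. This is not the method from the answer above, only an illustration of the idea: find every cell equal to 1 and emit one row per hit. The names indicator_cols, hits, and long_base are made up for this example, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Inline copy of the example data from the question.
mydata <- data.frame(
  id = 1:10,
  cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1),
  playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0),
  classroom = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  gender = factor(c("Male", "Male", "Male", "Male", "Male",
                    "Male", "Female", "Male", "Female", "Male")),
  job = factor(c("Teacher", "Student", "Teacher", "Student", "Teacher",
                 "Teacher", "Teacher", "Teacher", "Teacher", "Student"))
)

# Hypothetical helper name: the 0/1 indicator columns to collapse.
indicator_cols <- c("cafe", "playground", "classroom")

# One (row, col) index pair for every cell that equals 1.
hits <- which(mydata[indicator_cols] == 1, arr.ind = TRUE)

# Build one output row per hit, copying the id variables from the source row.
long_base <- data.frame(
  id = mydata$id[hits[, "row"]],
  response = indicator_cols[hits[, "col"]],
  gender = mydata$gender[hits[, "row"]],
  job = mydata$job[hits[, "row"]]
)

# Sort by id; ties keep their original (column) order.
long_base <- long_base[order(long_base$id), ]
rownames(long_base) <- NULL

head(long_base, 4)
```

Because which() walks the indicator matrix column by column, the rows for one id come out in the order cafe, playground, classroom after sorting, which matches the output shown above.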
Alternatively, we can use the tidyverse:
library(tidyverse)
mydata %>%
gather(response, value, cafe:classroom) %>%
filter(value==1) %>%
select(id, response, gender, job)
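In tidyr 1.0.0 and later, gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the same pipeline with pivot_longer(), assuming dplyr and a recent tidyr are installed, might look like this. The name long_tidy is made up for illustration, and the data frame is rebuilt inline so the snippet is self-contained.

```r
library(dplyr)
library(tidyr)

# Inline copy of the example data from the question.
mydata <- data.frame(
  id = 1:10,
  cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1),
  playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0),
  classroom = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  gender = factor(c("Male", "Male", "Male", "Male", "Male",
                    "Male", "Female", "Male", "Female", "Male")),
  job = factor(c("Teacher", "Student", "Teacher", "Student", "Teacher",
                 "Teacher", "Teacher", "Teacher", "Teacher", "Student"))
)

long_tidy <- mydata %>%
  # Stack the three indicator columns: names go to `response`, 0/1 to `value`.
  pivot_longer(cafe:classroom, names_to = "response", values_to = "value") %>%
  # Keep only the places that were marked with 1.
  filter(value == 1) %>%
  # Drop `value` and put the columns in the requested order.
  select(id, response, gender, job)

long_tidy
```

pivot_longer() emits the rows grouped by the original row, so the result is already ordered by id here and no separate sorting step is needed.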