How to convert comma-separated multiple responses into dummy-coded columns in R
A survey included the question "What aspect of the course helped you learn concepts the most? Select all that apply".
The responses look like this:
Student_ID = c(1,2,3)
Responses = c("lectures,tutorials","tutorials,assignments,lectures", "assignments,presentations,tutorials")
Grades = c(1.1,1.2,1.3)
Data = data.frame(Student_ID,Responses,Grades);Data
Student_ID | Responses | Grades
1 | lectures,tutorials | 1.1
2 | tutorials,assignments,lectures | 1.2
3 | assignments,presentations,tutorials | 1.3
Now I want to create a data frame that looks like this:
Student_ID | Lectures | Tutorials | Assignments | Presentations | Grades
1 | 1 | 1 | 0 | 0 | 1.1
2 | 1 | 1 | 1 | 0 | 1.2
3 | 0 | 1 | 1 | 1 | 1.3
I managed to split the comma-separated responses into columns using the splitstackshape package, so my data currently looks like this:
Student ID | Response 1 | Response 2 | Response 3 | Response 4 | Grades
1 | lectures | tutorials | NA | NA | 1.1
2 | tutorials | assignments | lectures | NA | 1.2
3 | assignments| presentation| tutorials | NA | 1.3
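But as I said, I want my table to look like the one above, with dummy codes. I am stuck on how to proceed. Perhaps one idea is to go through each observation in the columns and append 1 or 0 to a new data frame that has lectures, tutorials, assignments, and presentations as headers?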
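The idea in the question, checking each response against each option and recording 1 or 0, can be sketched in base R roughly as follows. The vector `choices` is an assumption taken from the example data, not the survey's actual option list, and `picked`, `dummies`, and `Result` are illustrative names.

```r
# Example data copied from the question
Data <- data.frame(
  Student_ID = c(1, 2, 3),
  Responses = c("lectures,tutorials",
                "tutorials,assignments,lectures",
                "assignments,presentations,tutorials"),
  Grades = c(1.1, 1.2, 1.3),
  stringsAsFactors = FALSE
)

# Assumed set of answer options (taken from the example data)
choices <- c("lectures", "tutorials", "assignments", "presentations")

# Split every response string on commas: one character vector per student
picked <- strsplit(Data$Responses, ",", fixed = TRUE)

# For each option, record 1 if the student picked it and 0 otherwise
dummies <- sapply(choices, function(opt) {
  vapply(picked, function(p) as.integer(opt %in% p), integer(1))
})

# Put the ID and grades back around the 0/1 columns
Result <- data.frame(Student_ID = Data$Student_ID, dummies, Grades = Data$Grades)
Result
```

Listing the options explicitly keeps a column for an option that nobody selected, which deriving the options from the data cannot do.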
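First, the Responses column is converted from factor to character class. Each element of that column is then split on commas. I don't know what all the possible responses are, so I used all of the ones that are present. Next, the split Responses column is tabulated, specifying the possible levels. The resulting list is converted to a matrix before being combined with the old data.frame.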
Data$Responses <- as.character(Data$Responses)
resp.split <- strsplit(Data$Responses, ",")
lev <- unique(unlist(resp.split))
resp.dummy <- lapply(resp.split, function(x) table(factor(x, levels=lev)))
Data2 <- with(Data, data.frame(Student_ID, do.call(rbind, resp.dummy), Grades))
Data2
# Student_ID lectures tutorials assignments presentations Grades
# 1 1 1 1 0 0 1.1
# 2 2 1 1 1 0 1.2
# 3 3 0 1 1 1 1.3
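If the same steps are needed for several columns, they could be wrapped in a small function. This is only a sketch: `dummy_code` is a hypothetical helper name, not part of any package, and the `trimws()` call assumes that responses may contain spaces after the commas, which the example data does not.

```r
# dummy_code() is a hypothetical helper, not from any package.
# It returns a matrix of counts with one column per distinct response.
dummy_code <- function(x, sep = ",") {
  # Split each string and strip surrounding whitespace (assumed to be possible)
  parts <- lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
  # Use every response that is present as the set of levels
  lev <- unique(unlist(parts))
  # Tabulate each respondent against the full set of levels
  counts <- lapply(parts, function(p) table(factor(p, levels = lev)))
  # Stack the per-respondent tables into one matrix
  do.call(rbind, counts)
}

m <- dummy_code(c("lectures, tutorials", "tutorials"))
m
```

The result can then be combined with the other columns in the same way as above, for example `data.frame(Student_ID, dummy_code(Responses), Grades)`.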
I found an answer to my problem. I initially did:
library(splitstackshape)
Responses = cSplit(Data, "Responses",",")
Then I added the following lines:
library(qdapTools)
TA <- mtabulate(as.data.frame(t(TA)))
It worked for me.
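The object `TA` is not defined in the post. Assuming it held only the split response columns (the wide table shown in the question), the same row-wise tabulation can be sketched in base R without `mtabulate`. The data frame `wide` and its column names are made up for illustration.

```r
# Assumed stand-in for the split response columns; names are illustrative
wide <- data.frame(
  Response_1 = c("lectures", "tutorials", "assignments"),
  Response_2 = c("tutorials", "assignments", "presentations"),
  Response_3 = c(NA, "lectures", "tutorials"),
  stringsAsFactors = FALSE
)

# Collect every distinct non-missing response across all columns
vals <- unlist(wide, use.names = FALSE)
lev <- unique(vals[!is.na(vals)])

# For each response, flag the rows in which it appears in any column
dummies <- sapply(lev, function(l) {
  as.integer(rowSums(wide == l, na.rm = TRUE) > 0)
})
dummies
```

This gives one 0/1 column per response, which can be bound back to the ID and grade columns with `cbind()` or `data.frame()`.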