How to convert comma-separated multiple responses into dummy coded columns in R

In a survey, there was a question: "What aspect of the course helped you learn concepts the most? Select all that apply."

The list of responses looks like this:

Student_ID = c(1,2,3)
Responses = c("lectures,tutorials","tutorials,assignments,lectures", "assignments,presentations,tutorials")
Grades = c(1.1,1.2,1.3)
Data = data.frame(Student_ID,Responses,Grades);Data

Student_ID | Responses                           | Grades
1          | lectures,tutorials                  | 1.1
2          | tutorials,assignments,lectures      | 1.2
3          | assignments,presentations,tutorials | 1.3

Now I want to create a data frame that looks like this:

Student_ID | Lectures | Tutorials | Assignments | Presentations | Grades
1          |     1    |     1     |      0      |       0       |  1.1
2          |     1    |     1     |      1      |       0       |  1.2
3          |     0    |     1     |      1      |       1       |  1.3

I managed to split the comma-separated responses into columns using the splitstackshape package, so my data currently looks like this:

Student ID | Response 1  | Response 2    | Response 3 | Response 4 | Grades
1          | lectures    | tutorials     | NA         | NA         | 1.1
2          | tutorials   | assignments   | lectures   | NA         | 1.2
3          | assignments | presentations | tutorials  | NA         | 1.3

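But as I said earlier, I want my table to look like the one I presented above, with dummy codes. I am stuck on how to proceed. Perhaps one idea is to go through each observation in the columns and append a 1 or 0 to a new data frame that has lectures, tutorials, assignments, and presentations as headers?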
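The idea described above can be sketched in base R roughly as follows. This is only a minimal sketch: it assumes the four response options are known in advance (the `categories` vector is my assumption, as is the name `result`), and it works from the original comma-separated column rather than from the already split columns.

```r
# Example data from the question
Data <- data.frame(
  Student_ID = c(1, 2, 3),
  Responses = c("lectures,tutorials",
                "tutorials,assignments,lectures",
                "assignments,presentations,tutorials"),
  Grades = c(1.1, 1.2, 1.3),
  stringsAsFactors = FALSE
)

# Assumed list of possible response options (not given in the question)
categories <- c("lectures", "tutorials", "assignments", "presentations")

# Split each student's answer into its individual items
resp_split <- strsplit(Data$Responses, ",", fixed = TRUE)

# For every option, record 1 if the student selected it and 0 otherwise
dummies <- vapply(categories, function(category) {
  as.integer(vapply(resp_split, function(x) category %in% x, logical(1)))
}, integer(nrow(Data)))

result <- data.frame(Student_ID = Data$Student_ID, dummies, Grades = Data$Grades)
result
```

Looping over the options rather than over the rows keeps the code short, but it does require knowing the options up front.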

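First, the Responses column is converted from factor to character class. Each element of that column is then split on commas. I don't know what all the possible responses are, so I used all the ones that are present. Next, each element of the split Responses column is tabulated, specifying the possible levels. The resulting list is converted to a matrix before being combined with the old data.frame.
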
Data$Responses <- as.character(Data$Responses)
resp.split <- strsplit(Data$Responses, ",")

lev <- unique(unlist(resp.split))

resp.dummy <- lapply(resp.split, function(x) table(factor(x, levels=lev)))

Data2 <- with(Data, data.frame(Student_ID, do.call(rbind, resp.dummy), Grades))
Data2
#   Student_ID lectures tutorials assignments presentations Grades
# 1          1        1         1           0             0    1.1
# 2          2        1         1           1             0    1.2
# 3          3        0         1           1             1    1.3
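Note that `table()` returns counts, so a response listed twice by the same student would give 2 rather than 1, and an item with a space after the comma would become its own level. If real survey data has either problem, a variant along these lines might help. This is a sketch under those assumptions; `dummy_code` and `messy` are hypothetical names, not part of the answer above.

```r
# Hypothetical helper: dummy-code a delimited multiple-response column
# as presence/absence (0/1) instead of counts.
dummy_code <- function(responses, sep = ",") {
  # Split each response string and trim stray whitespace around items
  items <- lapply(strsplit(as.character(responses), sep, fixed = TRUE), trimws)
  # Use every response that occurs, in order of first appearance
  lev <- unique(unlist(items))
  # One row per respondent, one 0/1 column per response option
  out <- do.call(rbind, lapply(items, function(x) as.integer(lev %in% x)))
  colnames(out) <- lev
  out
}

# Made-up input with a space after a comma and a repeated item
messy <- c("lectures, tutorials", "tutorials,tutorials,assignments")
dummy_code(messy)
```

The returned matrix can be combined with the other columns via `data.frame()`, in the same way as `do.call(rbind, resp.dummy)` above.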

I found an answer to my question. I initially did

library(splitstackshape)
Responses = cSplit(Data, "Responses",",")

Then I added the following lines:

library(qdapTools)
# Note: TA is not defined in this post; it presumably holds the split response columns
TA <- mtabulate(as.data.frame(t(TA)))

It worked for me.