How to take the values in a column and make them into multiple columns
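So I have this dataset that I've been cleaning for someone else, but they want a particular column split into several columns based on the type of observation. For example, this is a column of diagnoses, and she wants to expand it so that each diagnosis gets its own column. So my single column containing depression, ADHD, asthma, cancer, etc. would become one column named depression, one named ADHD, and so on.
I'm pretty sure this violates tidy data principles, but the person I'm doing this for insists this is how they want it. I've tried looking at the tidyr and dplyr packages, but so far I haven't had any luck and could use some advice.
Thanks in advance for your help.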
Order Diagnosis
1 1 Synaesthesia
2 1 Synaesthesia
3 1 Synaesthesia
4 1 Synaesthesia
5 1 Synaesthesia
6 1 Synaesthesia
7 1 ADHD
8 1 ADHD
9 1 ADHD
10 1 ADHD
11 1 ADHD
12 1 ADHD
13 1 ADHD
14 1 ADHD
15 1 ADHD
16 1 ADHD
17 1 ADHD
18 1 ADHD
19 1 ADHD
20 1 ADHD
21 1 ADHD
22 1 ADHD
23 1 ADHD
24 1 ADHD
25 1 ADHD
26 1 ADHD
27 1 ADHD
28 1 ADHD
29 1 ADHD
30 1 ADHD
31 1 ADHD
32 1 ADHD
33 1 ADHD
34 1 ADHD
35 1 ADHD
36 1 ADHD
37 1 ADHD
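The first answer below uses a data frame named `dat` without defining it. Assuming the excerpt above is all there is (the real dataset presumably has more rows and other diagnoses such as depression or asthma), a minimal sketch can rebuild it. `table()` then shows how many columns a wide version would have (one per diagnosis) and how many values each would hold:

```r
# Hypothetical rebuild of the 37 rows printed in the question:
# Order is 1 throughout; 6 Synaesthesia rows are followed by 31 ADHD rows
dat <- data.frame(
  Order = 1,
  Diagnosis = rep(c("Synaesthesia", "ADHD"), times = c(6, 31))
)

# Count rows per diagnosis; names are sorted alphabetically (ADHD: 31, Synaesthesia: 6)
counts <- table(dat$Diagnosis)
counts
```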
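It's not entirely clear what your expected result is, but one interpretation is that you want to recode the data, for example with dummy coding.
A simple way to do that is with model.matrix(). Try this: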
model.matrix(~ Diagnosis - 1, dat)
DiagnosisADHD DiagnosisSynaesthesia
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 0
8 1 0
9 1 0
10 1 0
...
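If the goal is to keep the original data and simply append one 0/1 column per diagnosis, the model.matrix() result can be bound back onto the data frame. This is a sketch under two assumptions: `dat` is a hypothetical rebuild of the excerpt in the question, and `Diagnosis` has no missing values. By default model.matrix() drops rows containing NA, after which its row count no longer matches `dat`.

```r
# Hypothetical rebuild of the 37 rows shown in the question
dat <- data.frame(
  Order = 1,
  Diagnosis = rep(c("Synaesthesia", "ADHD"), times = c(6, 31))
)

# One 0/1 indicator column per diagnosis; "- 1" drops the intercept so that
# every level, not all but one, gets its own column
dummies <- model.matrix(~ Diagnosis - 1, data = dat)

# Strip the "Diagnosis" prefix that model.matrix adds to the column names
colnames(dummies) <- sub("^Diagnosis", "", colnames(dummies))

# Keep the original columns and append the indicators
wide <- cbind(dat, dummies)
head(wide)
```

Because a character column is converted to a factor here, the new columns appear in alphabetical order of the diagnosis names, not in order of first appearance.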
You can split your "vector" (or, in your case, the column), pad it with NA, and bind the pieces into a proper data.frame or matrix.
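# Example data: 100 random letters from A-E (no seed is set, so your counts will differ)
x <- sample(LETTERS[1:5], size = 100, replace = TRUE)
# Split into a list with one element per distinct value
sx <- split(x, x)
# Length of the longest group
ml <- max(lengths(sx))
# Pad each group with NAs to length ml, then bind the groups side by side as columns
do.call("cbind", lapply(sx, FUN = function(m) c(m, rep(NA, ml - length(m)))))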
A B C D E
[1,] "A" "B" "C" "D" "E"
[2,] "A" "B" "C" "D" "E"
[3,] "A" "B" "C" "D" "E"
[4,] "A" "B" "C" "D" "E"
[5,] "A" "B" "C" "D" "E"
[6,] "A" "B" "C" "D" "E"
[7,] "A" "B" "C" "D" "E"
[8,] "A" "B" "C" "D" "E"
[9,] "A" "B" "C" "D" "E"
[10,] "A" "B" "C" "D" "E"
[11,] "A" "B" "C" "D" "E"
[12,] "A" "B" "C" "D" "E"
[13,] "A" "B" "C" "D" "E"
[14,] "A" "B" "C" "D" "E"
[15,] NA "B" "C" "D" "E"
[16,] NA "B" "C" "D" "E"
[17,] NA "B" "C" "D" "E"
[18,] NA "B" "C" "D" "E"
[19,] NA "B" "C" "D" "E"
[20,] NA "B" "C" "D" "E"
[21,] NA "B" "C" "D" NA
[22,] NA NA "C" "D" NA
[23,] NA NA NA "D" NA
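The same split-and-pad idea applied to the question's Diagnosis column might look like the sketch below. It assumes a hypothetical `dat` rebuilt from the printed excerpt and uses as.data.frame() instead of cbind() so the result is a data frame rather than a character matrix.

```r
# Hypothetical rebuild of the excerpt shown in the question
dat <- data.frame(
  Order = 1,
  Diagnosis = rep(c("Synaesthesia", "ADHD"), times = c(6, 31))
)

# One list element per diagnosis, each holding that diagnosis's values
sx <- split(dat$Diagnosis, dat$Diagnosis)

# Length of the longest group; every column is padded to this length
ml <- max(lengths(sx))

# Pad each group with NA and combine into a data frame, one column per diagnosis
padded <- as.data.frame(
  lapply(sx, function(m) c(m, rep(NA, ml - length(m)))),
  stringsAsFactors = FALSE
)
head(padded, 8)
```

Unlike the dummy coding above, row i of `padded` no longer corresponds to row i of the original data, so other columns such as Order cannot be carried along with it.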