How do I make a binary matrix comparing several columns

I'm new to programming and R.

I have data in columns like this:

C1      C2    C3      C4      C5
Apple         Apple   Banana  Banana
Banana        Orange  Orange
Orange

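I want to make a binary matrix that compares every column with C1, where 1 means TRUE (the C1 value appears somewhere in that column) and 0 means FALSE. I want something like this: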

 C1        C2     C3        C4          C5
Apple      0      1         0           0
Banana     0      0         1           1
Orange     0      1         1           0

Does anyone know how to do this? Thank you.

You can loop over columns C2 to C5 and match the elements of C1 against each one, i.e.

(!is.na(sapply(df[-1], function(i) match(df$C1, i)))) * 1

#     C2 C3 C4 C5
#[1,]  0  1  0  0
#[2,]  0  0  1  1
#[3,]  0  1  1  0

Or bind the result together with C1, i.e.

cbind.data.frame(C1 = df$C1, (!is.na(sapply(df[-1], function(i) match(df$C1, i)))) * 1)

#      C1 C2 C3 C4 C5
#1  Apple  0  1  0  0
#2 Banana  0  0  1  1
#3 Orange  0  1  1  0
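To see what each step of the match() answer does, it may help to run it on a single column. The sketch below is only an illustration: it rebuilds the example data with plain character columns (assumed equivalent to the factor-based data under "Data", with empty cells as NA) and looks at column C3 alone.

```r
# Example data rebuilt with character columns; assumed equivalent to the
# factor-based data shown under "Data" (empty cells are NA)
df <- data.frame(
  C1 = c("Apple", "Banana", "Orange"),
  C2 = c(NA, NA, NA),
  C3 = c("Apple", "Orange", NA),
  C4 = c("Banana", "Orange", NA),
  C5 = c("Banana", NA, NA),
  stringsAsFactors = FALSE
)

# match() returns the position of each C1 value inside C3, or NA if absent:
# Apple is at position 1, Banana is not in C3, Orange is at position 2
pos <- match(df$C1, df$C3)
print(pos)

# !is.na() turns positions into TRUE/FALSE, and * 1 turns those into 1/0
flags <- (!is.na(pos)) * 1
print(flags)
```

Applying the same three steps to every column with sapply() gives the full matrix shown above.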

We can use %in%:

df[-1] <- +(sapply(df[-1], `%in%`, x = df$C1))
df

#      C1 C2 C3 C4 C5
#1  Apple  0  1  0  0
#2 Banana  0  0  1  1
#3 Orange  0  1  1  0
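In the %in% answer, sapply() passes each column to %in% as its first unnamed argument; because x is already supplied by name, the column binds to the table argument, so every call computes df$C1 %in% column. Below is a sketch of the same idea written with an explicit function. It writes into a copy so the original data is left untouched (res is a hypothetical name chosen here), and it again assumes character columns instead of the factors in the dput.

```r
# Example data with character columns (assumed equivalent to the dput below)
df <- data.frame(
  C1 = c("Apple", "Banana", "Orange"),
  C2 = c(NA, NA, NA),
  C3 = c("Apple", "Orange", NA),
  C4 = c("Banana", "Orange", NA),
  C5 = c("Banana", NA, NA),
  stringsAsFactors = FALSE
)

# Work on a copy so df keeps its original values
res <- df
# For each column except C1: is each C1 value found anywhere in that column?
# as.integer() converts TRUE/FALSE to 1/0
res[-1] <- lapply(df[-1], function(col) as.integer(df$C1 %in% col))
print(res)
```

Using lapply() returns a list, which data frame assignment accepts directly, so no intermediate matrix is needed.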

Data

df <- structure(list(C1 = structure(1:3, .Label = c("Apple", "Banana", 
"Orange"), class = "factor"), C2 = c(NA, NA, NA), C3 = structure(c(1L, 
2L, NA), .Label = c("Apple", "Orange"), class = "factor"), C4 = structure(c(1L, 
2L, NA), .Label = c("Banana", "Orange"), class = "factor"), C5 = structure(c(1L, 
NA, NA), .Label = "Banana", class = "factor")), class = "data.frame",
row.names = c(NA, -3L))