How do I make a binary matrix comparing several columns?
I am new to programming and to R.
I have data in columns like this:
C1      C2  C3      C4      C5
Apple       Apple   Banana  Banana
Banana      Orange  Orange
Orange
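I want to make a binary matrix that compares every other column against C1, where 1 means the C1 value appears in that column and 0 means it does not. I would like something like this: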
C1      C2  C3  C4  C5
Apple   0   1   0   0
Banana  0   0   1   1
Orange  0   1   1   0
Does anyone know how to do this?
Thank you.
You can loop over columns C2-C5 and match their elements against C1 (here dd is the data frame), i.e.
(!is.na(sapply(dd[-1], function(i)match(dd$C1, i))))*1
# C2 C3 C4 C5
#[1,] 0 1 0 0
#[2,] 0 0 1 1
#[3,] 0 1 1 0
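To see why this works, it may help to look at a single column first. The following is only a minimal sketch: it rebuilds the question's data as plain character columns (an assumption on my part, the asker's actual object may use factors) and walks through the match step for C3.

```r
# Hypothetical reconstruction of the question's data as character columns
dd <- data.frame(
  C1 = c("Apple", "Banana", "Orange"),
  C2 = c(NA, NA, NA),
  C3 = c("Apple", "Orange", NA),
  C4 = c("Banana", "Orange", NA),
  C5 = c("Banana", NA, NA),
  stringsAsFactors = FALSE
)

# match() gives the position of each C1 value within C3, or NA if it is absent
pos <- match(dd$C1, dd$C3)
pos
# [1]  1 NA  2

# "not NA" means the value was found; multiplying by 1 turns TRUE/FALSE into 1/0
found <- (!is.na(pos)) * 1
found
# [1] 1 0 1
```

sapply() simply repeats this for every column except C1 and collects the results into a matrix.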
Or bind the result together with C1, i.e.
cbind.data.frame(C1 = dd$C1, (!is.na(sapply(dd[-1], function(i) match(dd$C1, i)))) * 1)
# C1 C2 C3 C4 C5
#1 Apple 0 1 0 0
#2 Banana 0 0 1 1
#3 Orange 0 1 1 0
We can use %in%:
df[-1] <- +(sapply(df[-1], `%in%`, x = df$C1))
df
# C1 C2 C3 C4 C5
#1 Apple 0 1 0 0
#2 Banana 0 0 1 1
#3 Orange 0 1 1 0
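If passing `%in%` to sapply() with a named x argument looks cryptic, the same idea can be spelled out with an explicit function. This is a sketch of an equivalent form, not the answer's own code; the data frame is rebuilt here with character columns (an assumption) and the result is written to a copy named out (a hypothetical name) so the input stays untouched.

```r
# Hypothetical reconstruction of the question's data as character columns
df <- data.frame(
  C1 = c("Apple", "Banana", "Orange"),
  C2 = c(NA, NA, NA),
  C3 = c("Apple", "Orange", NA),
  C4 = c("Banana", "Orange", NA),
  C5 = c("Banana", NA, NA),
  stringsAsFactors = FALSE
)

out <- df
# For each column after C1, test which C1 values occur in it,
# then convert the logical result to 0/1 integers
out[-1] <- lapply(df[-1], function(col) as.integer(df$C1 %in% col))
out
#       C1 C2 C3 C4 C5
# 1  Apple  0  1  0  0
# 2 Banana  0  0  1  1
# 3 Orange  0  1  1  0
```

Using lapply() keeps each result as its own column, so the assignment back into the data frame does not depend on sapply() simplifying to a matrix.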
Data
df <- structure(list(C1 = structure(1:3, .Label = c("Apple", "Banana",
"Orange"), class = "factor"), C2 = c(NA, NA, NA), C3 = structure(c(1L,
2L, NA), .Label = c("Apple", "Orange"), class = "factor"), C4 = structure(c(1L,
2L, NA), .Label = c("Banana", "Orange"), class = "factor"), C5 = structure(c(1L,
NA, NA), .Label = "Banana", class = "factor")), class = "data.frame",
row.names = c(NA, -3L))