Creating a matrix of 0s and 1s from a string vector in R or Python

Question

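I want to create a matrix of 0s and 1s from a vector in which each string contains the two names I want to map onto the matrix. For example, suppose I have the following vector: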

vector_matrix <- c("A_B", "A_C", "B_C", "B_D", "C_D")

I would like to turn it into the corresponding matrix of 0s and 1s.

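I am open to any suggestions, but ideally some built-in function would handle this. My real task is very similar, but at a scale where the resulting matrix will have 25 million cells.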

I would prefer the code to be in R, but a Pythonic solution is fine too :)

Edit: When I say "A_B", I want a 1 in row A, column B. The reverse (column A, row B) would be fine as well.

Edit: I want a matrix whose row names and column names are the letters.

Answer 1

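Create a two-column data frame d from the data, compute the levels, then build a list in which each column of d is a factor with those levels, and finally run table. The second line sorts each row. It is not actually needed for the input shown, so it could be omitted, but you may need it for other data if B_A is to be treated the same as A_B.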

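# Split each string at "_" into a two-column data frame (columns V1 and V2)
d <- read.table(text = vector_matrix, sep = "_")
# Sort within each row so that B_A is treated the same as A_B (optional for this input)
d[] <- t(apply(d, 1, sort))
# Convert both columns to factors with the same levels, then cross-tabulate
tab <- table( lapply(d, factor, levels = levels(factor(unlist(d)))) )
tab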

This gives the following table:

   V2
V1  A B C D
  A 0 1 1 0
  B 0 0 1 1
  C 0 0 0 1
  D 0 0 0 0


# Optional: display the table as a heatmap, without row or column reordering
heatmap(tab[nrow(tab):1, ], NA, NA, col = 2:3, symm = TRUE)

# Optional: treat the table as an adjacency matrix and plot it as an undirected graph
library(igraph)
g <- graph_from_adjacency_matrix(tab, mode = "undirected")
plot(g)
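
For readers who prefer Python, the table-building steps of this answer (split, sort each pair, compute shared levels, cross-tabulate) can be sketched with the standard library alone. This is a minimal sketch, not a translation endorsed by the answer's author: `pairs_to_table` is a hypothetical helper name, every string is assumed to contain exactly one separator, and each cell is set to 1 rather than counting repeated pairs as R's table would.

```python
# Sketch only: `pairs_to_table` is a hypothetical helper, not part of any library.
vector_matrix = ["A_B", "A_C", "B_C", "B_D", "C_D"]


def pairs_to_table(strings, sep="_"):
    """Return (labels, table); table[i][j] is 1 if the pair labels[i], labels[j] occurs."""
    # Split each string into its two names and sort them, so "B_A" counts as "A_B".
    # Assumption: each string contains exactly one separator.
    pairs = [tuple(sorted(s.split(sep))) for s in strings]
    # One shared, sorted set of labels for rows and columns (the "levels" in R).
    labels = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(labels)}
    # Start from a square matrix of zeros and mark each observed pair with a 1.
    table = [[0] * len(labels) for _ in labels]
    for row_name, col_name in pairs:
        table[index[row_name]][index[col_name]] = 1
    return labels, table


labels, table = pairs_to_table(vector_matrix)
# Print the matrix with the labels as column and row names.
print("  " + " ".join(labels))
for name, row in zip(labels, table):
    print(name, " ".join(map(str, row)))
```

For the sample input this reproduces the 4 x 4 table shown above, with A to D as both row and column names.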

Answer 2

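The following should work in Python. It splits the input data into two lists, converts the characters to indices, and sets the matrix entries at those indices to 1.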

import numpy as np

vector_matrix = ("A_B", "A_C", "B_C", "B_D", "C_D")

# Split data into two tuples: row names and column names
rows, cols = zip(*(s.split("_") for s in vector_matrix))
print(rows, cols)
# ('A', 'A', 'B', 'B', 'C') ('B', 'C', 'C', 'D', 'D')

# With inspiration from: 
# Convert each letter to a 0-based index (ord("A") == 65, so "A" -> 0, "B" -> 1, ...)
row_idxs = np.array([ord(char) - 65 for char in rows])
col_idxs = np.array([ord(char) - 65 for char in cols])
print(row_idxs, col_idxs)
# [0 0 1 1 2] [1 2 2 3 3]

# Size the matrix from the largest index seen in each dimension
n_rows = row_idxs.max() + 1
n_cols = col_idxs.max() + 1
print(n_rows, n_cols)
# 3 4

# Note: the result is 3 x 4, because "D" never appears as a row name
mat = np.zeros((n_rows, n_cols), dtype=int)
mat[row_idxs, col_idxs] = 1
print(mat)
# [[0 1 1 0]
#  [0 0 1 1]
#  [0 0 0 1]]
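
The `ord(char) - 65` conversion above assumes every name is a single uppercase letter. A possible variation, sketched below under assumptions of my own, maps arbitrary names to positions with a dictionary and stores only the coordinates of the cells that are 1, which may be worth considering at the 25-million-cell scale mentioned in the question. `pairs_to_sparse` and `lookup` are hypothetical helper names, each string is assumed to contain exactly one separator, and nothing here has been benchmarked.

```python
# Sketch only: `pairs_to_sparse` and `lookup` are hypothetical helper names.
vector_matrix = ["A_B", "A_C", "B_C", "B_D", "C_D"]


def pairs_to_sparse(strings, sep="_"):
    """Return (index, ones): a label -> position map and the set of (row, col) cells equal to 1."""
    # Assumption: each string contains exactly one separator.
    pairs = [s.split(sep) for s in strings]
    # Rows and columns share one sorted label set, so the implied matrix is square.
    labels = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(labels)}
    # Keep only the coordinates of the ones; every other cell is implicitly 0.
    ones = {(index[r], index[c]) for r, c in pairs}
    return index, ones


def lookup(index, ones, row_name, col_name):
    """Read one cell of the implied matrix by row and column name."""
    return 1 if (index[row_name], index[col_name]) in ones else 0


index, ones = pairs_to_sparse(vector_matrix)
print(sorted(ones))
print(lookup(index, ones, "A", "B"), lookup(index, ones, "B", "A"))
```

Memory use then grows with the number of input strings rather than with the number of cells. The coordinate pairs in `ones` could also be passed to a sparse-matrix library if matrix arithmetic is needed later.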
