Scipy sparse matrix from edge list
How do I convert an edge list (data) into a Python scipy sparse matrix to get this result:
Dataset (where 'agn' is node category one and 'fct' is node category two):
data['agn'].tolist()
['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2', 'p3', 'p3', 'p3', 'p4', 'p4', 'p5']
data['fct'].tolist()
['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6', 'f5', 'f6', 'f7', 'f7', 'f8', 'f9']
(Not working) Python code:
from scipy.sparse import csr_matrix, coo_matrix
csr_matrix((data['agn'].values, data['fct'].values),
           shape=(len(set(data['agn'].values)), len(set(data['fct'].values))))
-> Error: "TypeError: invalid input format"
Do I really need three arrays to construct the matrix, as the examples in the scipy csr documentation suggest (I can only post two links, sorry!)?
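As far as the scipy documentation goes, the answer appears to be yes. `csr_matrix` reads a two-element tuple as `(data, (row_ind, col_ind))`. Passing the two index arrays on their own therefore does not match any accepted input form. The sketch below assumes the node names have already been mapped to 0-based integers, as the asker does later. It supplies a values array of ones, which amounts to one unit weight per edge.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Assumption: node names already mapped to 0-based integer ids
rows = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
cols = np.array([0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8])

# The "third array": one stored value per edge; 1 for an unweighted graph
values = np.ones(len(rows), dtype=np.uint32)

# (data, (row_ind, col_ind)) is the coordinate-style form csr_matrix accepts
m = csr_matrix((values, (rows, cols)), shape=(rows.max() + 1, cols.max() + 1))
print(m.shape, m.nnz)
```

The `shape` argument is optional here, because scipy infers it from the largest indices. Passing it explicitly is useful when some nodes have no edges.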
(Working) R code that constructs the matrix from only two vectors:
library(Matrix)
grph_tim <- sparseMatrix(i = as.numeric(data$agn),
                         j = as.numeric(data$fct),
                         dims = c(length(levels(data$agn)),
                                  length(levels(data$fct))),
                         dimnames = list(levels(data$agn),
                                         levels(data$fct)))
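A rough Python counterpart of this R call is sketched below. It uses `pandas.Categorical` in the role of an R factor: `.codes` plays the part of `as.numeric(factor)`, except that it is 0-based. `.categories` plays the part of `levels()`. scipy matrices have no `dimnames`, so the helper (a hypothetical name, `edges_to_sparse`) returns the labels alongside the matrix. The values array of ones is still required on the scipy side.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

def edges_to_sparse(agn, fct):
    """Hypothetical helper: binary agn x fct matrix from two parallel label lists."""
    agn_cat = pd.Categorical(agn)  # like an R factor; categories are sorted
    fct_cat = pd.Categorical(fct)
    i = agn_cat.codes              # 0-based codes (R's as.numeric is 1-based)
    j = fct_cat.codes
    values = np.ones(len(i), dtype=np.uint32)  # one unit weight per edge
    m = coo_matrix((values, (i, j)),
                   shape=(len(agn_cat.categories), len(fct_cat.categories)))
    # Labels stand in for R's dimnames
    return m.tocsr(), list(agn_cat.categories), list(fct_cat.categories)

agn = ['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2',
       'p3', 'p3', 'p3', 'p4', 'p4', 'p5']
fct = ['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6',
       'f5', 'f6', 'f7', 'f7', 'f8', 'f9']
m, row_labels, col_labels = edges_to_sparse(agn, fct)
print(m.shape, row_labels, col_labels)
```

With these labels, `m[row_labels.index('p2'), col_labels.index('f6')]` looks up a single edge by name.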
Edit:
After I modified the code and added the required array, it finally worked:
import numpy as np
import pandas as pd
import scipy.sparse as ss

def read_data_file_as_coo_matrix(filename='edges.txt'):
    """Read data file and return sparse matrix in coordinate format."""
    # if the nodes are integers, use 'dtype = np.uint32'
    data = pd.read_csv(filename, sep='\t', encoding='utf-8')
    # where 'rows' is node category one and 'cols' node category two;
    # both columns must already hold 0-based integer ids, not strings
    rows = data['agn']  # Not a copy, just a reference.
    cols = data['fct']
    # crucial third array in Python, which can be left out in R
    ones = np.ones(len(rows), np.uint32)
    matrix = ss.coo_matrix((ones, (rows, cols)))
    return matrix
In addition, I converted the nodes' string names to integers. So data['agn']
becomes [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
and data['fct']
becomes [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8].
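The question does not say how the names were converted. One way to get these exact integer lists is `pandas.factorize`, which numbers labels in order of first appearance starting at 0. This is a sketch of that approach, not necessarily what the asker did. Here first-appearance order happens to coincide with sorted order, so `pd.Categorical` codes would give the same result.

```python
import pandas as pd

data = pd.DataFrame({
    'agn': ['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2',
            'p3', 'p3', 'p3', 'p4', 'p4', 'p5'],
    'fct': ['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6',
            'f5', 'f6', 'f7', 'f7', 'f8', 'f9'],
})

# codes: integer id per row; labels: the distinct names, indexed by id
agn_codes, agn_labels = pd.factorize(data['agn'])
fct_codes, fct_labels = pd.factorize(data['fct'])
print(agn_codes.tolist())
print(fct_codes.tolist())
```

Keeping `agn_labels` and `fct_labels` lets you map matrix indices back to node names afterwards.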
I get this sparse matrix:
(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
(0, 4) 1
(1, 2) 1
(1, 3) 1
(1, 4) 1
(1, 5) 1
(2, 4) 1
(2, 5) 1
(2, 6) 1
(3, 6) 1
(3, 7) 1
(4, 8) 1
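The listing above is how scipy prints a COO matrix: one `(row, col) value` line per stored entry. To check it against the expected agn x fct layout, the dense form and the row and column sums can help. The sums give each node's degree, because every edge stores a 1. This sketch rebuilds the same matrix from the integer lists.

```python
import numpy as np
from scipy.sparse import coo_matrix

rows = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
cols = [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8]
m = coo_matrix((np.ones(len(rows), dtype=np.uint32), (rows, cols)))

dense = m.toarray()                   # 5 x 9 array of 0/1
print(dense)
print(dense.sum(axis=1).tolist())     # degree of each 'agn' node
print(dense.sum(axis=0).tolist())     # degree of each 'fct' node
```

For anything beyond a small example, staying sparse (`m.sum(axis=1)`) avoids materializing the dense array.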