Co-occurrence Matrix in Python, scipy coo_matrix
I have a document-term matrix built from the co-occurrence of terms in a corpus, as explained here:
import scipy.sparse

vocabulary = {}  # map terms to column indices
data = []        # values (maybe weights)
row = []         # row (document) indices
col = []         # column (term) indices

for i, doc in enumerate(bloblist):
    for term in doc:
        # get column index, adding the term to the vocabulary if needed
        j = vocabulary.setdefault(term, len(vocabulary))
        data.append(1)  # uniform weights
        row.append(i)
        col.append(j)

A = scipy.sparse.coo_matrix((data, (row, col)))
>>> print(A)
(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
...
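Now I want to export it to CSV or write it to a database. I can't figure out how to do that, because I don't know how to work with a sparse matrix.
Whenever I try, I always get this error: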
TypeError: 'coo_matrix' object has no attribute '__getitem__'
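Take a look at the input/output section of scipy. You can use mmwrite
to write the matrix in the Matrix Market format, which is a standard format for storing sparse matrices.
Here is an example that creates a random sparse matrix and writes it as an MM file: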
>>> import scipy.sparse
>>> A = scipy.sparse.rand(20, 20)
>>> print(A)
(3, 4) 0.0579085844686
(14, 9) 0.914421740712
(15, 10) 0.622861279405
(5, 17) 0.83146022149
>>> import scipy.io
>>> scipy.io.mmwrite('output', A)
Contents of output.mtx:
→ cat output.mtx
%%MatrixMarket matrix coordinate real general
%
20 20 4
4 5 0.05790858446861069
15 10 0.9144217407118101
16 11 0.6228612794046831
6 18 0.8314602214903816
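Note that the file uses 1-based indices, so entry (3, 4) in Python appears as "4 5" in output.mtx. To check that nothing is lost, the file can be read back with scipy.io.mmread. The following is only a sketch: the small matrix and the temporary output path are made up for illustration and stand in for the real document-term matrix.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

# Small hand-built COO matrix standing in for the document-term matrix.
data = [1, 1, 1, 1]
row = [0, 0, 1, 2]
col = [0, 1, 1, 2]
A = scipy.sparse.coo_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.mtx")
scipy.io.mmwrite(path, A)

# Read the file back; coordinate files come back as a sparse COO container.
B = scipy.io.mmread(path)

# Compare dense copies (fine for a tiny example, avoid on large matrices).
roundtrip_ok = np.array_equal(A.toarray(), B.toarray())
print(roundtrip_ok)
```

Passing the full file name including ".mtx" to both calls keeps the write and read paths identical.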
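scipy has several sparse matrix formats. The COO format does not support indexing, which is why A[i, j] raises the TypeError above. You can use methods such as tocsc() or tocsr() to convert the matrix to one of the other types, which do allow access to individual elements.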
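As a rough sketch of both points, the snippet below converts a COO matrix to CSR so that a single element can be indexed, and then writes the nonzero entries as (row, col, value) triplets in CSV form. The matrix contents and the column header names are assumptions chosen for illustration; the CSV is written to an in-memory buffer here, and a real file opened with open(..., newline="") would work the same way.

```python
import csv
import io

import scipy.sparse

# Tiny example matrix: 2 documents, 3 terms (values are made up).
A = scipy.sparse.coo_matrix(([1, 1, 2], ([0, 0, 1], [0, 2, 1])), shape=(2, 3))

# COO has no element access; CSR supports A_csr[i, j].
A_csr = A.tocsr()
value = A_csr[1, 1]

# The COO attributes row, col and data already hold the triplets,
# so exporting to CSV only needs a loop over them.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["row", "col", "value"])  # hypothetical header names
for i, j, v in zip(A.row, A.col, A.data):
    writer.writerow([int(i), int(j), int(v)])

csv_text = buffer.getvalue()
print(csv_text)
```

The same triplets could be passed to a database insert instead of a CSV writer, since each one is just a (document index, term index, weight) record.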