Scipy sparse matrix slicing returns IndexError

If I try to slice a sparse matrix, or look up the value at a given [row, column], I get an IndexError.

More precisely, I have the following scipy.sparse.csr_matrix, which I saved to a file and then loaded back:

...
>>> A = scipy.sparse.csr_matrix((vals, (rows, cols)), shape=(output_dim, input_dim))
>>> np.save(open('test_matrix.dat', 'wb'), A)
...
>>> A = np.load('test_matrix.dat', allow_pickle=True)
>>> A
array(<831232x798208 sparse matrix of type '<class 'numpy.float32'>'
    with 109886100 stored elements in Compressed Sparse Row format>,
      dtype=object)

However, when I try to get the value at a given [row, column] pair, I get the following error:

>>> A[1,1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array

Why does this happen?

To clarify, I am sure the matrix is not empty, because I can print its contents:

>>> print(A)
  (0, 1)    0.24914551
  (0, 2)    0.6669922
  (1, 1)    0.75097656
  (1, 3)    0.6640625
  (2, 3)    0.3359375
  (2, 514)  0.34960938
...

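When you save and reload the sparse matrix with np.save and np.load, you end up with a NumPy array that has a single entry: an object, which is your sparse matrix. That wrapper array is 0-dimensional, so A has nothing at [1,1], and indexing it with two indices raises the IndexError. You should use scipy.sparse.save_npz instead.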

For example:

import scipy.sparse as sps
import numpy as np

A = sps.csr_matrix((10,10))
A
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in Compressed Sparse Row format>
np.save('test_matrix.dat', A)
B = np.load('test_matrix.dat.npy', allow_pickle=True)
B
array(<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in Compressed Sparse Row format>, dtype=object)
B[1,1]
IndexError                                Traceback (most recent call last)
<ipython-input-101-969f8bd5206a> in <module>
----> 1 B[1,1]

IndexError: too many indices for array
sps.save_npz('sparse_dat', A)
C = sps.load_npz('sparse_dat.npz')
C
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in Compressed Sparse Row format>
C[1,1]
0.0
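
The save_npz / load_npz round trip can also be written as a self-contained script. This is only a minimal sketch: the matrix contents, the names M and loaded, and the temporary file location are made up for illustration and are not taken from the question. Note that save_npz takes the matrix as its second argument.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sps

# Small illustrative matrix (values are invented for this sketch).
rows = np.array([0, 0, 1, 1, 2])
cols = np.array([1, 2, 1, 3, 3])
vals = np.array([0.25, 0.5, 0.75, 0.625, 0.375], dtype=np.float32)
M = sps.csr_matrix((vals, (rows, cols)), shape=(3, 4))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse_dat.npz")
    sps.save_npz(path, M)        # file name first, matrix second
    loaded = sps.load_npz(path)  # comes back as a sparse matrix, not an ndarray

value = loaded[1, 1]             # element access works on the reloaded matrix
row_slice = loaded[0:2, :]       # and so does slicing
print(type(loaded).__name__, value, row_slice.shape)
```

Because the reloaded object is a real sparse matrix rather than a NumPy object array, indexing and slicing behave the same as on the matrix that was saved.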

Note that you can still recover A from B like this:

D = B.tolist()
D
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in Compressed Sparse Row format>
D[1,1]
0.0
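
If you already have a file written with np.save, it may help to inspect the wrapper before unwrapping it. The sketch below uses an in-memory buffer in place of a file (an assumption made only to keep the example self-contained) and uses ndarray.item(), which, like tolist(), returns the single stored object from a 0-dimensional array.

```python
import io

import numpy as np
import scipy.sparse as sps

A = sps.csr_matrix((10, 10))

# Round trip through np.save / np.load, using a buffer instead of a file.
buf = io.BytesIO()
np.save(buf, A)                  # A is pickled inside an object array
buf.seek(0)
B = np.load(buf, allow_pickle=True)

# The wrapper has no axes, which is why B[1, 1] fails.
print(B.ndim, B.shape, B.dtype)

D = B.item()                     # unwrap the single stored object
print(sps.issparse(D), D[1, 1])  # D is the sparse matrix again
```

This only recovers the data. For new files, scipy.sparse.save_npz is still the better choice, since it does not rely on pickling.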