scipy 超稀疏矩阵乘法超慢

scipy super sparse matrix multiplication is super slow

SO 上有一些讨论稀疏矩阵乘法性能的帖子,但它们似乎没有在这里回答我的问题。

这里是基准代码,

# First, construct a space matrix
In [1]: from scipy.sparse import dok_matrix
In [2]: M = dok_matrix((100, (1<<32)-1), dtype=np.float32)
In [3]: rows, cols = M.shape
In [5]: js = np.random.randint(0, (1<<32)-1, size=100)
In [6]: for i in range(rows):
   ...:     for j in js:
   ...:         M[i,j] = 1.0
   ...:

# Check out sparsity
In [7]: M.shape
Out[7]: (100, 4294967295)
In [8]: M.count_nonzero()
Out[8]: 10000

# Test csr dot performance, 36.3 seconds
In [9]: csr = M.tocsr()
In [10]: %time csr.dot(csr.T)
CPU times: user 36.3 s, sys: 1min 1s, total: 1min 37s
Wall time: 1min 46s
Out[10]:
<100x100 sparse matrix of type '<class 'numpy.float32'>'
    with 10000 stored elements in Compressed Sparse Row format>

上面的csr.dot花费了36.3s,恕我直言,这是相当长的。 为了加快速度,我编写了一个简单的 for 循环点函数,如下所示,

def lil_matmul_transposeB(A, B):
    rows_a, cols_a = A.shape
    rows_b, cols_b = B.shape
    assert cols_a == cols_b
    C = np.zeros((rows_a, rows_b))

    for ra in range(rows_a):
        cols_a = A.rows[ra]
        data_a = A.data[ra]
        for i, ca in enumerate(cols_a):
            xa = data_a[i]
            for rb in range(rows_b):
                cols_b = B.rows[rb]
                data_b = B.data[rb]
                pos = bs(cols_b, ca)
                if pos!=-1:
                    C[ra,rb] += data_b[pos] * xa
    return C

# Test dot performance in LiL format,
In [25]: lil = M.tolil()
In [26]: %time A = F.lil_matmul_transposeB(lil, lil)
CPU times: user 1.26 s, sys: 2.07 ms, total: 1.26 s
Wall time: 1.26 s

上述功能仅耗时1.26s,比内置csr.dot.

快多了

所以我想知道我在这里做稀疏矩阵乘法是否有错误?

尽管稀疏度很小,但非常大的二维会带来问题。

In [12]: Mr = M.tocsr()
In [20]: Mr
Out[20]: 
<100x4294967295 sparse matrix of type '<class 'numpy.float32'>'
    with 10000 stored elements in Compressed Sparse Row format>

Transpose 只是将 csr 变成 csc,而不改变数组。两者的 indptr 只是 (101,).

In [21]: Mr.T
Out[21]: 
<4294967295x100 sparse matrix of type '<class 'numpy.float32'>'
    with 10000 stored elements in Compressed Sparse Column format>

但是当我执行 Mr@Mr.T 时,当它试图将 Mr.T 转换为 `csr.即乘法需要相同的格式:

In [22]: Mr.T.tocsr()
Traceback (most recent call last):
  File "<ipython-input-22-a376906f557e>", line 1, in <module>
    Mr.T.tocsr()
  File "/usr/local/lib/python3.8/dist-packages/scipy/sparse/csc.py", line 138, in tocsr
    indptr = np.empty(M + 1, dtype=idx_dtype)
MemoryError: Unable to allocate 32.0 GiB for an array with shape (4294967296,) and data type int64

它正在尝试创建一个 indptr 长为 (4294967296,) 的矩阵。在我有限的 RAM 机器上产生错误。在你的身上,它一定是遇到了某种内存 management/swap 任务,这会减慢它的速度。

所以即使 nnz 很小,也是极端维度导致速度变慢。