scipy 超稀疏矩阵乘法超慢
scipy super sparse matrix multiplication is super slow
SO 上有一些讨论稀疏矩阵乘法性能的帖子,但它们似乎没有在这里回答我的问题。
这里是基准代码,
# First, construct a space matrix
In [1]: from scipy.sparse import dok_matrix
In [2]: M = dok_matrix((100, (1<<32)-1), dtype=np.float32)
In [3]: rows, cols = M.shape
In [5]: js = np.random.randint(0, (1<<32)-1, size=100)
In [6]: for i in range(rows):
...: for j in js:
...: M[i,j] = 1.0
...:
# Check out sparsity
In [7]: M.shape
Out[7]: (100, 4294967295)
In [8]: M.count_nonzero()
Out[8]: 10000
# Test csr dot performance, 36.3 seconds
In [9]: csr = M.tocsr()
In [10]: %time csr.dot(csr.T)
CPU times: user 36.3 s, sys: 1min 1s, total: 1min 37s
Wall time: 1min 46s
Out[10]:
<100x100 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Row format>
上面的csr.dot
花费了36.3s,恕我直言,这是相当长的。
为了加快速度,我编写了一个简单的 for 循环点函数,如下所示,
def lil_matmul_transposeB(A, B):
rows_a, cols_a = A.shape
rows_b, cols_b = B.shape
assert cols_a == cols_b
C = np.zeros((rows_a, rows_b))
for ra in range(rows_a):
cols_a = A.rows[ra]
data_a = A.data[ra]
for i, ca in enumerate(cols_a):
xa = data_a[i]
for rb in range(rows_b):
cols_b = B.rows[rb]
data_b = B.data[rb]
pos = bs(cols_b, ca)
if pos!=-1:
C[ra,rb] += data_b[pos] * xa
return C
# Test dot performance in LiL format,
In [25]: lil = M.tolil()
In [26]: %time A = F.lil_matmul_transposeB(lil, lil)
CPU times: user 1.26 s, sys: 2.07 ms, total: 1.26 s
Wall time: 1.26 s
上述功能仅耗时1.26s,比内置csr.dot
.
快多了
所以我想知道我在这里做稀疏矩阵乘法是否有错误?
尽管稀疏度很小,但非常大的二维会带来问题。
In [12]: Mr = M.tocsr()
In [20]: Mr
Out[20]:
<100x4294967295 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Row format>
Transpose 只是将 csr
变成 csc
,而不改变数组。两者的 indptr
只是 (101,).
In [21]: Mr.T
Out[21]:
<4294967295x100 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Column format>
但是当我执行 Mr@Mr.T
时,当它试图将 Mr.T
转换为 `csr.即乘法需要相同的格式:
In [22]: Mr.T.tocsr()
Traceback (most recent call last):
File "<ipython-input-22-a376906f557e>", line 1, in <module>
Mr.T.tocsr()
File "/usr/local/lib/python3.8/dist-packages/scipy/sparse/csc.py", line 138, in tocsr
indptr = np.empty(M + 1, dtype=idx_dtype)
MemoryError: Unable to allocate 32.0 GiB for an array with shape (4294967296,) and data type int64
它正在尝试创建一个 indptr
长为 (4294967296,) 的矩阵。在我有限的 RAM 机器上产生错误。在你的身上,它一定是遇到了某种内存 management/swap 任务,这会减慢它的速度。
所以即使 nnz
很小,也是极端维度导致速度变慢。
SO 上有一些讨论稀疏矩阵乘法性能的帖子,但它们似乎没有在这里回答我的问题。
这里是基准代码,
# First, construct a space matrix
In [1]: from scipy.sparse import dok_matrix
In [2]: M = dok_matrix((100, (1<<32)-1), dtype=np.float32)
In [3]: rows, cols = M.shape
In [5]: js = np.random.randint(0, (1<<32)-1, size=100)
In [6]: for i in range(rows):
...: for j in js:
...: M[i,j] = 1.0
...:
# Check out sparsity
In [7]: M.shape
Out[7]: (100, 4294967295)
In [8]: M.count_nonzero()
Out[8]: 10000
# Test csr dot performance, 36.3 seconds
In [9]: csr = M.tocsr()
In [10]: %time csr.dot(csr.T)
CPU times: user 36.3 s, sys: 1min 1s, total: 1min 37s
Wall time: 1min 46s
Out[10]:
<100x100 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Row format>
上面的csr.dot
花费了36.3s,恕我直言,这是相当长的。
为了加快速度,我编写了一个简单的 for 循环点函数,如下所示,
def lil_matmul_transposeB(A, B):
rows_a, cols_a = A.shape
rows_b, cols_b = B.shape
assert cols_a == cols_b
C = np.zeros((rows_a, rows_b))
for ra in range(rows_a):
cols_a = A.rows[ra]
data_a = A.data[ra]
for i, ca in enumerate(cols_a):
xa = data_a[i]
for rb in range(rows_b):
cols_b = B.rows[rb]
data_b = B.data[rb]
pos = bs(cols_b, ca)
if pos!=-1:
C[ra,rb] += data_b[pos] * xa
return C
# Test dot performance in LiL format,
In [25]: lil = M.tolil()
In [26]: %time A = F.lil_matmul_transposeB(lil, lil)
CPU times: user 1.26 s, sys: 2.07 ms, total: 1.26 s
Wall time: 1.26 s
上述功能仅耗时1.26s,比内置csr.dot
.
所以我想知道我在这里做稀疏矩阵乘法是否有错误?
尽管稀疏度很小,但非常大的二维会带来问题。
In [12]: Mr = M.tocsr()
In [20]: Mr
Out[20]:
<100x4294967295 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Row format>
Transpose 只是将 csr
变成 csc
,而不改变数组。两者的 indptr
只是 (101,).
In [21]: Mr.T
Out[21]:
<4294967295x100 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Column format>
但是当我执行 Mr@Mr.T
时,当它试图将 Mr.T
转换为 `csr.即乘法需要相同的格式:
In [22]: Mr.T.tocsr()
Traceback (most recent call last):
File "<ipython-input-22-a376906f557e>", line 1, in <module>
Mr.T.tocsr()
File "/usr/local/lib/python3.8/dist-packages/scipy/sparse/csc.py", line 138, in tocsr
indptr = np.empty(M + 1, dtype=idx_dtype)
MemoryError: Unable to allocate 32.0 GiB for an array with shape (4294967296,) and data type int64
它正在尝试创建一个 indptr
长为 (4294967296,) 的矩阵。在我有限的 RAM 机器上产生错误。在你的身上,它一定是遇到了某种内存 management/swap 任务,这会减慢它的速度。
所以即使 nnz
很小,也是极端维度导致速度变慢。