SciPy - Generalization of dot product over sparse and dense matrix
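Suppose a normal dot product:
M3[i,k] = sum_j(M1[i,j] * M2[j,k])
Now I would like to replace the sum with some other operation, say the maximum:
M3[i,k] = max_j(M1[i,j] * M2[j,k])
This question parallels an earlier one, except that the solutions considered there,
M3 = np.sum(M1[:,:,None]*M2[None,:,:], axis=1)
or
M3 = np.max(M1[:,:,None]*M2[None,:,:], axis=1)
should now apply to a dense matrix M1 and a sparse matrix M2. Unfortunately, 3D sparse matrices are not available in SciPy.
Basically, this means that in
M3[i,k] = max_j(M1[i,j] * M2[j,k])
we iterate only over those j for which M2[j,k] != 0.
What is the most efficient way to solve this problem?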
Here's an approach using one loop that iterates through the common axis of reduction -
import numpy as np
from scipy.sparse import csr_matrix, find

def reduce_after_multiply(M1, M2):
    # M1 : NumPy array
    # M2 : Sparse matrix
    # Output : NumPy array

    # Get nonzero indices of M2.T: r holds M2's column indices and c holds
    # M2's row indices. Sort stably by r so that the entries of each column
    # are contiguous regardless of the order find() returns them in. Then get
    # start and stop indices representing the intervals along the axis of
    # reduction that contain the nonzero indices.
    r, c = find(M2.T)[:2]
    order = np.argsort(r, kind='stable')
    r, c = r[order], c[order]
    IDs, start = np.unique(r, return_index=True)
    stop = np.append(start[1:], c.size)

    # Initialize output array and start loop for assigning values
    m, n = M1.shape[0], M2.shape[1]
    out = np.zeros((m, n))
    for iterID, i in enumerate(IDs):
        # Nonzero indices for each col from M2. Use these to select
        # M1's cols and M2's rows. Perform elementwise multiplication.
        idx = c[start[iterID]:stop[iterID]]
        mult = M1[:, idx] * M2.getcol(i).data

        # Use the intended ufunc along the second axis.
        out[:, i] = np.max(mult, axis=1)  # Use any axis-supporting ufunc here
    return out
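The comment in the loop above says any axis-supporting reduction can be swapped in. As a minimal sketch of that idea, the variant below takes the reduction as a parameter and reads the nonzeros of each column straight from the CSC arrays (indptr, indices, data). The function name reduce_after_multiply_csc and its reduce parameter are hypothetical, introduced only for illustration; this is not the code from the answer -

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

def reduce_after_multiply_csc(M1, M2, reduce=np.max):
    """Hypothetical variant: out[i, k] = reduce_j(M1[i, j] * M2[j, k]),
    taken only over the j where M2[j, k] is nonzero."""
    # Work on a private CSC copy so the caller's matrix is not modified.
    M2 = csc_matrix(M2, copy=True)
    M2.eliminate_zeros()  # drop explicitly stored zeros
    indptr, indices, data = M2.indptr, M2.indices, M2.data

    out = np.zeros((M1.shape[0], M2.shape[1]))
    for k in range(M2.shape[1]):
        lo, hi = indptr[k], indptr[k + 1]
        if lo == hi:
            continue  # column k has no nonzeros; keep the zeros in out
        # indices[lo:hi] are the rows j of the nonzeros in column k.
        out[:, k] = reduce(M1[:, indices[lo:hi]] * data[lo:hi], axis=1)
    return out

# Small demonstration with non-negative data.
rng = np.random.default_rng(0)
M1 = rng.random((5, 3))
M2_dense = rng.integers(0, 3, (3, 50)).astype(float)
M2_dense[:, 1] = 0  # one all-zero column
M2 = csr_matrix(M2_dense)

out_max = reduce_after_multiply_csc(M1, M2)
out_min = reduce_after_multiply_csc(M1, M2, reduce=np.min)
```

Note that the reduction only sees the nonzero entries of M2. For np.max with non-negative data this matches the dense formula, but for np.min, or with negative values, it generally differs, because the dense formula also takes the zero products into account.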
Sample run for verification -
In [248]: # Input data
...: M1 = np.random.rand(5,3)
...: M2 = csr_matrix(np.random.randint(0,3,(3,1000)))
...:
...: # For variety, let's make one column as all zero.
...: # This should result in corresponding col as all zeros as well.
...: M2[:,1] = 0
...:
In [249]: # Verify
...: out1 = np.max(M1[:,:,None]*M2.toarray()[None,:,:], axis=1)
In [250]: np.allclose(out1, reduce_after_multiply(M1, M2))
Out[250]: True
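For the max case specifically, the Python-level loop can probably be avoided with np.maximum.reduceat, which reduces over consecutive segments of an array. The following is only a sketch under assumptions: the name max_after_multiply_reduceat is hypothetical, and it materializes one product per nonzero of M2 for every row of M1, so the temporary array has shape (m, nnz) -

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

def max_after_multiply_reduceat(M1, M2):
    """Hypothetical loop-free variant, max reduction only."""
    # Private CSC copy without explicitly stored zeros.
    M2 = csc_matrix(M2, copy=True)
    M2.eliminate_zeros()

    out = np.zeros((M1.shape[0], M2.shape[1]))
    if M2.nnz == 0:
        return out

    # One product per nonzero of M2, laid out column after column.
    prods = M1[:, M2.indices] * M2.data  # shape (m, nnz)

    # reduceat needs the start offset of every non-empty column; empty
    # columns are skipped because they own no segment of prods.
    nonempty = np.diff(M2.indptr) > 0
    starts = M2.indptr[:-1][nonempty]
    out[:, nonempty] = np.maximum.reduceat(prods, starts, axis=1)
    return out

# Check against the dense formula on non-negative data.
rng = np.random.default_rng(1)
M1 = rng.random((5, 3))
M2_dense = rng.integers(0, 3, (3, 200)).astype(float)
M2_dense[:, 1] = 0  # one all-zero column
M2 = csr_matrix(M2_dense)

out_fast = max_after_multiply_reduceat(M1, M2)
out_ref = np.max(M1[:, :, None] * M2_dense[None, :, :], axis=1)
```

Whether this beats the loop depends on the shapes involved: it trades the per-column Python overhead for a larger temporary array.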
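Specifically for dot products, we have a built-in dot method, so it is straightforward to use. We can convert the first input, which is dense, to a sparse matrix and then use the sparse matrix's .dot method, like so -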
csr_matrix(M1).dot(M2)
Let's verify this too -
In [252]: # Verify
...: out1 = np.sum(M1[:,:,None]*M2.toarray()[None,:,:], axis=1)
In [253]: out2 = csr_matrix(M1).dot(M2)
In [254]: np.allclose(out1, out2.toarray())
Out[254]: True
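If a dense result is wanted anyway, converting M1 to a sparse matrix can be avoided by putting the sparse operand on the left through transposes, since M1 . M2 equals (M2^T . M1^T)^T. This is a sketch of that alternative rather than part of the answer above; the variable names out3 and reference are illustrative -

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
M1 = rng.random((5, 3))
M2 = csr_matrix(rng.integers(0, 3, (3, 1000)))

# Sparse .dot(dense) returns a dense result, so M1 stays a NumPy array.
out3 = M2.T.dot(M1.T).T

# Dense reference for comparison.
reference = M1.dot(M2.toarray())
print(np.allclose(out3, reference))
```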
You can also check out the sparse library, which extends scipy.sparse by providing a nicer NumPy-like interface and n-dimensional arrays: https://github.com/pydata/sparse