Numpy, inserting one matrix in another matrix efficiently?

Question

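I am trying to make the outer product of two vectors more efficient by removing the zero elements, computing the outer product, and then either enlarging the resulting matrix with rows and columns of zeros or inserting it into a zero matrix. (Sparsifying the matrices with scipy does not really work, because the conversion is costly and I do it over and over again.)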

import numpy as np
dim = 100
vec = np.random.rand(1, dim)
mask = np.flatnonzero(vec > 0.8)
vec_sp = vec[:, mask]
mat_sp = vec_sp.T * vec_sp # This is faster than dot product
# Enlarge matrix or insert into zero matrix

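Because it is the outer product of two vectors, I know where the zero rows and columns of the original matrix are; they follow from the indices in the mask variable. To see this: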

a = np.array(((1,0,2,0))).reshape(1,-1)
a.T * a
>> array([[1, 0, 2, 0],
       [0, 0, 0, 0],
       [2, 0, 4, 0],
       [0, 0, 0, 0]])

I tried two different solutions: one using numpy's insert method and one using the append method on the mat_sp variable. The whole thing turns into a for loop and is really slow.

for val in mask:
    if val < mat_sp.shape[0]:
        mat_sp = np.insert(mat_sp, val, values=0, axis=1)
        mat_sp = np.insert(mat_sp, val, values=0, axis=0)
    else:
        mat_sp = np.append(mat_sp, values=np.zeros((mat_sp.shape[0], 1)), axis=1)
        mat_sp = np.append(mat_sp, values=np.zeros((1, mat_sp.shape[1])), axis=0)

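The other approach was to create a zero matrix of size dim x dim and then build a huge index vector from the mask with two for loops, and use that index vector to insert the matrix product into the zero matrix. However, this is also very slow.

Any ideas or insights that solve the problem efficiently would be great, since the sparse matrix product takes about 2/3 of the time of the non-sparse one.

Using @hjpaul's example, we get the following comparison code: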

import numpy as np
dims = 400

def test_non_sparse():
    vec = np.random.rand(1, dims)
    a = vec.T * vec

def test_sparse():  
    vec = np.random.rand(1, dims)
    idx = np.flatnonzero(vec>0.75)
    oprod = vec[:,idx].T * vec[:,idx]
    vec_oprod = np.zeros((dims, dims))
    vec_oprod[idx[:,None], idx] = oprod


if __name__ == '__main__':
    import timeit
    print('Non sparse:',timeit.timeit("test_non_sparse()", setup="from __main__ import test_non_sparse", number=10000))
    print('Sparse:',timeit.timeit("test_sparse()", setup="from __main__ import test_sparse", number=10000))

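The speedup of course depends on the dimension of the vector and the number of zeros. Above 300 dimensions and around 70% zeros there is a modest speed improvement, and it grows with the number of zero elements and with the dimension. If the matrix and mask are the same over and over again, a larger speedup is surely possible.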
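For the repeated-mask case, one possible sketch is to build the index arrays and the output buffer once and reuse them on every call. The helper name `sparse_outer_into` and the fixed random seed are illustrative assumptions, not part of the original code, and the sketch assumes the mask really is identical between calls.

```python
import numpy as np

dims = 400
rng = np.random.default_rng(0)

# Assumption: the same mask applies to every vector we process.
idx = np.flatnonzero(rng.random(dims) > 0.75)
rows = idx[:, None]                # column of row indices, built once
cols = idx                         # row of column indices, built once
out = np.zeros((dims, dims))       # output buffer, allocated once

def sparse_outer_into(vec, out):
    """Overwrite the fixed (idx x idx) block of out with the outer product of vec[idx]."""
    v = vec[idx]
    out[rows, cols] = v[:, None] * v
    return out

# Example vector that is zero everywhere outside the mask.
vec = np.zeros(dims)
vec[idx] = rng.random(idx.size)
result = sparse_outer_into(vec, out)
```

Because the same block is overwritten on every call, the entries outside it stay zero without re-allocating. Note that `result` is the same array as `out`, so copy it if you need to keep earlier results.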

(My mistake with the logical indexing was using idx instead of idx[:,None].)
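A small illustration of why that difference matters, using the 4-element vector from earlier in the question (variable names here are only for the demonstration):

```python
import numpy as np

a = np.array([1, 0, 2, 0])
idx = np.flatnonzero(a)              # indices of the nonzero entries
block = np.outer(a[idx], a[idx])     # compressed outer product, shape (2, 2)

res = np.zeros((4, 4), dtype=int)

# Two 1-D index arrays are paired element-wise, so this selects only
# the positions (0, 0) and (2, 2): a selection of shape (2,).
paired_shape = res[idx, idx].shape

# A column of row indices broadcast against a row of column indices
# selects every (row, col) combination: a selection of shape (2, 2).
block_shape = res[idx[:, None], idx].shape

res[idx[:, None], idx] = block
```

With plain `idx, idx` the target has only as many elements as `idx`, so it cannot hold the full block; the broadcast form gives a target with the same shape as the block.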

Answer 1

The fastest way to insert one matrix into another is with indexing.

To illustrate with your outer product:

In [94]: a = np.array(((1,0,2,0)))
In [95]: idx = np.where(a>0)[0]
In [96]: idx
Out[96]: array([0, 2])
In [97]: a1 = a[idx]

The outer product of the compressed array:

In [98]: a2 = a1[:,None]*a1
In [99]: a2
Out[99]: 
array([[1, 2],
       [2, 4]])

Set up the result array, and use block indexing to identify where the a2 values go:

In [100]: res = np.zeros((4,4),int)
In [101]: res[idx[:,None], idx] = a2
In [102]: res
Out[102]: 
array([[1, 0, 2, 0],
       [0, 0, 0, 0],
       [2, 0, 4, 0],
       [0, 0, 0, 0]])

The direct outer product of the uncompressed array:

In [103]: a[:,None]*a
Out[103]: 
array([[1, 0, 2, 0],
       [0, 0, 0, 0],
       [2, 0, 4, 0],
       [0, 0, 0, 0]])
In [104]: np.outer(a,a)
Out[104]: 
array([[1, 0, 2, 0],
       [0, 0, 0, 0],
       [2, 0, 4, 0],
       [0, 0, 0, 0]])

If a is 2d with shape (1,n), this outer product can be written as np.dot(a.T,a). dot involves a sum, in this case over the size 1 dimension.
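To make the shapes concrete, here is a short check that assumes a is stored as a row of shape (1, n), as in the question:

```python
import numpy as np

a = np.array([[1, 0, 2, 0]])   # shape (1, 4)

# (4, 1) dot (1, 4) -> (4, 4): the sum runs over the size-1 axis,
# so every entry is a single product, i.e. the outer product.
outer = np.dot(a.T, a)

# (1, 4) dot (4, 1) -> (1, 1): the sum runs over the size-4 axis,
# which gives the inner product instead.
inner = np.dot(a, a.T)
```

The order of the operands decides which axis is summed over, so swapping them turns the outer product into the inner product.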

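I think a has to be quite sparse to benefit from this extra indexing work. With scipy sparse matrices I find that a sparsity on the order of 1% is needed to have any speed advantage, even when the matrices are premade.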

In [105]: from scipy import sparse
In [106]: A = sparse.csr_matrix(a)
In [107]: A
Out[107]: 
<1x4 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Row format>
In [108]: A.A
Out[108]: array([[1, 0, 2, 0]], dtype=int64)
In [109]: A.T*A           # sparse matrix product, dot
Out[109]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in Compressed Sparse Column format>
In [110]: _.A
Out[110]: 
array([[1, 0, 2, 0],
       [0, 0, 0, 0],
       [2, 0, 4, 0],
       [0, 0, 0, 0]], dtype=int64)
