Efficiently finding the indices of a sparse matrix's smallest columns
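By "smallest columns" I mean the columns whose elements have the smallest (i.e. most negative) sums. This is my attempt, but it is inefficient because I build a complete list of column sums. h is a scipy.sparse matrix and k is the number of indices requested. The order of the result does not matter.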
import numpy as np

def indices_of_smallest_columns(h, k):
    # The number of columns is shape[1]; shape[0] is the row count.
    size = h.shape[1]
    arr = [h.tocsc().getcol(i).sum() for i in range(size)]
    return np.argpartition(arr, k)[:k]
In [1]: from scipy import sparse
In [2]: M = sparse.random(10,10,.2)
In [3]: M
Out[3]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in COOrdinate format>
Your list of sums:
In [5]: [M.tocsc().getcol(i).sum() for i in range(10)]
Out[5]:
[1.5659425833256746,
1.7665038140319338,
0.0,
0.6422706809316442,
0.24922121199061487,
1.439977730279475,
0.17827454933565012,
1.7955436609690185,
0.4275656628694753,
1.4029484081520989]
Getting the sums directly from the matrix:
In [6]: M.sum(axis=0)
Out[6]:
matrix([[1.56594258, 1.76650381, 0. , 0.64227068, 0.24922121,
1.43997773, 0.17827455, 1.79554366, 0.42756566, 1.40294841]])
sparse uses a matrix multiplication to compute such a sum.
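To make the matrix-multiplication remark concrete: summing each column is the same as multiplying the transposed matrix by a vector of ones. The sketch below only illustrates that equivalence; it is not scipy's internal code, and the helper name column_sums_by_matmul is made up for this example.

```python
import numpy as np
from scipy import sparse

def column_sums_by_matmul(h):
    """Column sums of a sparse matrix h, computed as h.T @ ones (hypothetical helper)."""
    # One entry per row of h, so the product adds up every row's contribution.
    ones = np.ones(h.shape[0], dtype=h.dtype)
    # (n_cols x n_rows) @ (n_rows,) -> one sum per column
    return np.asarray(h.T @ ones).ravel()

M = sparse.random(10, 10, density=0.2, random_state=0)
# Agrees with the built-in reduction, up to floating-point rounding.
print(np.allclose(column_sums_by_matmul(M), np.asarray(M.sum(axis=0)).ravel()))
```

In practice M.sum(axis=0) is the call to use; the explicit product is shown only to explain why it is fast.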
Timings:
In [7]: timeit [M.tocsc().getcol(i).sum() for i in range(10)]
2.87 ms ± 90.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: timeit M.sum(axis=0)
161 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If the matrix is already in csc format, the time is even better:
In [12]: %%timeit h=M.tocsc()
...: h.sum(axis=0)
...:
...:
54.5 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
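Putting this together, the function from the question could be rewritten roughly as follows. This is a minimal sketch, not a tuned implementation: it assumes h is any scipy.sparse matrix, flattens the result of h.sum(axis=0) to a 1-D array, and adds a guard for k at or above the number of columns (np.argpartition would otherwise raise); that guard is an addition not discussed above.

```python
import numpy as np
from scipy import sparse

def indices_of_smallest_columns(h, k):
    """Indices of the k columns of sparse matrix h with the smallest sums (unordered)."""
    # A single sparse reduction replaces the per-column Python loop.
    sums = np.asarray(h.sum(axis=0)).ravel()
    # Assumption: if k covers every column, simply return all indices.
    if k >= sums.size:
        return np.arange(sums.size)
    # argpartition places the k smallest entries first, in no particular order.
    return np.argpartition(sums, k)[:k]

M = sparse.random(10, 10, density=0.2, format="csc", random_state=0)
print(indices_of_smallest_columns(M, 3))
```

Because the dense vector of column sums has only one entry per column, building it is cheap; the cost in the original attempt came from the repeated tocsc() and getcol() calls rather than from holding the full list of sums.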