从具有列索引的元组列表创建一个稀疏矩阵,其中 是 1

Create an sparse matrix from a list of tuples having the indexes of the column where is a 1

问题:

我有一个元组列表,每个元组代表二维数组的一列,元组的每个元素代表数组中为 1 的那一列的索引;不在该元组中的其他条目为 0。

我想用这个元组列表高效地创建一个稀疏矩阵(尽量不使用 for 循环)。

示例

# init values
list_tuples = [
 (0, 2, 4),
 (0, 2, 3),
 (1, 3, 4)
]

n = length(list_tuples) + 1
m = 5 # arbritrary, however n >= max([ei for ei in list_tuples]) + 1

# what I need is a function which accepts this tuples and give the shape of the array
# (at least the row size, because the column size can be infered from the list of tuples)
A = some_function(list_tuples, array_shape = (m, n))

那么我希望得到的是以下形式的数组:

[   
 [1, 1, 0]
 [0, 0, 1]  
 [1, 1, 0]
 [0, 1, 1]
 [1, 0, 1]
]

您的值是 compressed sparse column format 所需的 indices。您还需要 indptr 数组,对于您的数据来说,它是元组长度的累加和(前缀为 0)。 data 数组将是一个长度与元组长度之和相同的数组,您可以从累积和的最后一个元素中获得。以下是您的示例的外观:

In [45]: from scipy.sparse import csc_matrix

In [46]: list_tuples = [
    ...:  (0, 2, 4),
    ...:  (0, 2, 3),
    ...:  (1, 3, 4)
    ...: ]

In [47]: indices = sum(list_tuples, ())  # Flatten the tuples into one sequence.

In [48]: indptr = np.cumsum([0] + [len(t) for t in list_tuples])

In [49]: a = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr))

In [50]: a
Out[50]: 
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Column format>

In [51]: a.A
Out[51]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])

请注意,csc_matrix 从索引中找到的最大值推断出行数。您可以使用 shape 参数来覆盖它,例如

In [52]: b = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr), shape=(7, len(list_tuples)))

In [53]: b
Out[53]: 
<7x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Column format>

In [54]: b.A
Out[54]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1],
       [0, 0, 0],
       [0, 0, 0]])

您还可以很容易地生成 coo_matrix。 flattened list_tuples 给出行索引,np.repeat 可用于创建列索引:

In [63]: from scipy.sparse import coo_matrix

In [64]: i = sum(list_tuples, ())  # row indices

In [65]: j = np.repeat(range(len(list_tuples)), [len(t) for t in list_tuples])

In [66]: c = coo_matrix((np.ones(len(i), dtype=int), (i, j)))

In [67]: c
Out[67]: 
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in COOrdinate format>

In [68]: c.A
Out[68]: 
array([[1, 1, 0],
       [0, 0, 1],
       [1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]])