How can I have structured dtypes using sparse matrices (in scipy.sparse)?

Question

I have structured data that can be represented in numpy as follows:

dtype = np.dtype([('a', 'f8'),
                  ('b', 'f8')])
X = np.zeros((3,4), dtype=dtype)

and I would like to operate on a sparse version of it. Scipy has sparse, but I haven't figured out how to get at the structured data:

import numpy as np
import scipy.sparse as sparse

dtype = np.dtype([('a', 'f8'),
                  ('b', 'f8')])
X = np.zeros((3,4), dtype=dtype)
A, B = X['a'], X['b']
A[:] = np.arange(0, 12).reshape((3,4))

Xdok = sparse.dok_matrix(X, dtype=dtype)
Xcoo = Xdok.tocoo()

# No supported conversion for structured type
# Xcsr = Xdok.tocsr()
# Xlil = Xdok.tolil()

# Cannot perform reduce with flexible type
# Xdok['a']

# 'coo_matrix' object is not subscriptable
# Xcoo['a']

I can get the dok and coo versions, but I can't slice out my fields (e.g. Xdok['a']), and as I understand it, dok and coo are inefficient for any kind of math operation.

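Ultimately, I am trying to represent a directed graph with multi-valued weights on the edges (e.g. a and b), and I need to be able to perform simple linear algebra on the graph.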

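I have considered keeping a sparse matrix for a separate from a sparse matrix for b, but they would end up populated at exactly the same indices, and I would rather keep all of the data in a single structure in memory.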

Should I be using a library other than Scipy?

Answer 1

In [26]: M = sparse.coo_matrix(X)
In [27]: M.data
Out[27]: 
array([( 1., 0.), ( 2., 0.), ( 3., 0.), ( 4., 0.), ( 5., 0.), ( 6., 0.),
       ( 7., 0.), ( 8., 0.), ( 9., 0.), (10., 0.), (11., 0.)],
      dtype=[('a', '<f8'), ('b', '<f8')])
In [28]: M.A
....
ValueError: unsupported data types in input

In [30]: M.tocsr()
...
TypeError: no supported conversion for types: (dtype([('a', '<f8'), ('b', '<f8')]),)

Conversion to (and from) dok works with the compound dtype:

In [31]: M.todok()
Out[31]: 
<3x4 sparse matrix of type '<class 'numpy.void'>'
    with 11 stored elements in Dictionary Of Keys format>
In [32]: _.items()
Out[32]: dict_items([((0, 1), (1., 0.)), ((1, 2), (6., 0.)), ((1, 3), (7., 0.)), ((2, 3), (11., 0.)), ((2, 0), (8., 0.)), ((1, 0), (4., 0.)), ((0, 3), (3., 0.)), ((2, 2), (10., 0.)), ((1, 1), (5., 0.)), ((2, 1), (9., 0.)), ((0, 2), (2., 0.))])

dok implements indexing:

In [33]: __[0,1]
Out[33]: (1., 0.)

coo does not implement any indexing, but its M.data array is structured and can be accessed by field name:

In [34]: M.data['a']
Out[34]: array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])
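
Because the field values line up one-to-one with the coordinates, a single field can be turned into an ordinary float matrix that scipy handles normally. The following is only a sketch under assumptions: it collects the coordinates and records with plain numpy rather than through coo_matrix(X), treats an entry as stored when any field is nonzero (my convention, not scipy's), and uses the illustrative names records, a_values and A_csr.

```python
import numpy as np
import scipy.sparse as sparse

dtype = np.dtype([('a', 'f8'), ('b', 'f8')])
X = np.zeros((3, 4), dtype=dtype)
X['a'][:] = np.arange(12).reshape(3, 4)

# Coordinates of entries where any field is nonzero (assumed convention).
rows, cols = np.nonzero((X['a'] != 0) | (X['b'] != 0))
records = X[rows, cols]   # 1-D structured array, analogous to M.data

# Copy one field into a contiguous float array and build a plain float matrix.
a_values = np.ascontiguousarray(records['a'])
A_csr = sparse.coo_matrix((a_values, (rows, cols)), shape=X.shape).tocsr()

print(A_csr.toarray())
```

The resulting matrix has a plain float64 dtype, so conversions such as tocsr() are no longer blocked by the structured dtype.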

A dok is a subclass of dict, and evidently its elements are stored as dtype records:

In [39]: type(M.todok()[0,1])
Out[39]: numpy.void
In [40]: M.todok()[0,1]['a']
Out[40]: 1.0

But again, dok indexing makes no provision for accessing fields.

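In sum, the sparse module was not written with compound dtypes in mind. Its roots are in linear algebra (e.g. solving large sparse systems of linear equations). Where these dtypes do work, it is simply using numpy arrays and elements without any special handling.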
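
For the graph use case in the question, one possible workaround (not something scipy.sparse provides directly) is to keep the edge weights in a single structured array and build one float CSR matrix per field on a common sparsity pattern. This is a minimal sketch: the edge list, the names src, dst, weights and the helper field_matrix are all made up for illustration, and whether scipy reuses or copies the index arrays internally may depend on the version.

```python
import numpy as np
import scipy.sparse as sparse

# Hypothetical directed graph: (source, target) pairs with two weights
# per edge, stored together in one structured array.
edge_dtype = np.dtype([('a', 'f8'), ('b', 'f8')])
src = np.array([0, 2, 0, 1])
dst = np.array([1, 0, 2, 2])
weights = np.array([(1.0, 10.0), (4.0, 40.0), (2.0, 20.0), (3.0, 30.0)],
                   dtype=edge_dtype)
n = 3  # number of nodes

# Sort edges by (row, col) so the records are in CSR order.
order = np.lexsort((dst, src))
src, dst, weights = src[order], dst[order], weights[order]

# One CSR sparsity pattern, used for every field.
indices = dst.astype(np.int32)
row_counts = np.bincount(src, minlength=n)
indptr = np.concatenate(([0], np.cumsum(row_counts))).astype(np.int32)

def field_matrix(name):
    """Return a float CSR matrix for one weight field on the common pattern."""
    data = np.ascontiguousarray(weights[name])
    return sparse.csr_matrix((data, indices, indptr), shape=(n, n))

A = field_matrix('a')
B = field_matrix('b')

x = np.ones(n)
out_a = A @ x   # per-node sum of outgoing 'a' weights
out_b = B @ x   # per-node sum of outgoing 'b' weights
print(out_a, out_b)
```

The structured array stays the single source of truth for the weights, while each per-field matrix is an ordinary float CSR matrix that supports the usual linear algebra operations.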

Tags: python, numpy, matrix, scipy, sparse-matrix