Flattening a list of NumPy arrays?

Question

It appears that I have data in the form of a list of NumPy arrays (type() = np.ndarray):

[array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]])]

I am trying to pass this into the polyfit function:

m1 = np.polyfit(x, y, deg=2)

However, it returns the error: TypeError: expected 1D vector for x

I assume I need to flatten my data into something like this:

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654 ...]

I have tried a list comprehension, which usually works on lists of lists, but, as expected, it did not work here:

[val for sublist in risks for val in sublist]
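A quick check of why the comprehension does not flatten the data, using a shortened copy of the question's list (three elements instead of thirteen, an assumption made for brevity): iterating over a 2-D NumPy array yields its rows, so one level of nesting survives.

```python
import numpy as np

# Shortened version of the list from the question: each element is a 2-D array of shape (1, 1)
risks = [np.array([[0.00353654]]) for _ in range(3)]

# Iterating over a 2-D array yields its rows, so each `val` is a 1-D array of shape (1,),
# not a float; the comprehension removes only one level of nesting
attempt = [val for sublist in risks for val in sublist]
print(type(attempt[0]), attempt[0].shape)

# Converted to an array this is still 2-D, so np.polyfit would still reject it as x
print(np.asarray(attempt).shape)  # (3, 1)
```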

What would be the best way to do this?

Answer 1

You could use numpy.concatenate, which, as the name suggests, basically concatenates all the elements of such an input list into a single NumPy array, like so -

import numpy as np
out = np.concatenate(input_list).ravel()

If you would like the final output to be a list, you can extend the solution, like so -

out = np.concatenate(input_list).ravel().tolist()

Sample run -

In [24]: input_list
Out[24]: 
[array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]])]

In [25]: np.concatenate(input_list).ravel()
Out[25]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654])

Converting to a list -

In [26]: np.concatenate(input_list).ravel().tolist()
Out[26]: 
[0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654]
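A minimal end-to-end sketch of this approach feeding np.polyfit, which is what the question was ultimately after. The question does not show x, so the evenly spaced x below is a hypothetical stand-in.

```python
import numpy as np

# Recreate the 13-element list from the question (each element has shape (1, 1))
input_list = [np.array([[0.00353654]]) for _ in range(13)]

# concatenate along axis 0 gives shape (13, 1); ravel turns it into shape (13,)
y = np.concatenate(input_list).ravel()

# Hypothetical sample positions; the question does not show its real x values
x = np.arange(len(y), dtype=float)

# Both inputs are now 1-D, so polyfit accepts them and returns 3 coefficients for deg=2
m1 = np.polyfit(x, y, deg=2)
print(y.shape, m1.shape)  # (13,) (3,)
```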

Answer 2

I came across the same issue and found a solution for combining variable-length 1-D numpy arrays:

np.column_stack(input_list).ravel()

See numpy.column_stack for more details.

An example with variable-length arrays, using the sample data:

In [135]: input_list
Out[135]: 
[array([[ 0.00353654,  0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654,  0.00353654,  0.00353654]])]

In [136]: [i.size for i in input_list]    # variable size arrays
Out[136]: [2, 1, 1, 3]

In [137]: np.column_stack(input_list).ravel()
Out[137]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654])

Note: only tested on Python 2.7.12
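The arrays in this example are in fact 2-D with shape (1, n), and for 2-D inputs np.column_stack joins along axis 1 exactly like np.hstack, which is why it works here. The sketch below reproduces the example and, as a contrast, uses a hypothetical list of genuinely 1-D arrays of different lengths: column_stack turns each into a column of a different height and raises ValueError, while np.concatenate joins them directly.

```python
import numpy as np

# The sample data from this answer: each element is 2-D with shape (1, n), n varying
input_list = [np.array([[0.00353654, 0.00353654]]),
              np.array([[0.00353654]]),
              np.array([[0.00353654]]),
              np.array([[0.00353654, 0.00353654, 0.00353654]])]

# For 2-D inputs column_stack behaves like hstack and joins along axis 1,
# which works because every array has exactly one row
stacked = np.column_stack(input_list)
flat = stacked.ravel()
print(stacked.shape, flat.shape)  # (1, 7) (7,)

# Hypothetical ragged 1-D input: each array would become a column of a different height
ragged_1d = [np.array([1.0, 2.0]), np.array([3.0])]
try:
    np.column_stack(ragged_1d)
except ValueError as exc:
    print("column_stack failed:", exc)

# concatenate joins 1-D arrays of any length end to end
print(np.concatenate(ragged_1d))  # [1. 2. 3.]
```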

Answer 3

This can also be done with

np.array(list_of_arrays).flatten().tolist()

resulting in

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654]
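Under the hood, np.array stacks the equally shaped (1, 1) arrays into a single 3-D array before flatten collapses it. A small sketch showing the intermediate shape; it assumes, as in the question, that every element of the list has the same shape.

```python
import numpy as np

list_of_arrays = [np.array([[0.00353654]]) for _ in range(13)]

# np.array stacks the equally shaped (1, 1) arrays into one 3-D array of shape (13, 1, 1)
stacked = np.array(list_of_arrays)
print(stacked.shape)  # (13, 1, 1)

# flatten collapses all axes into one; tolist converts the values to plain Python floats
result = stacked.flatten().tolist()
print(len(result), type(result[0]))  # 13 <class 'float'>
```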

Update

As @aydow pointed out in the comments, using numpy.ndarray.ravel is faster if you don't care whether you get a copy or a view:

np.array(list_of_arrays).ravel()

However, according to the docs:

When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable.

In other words:

np.array(list_of_arrays).reshape(-1)

My initial suggestion was to use numpy.ndarray.flatten, which returns a copy every time and therefore costs performance.

Let's now see how the time complexity of the solutions listed above compares, using the perfplot package on a setup similar to the OP's:

import numpy as np
import perfplot

# Time each flattening method on random arrays of shape (n, 2), n = 1, 2, 4, ..., 2**15
perfplot.show(
    setup=lambda n: np.random.rand(n, 2),
    kernels=[lambda a: a.ravel(),
             lambda a: a.flatten(),
             lambda a: a.reshape(-1)],
    labels=['ravel', 'flatten', 'reshape'],
    n_range=[2**k for k in range(16)],
    xlabel='N')

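Here flatten shows piecewise-linear complexity, which can reasonably be explained by its copying the initial array, compared with the constant complexity of ravel and reshape, which return a view.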
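The copy-versus-view explanation can be checked directly with np.shares_memory. This sketch uses a contiguous array like the perfplot setup; for non-contiguous input, ravel and reshape(-1) may also have to copy, so the view behavior shown here is not guaranteed in general.

```python
import numpy as np

a = np.random.rand(1000, 2)  # C-contiguous array, similar to the perfplot setup

# ravel and reshape(-1) can return views of a contiguous array, so no data is copied;
# flatten always allocates and fills a new array, costing time proportional to its size
for name, out in [("ravel", a.ravel()),
                  ("reshape", a.reshape(-1)),
                  ("flatten", a.flatten())]:
    # prints True for ravel and reshape, False for flatten
    print(name, np.shares_memory(a, out))
```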

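It is also worth noting that, quite predictably, converting the output with .tolist() evens out the performance of all three, making them equally linear.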

Answer 4

Another simple approach is to use numpy.hstack() and then remove the singleton dimension with squeeze(), as in:

In [61]: np.hstack(list_of_arrs).squeeze()
Out[61]: 
array([0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654])
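A sketch of the intermediate shapes, plus one edge case worth knowing: squeeze removes every length-1 axis, so a list holding a single (1, 1) array becomes a 0-d array, whereas ravel always returns a 1-D result.

```python
import numpy as np

list_of_arrs = [np.array([[0.00353654]]) for _ in range(13)]

# hstack joins the (1, 1) arrays along axis 1, giving shape (1, 13)
stacked = np.hstack(list_of_arrs)
print(stacked.shape)  # (1, 13)

# squeeze removes the remaining length-1 axis, leaving shape (13,)
print(stacked.squeeze().shape)  # (13,)

# Edge case: with a one-element list every axis has length 1, so squeeze gives a 0-d array,
# while ravel still gives a 1-D array
single = np.hstack([np.array([[0.00353654]])])
print(single.squeeze().shape, single.ravel().shape)  # () (1,)
```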

Answer 5

Another way to flatten the array, using itertools:

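import itertools
import numpy as np

# Recreating the list from the question
a = [np.array([[0.00353654]])] * 13

# Iterating over a 2-D array yields its rows, so ravel each array first; then chain them
# into one iterator over the scalars and build a list from that iterator
flattened = list(itertools.chain.from_iterable(arr.ravel() for arr in a))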

This solution should be very fast; see … for further explanation.

If the resulting data structure should be a numpy array, use numpy.fromiter() to exhaust the iterator into an array:

# Make an iterator that yields the scalars of the flattened list and create a numpy array from it
flattened_array = np.fromiter(itertools.chain.from_iterable(arr.ravel() for arr in a), float)
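If the total number of elements is known in advance, np.fromiter also accepts a count argument, which lets it allocate the output once instead of resizing it while consuming the iterator. A sketch, recreating the list a from above; each array is raveled first so the chained iterator yields scalars rather than rows.

```python
import itertools
import numpy as np

a = [np.array([[0.00353654]])] * 13

# Total number of scalars across all arrays, passed to fromiter as count
total = sum(arr.size for arr in a)

flattened_array = np.fromiter(
    itertools.chain.from_iterable(arr.ravel() for arr in a),  # iterate scalars, not rows
    dtype=float,
    count=total,
)
print(flattened_array.shape)  # (13,)
```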

Docs for itertools.chain.from_iterable(): https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable

Docs for numpy.fromiter(): https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
