Is zip() the most efficient way to combine arrays with respect to memory in numpy?

Question

I use numpy and have two arrays, both read with genfromtxt.

According to np.shape(), each has shape (10000,).

I want these two vectors in a single array of shape (10000, 2). For now I use:

x = zip(x1,x2)

but I am not sure whether there is a numpy function that does this better or more efficiently. I don't think concatenate does what I expect it to do (or I am using it wrong).

Answer 1

There is numpy.column_stack:

>>> import numpy
>>> a = numpy.arange(10)
>>> b = numpy.arange(1, 11)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> numpy.column_stack((a, b))
array([[ 0,  1],
       [ 1,  2],
       [ 2,  3],
       [ 3,  4],
       [ 4,  5],
       [ 5,  6],
       [ 6,  7],
       [ 7,  8],
       [ 8,  9],
       [ 9, 10]])
>>> numpy.column_stack((a, b)).shape
(10, 2)
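
As for why concatenate may not have seemed to do the right thing: on 1-D inputs it joins along the only axis that exists, producing one longer vector rather than two columns. The following is a minimal sketch of that behaviour, using small stand-in arrays named x1 and x2 (assumed here in place of the genfromtxt data from the question):

```python
import numpy as np

# Small stand-ins for the two vectors read with genfromtxt (assumed data).
x1 = np.arange(5)
x2 = np.arange(5, 10)

# On 1-D inputs, concatenate joins end to end along axis 0.
flat = np.concatenate((x1, x2))

# Give each vector a second axis first, then join along axis 1.
cols = np.concatenate((x1[:, np.newaxis], x2[:, np.newaxis]), axis=1)

print(flat.shape)  # one long vector
print(cols.shape)  # two columns
print(np.array_equal(cols, np.column_stack((x1, x2))))
```

column_stack does the reshaping step for you, which is why it is the more convenient call here.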

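I can't guarantee that this is better than zip in terms of memory usage, but under the hood it appears to rely on numpy.concatenate (which is implemented in C), so that is at least encouraging: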

>>> import inspect
>>> print(inspect.getsource(numpy.column_stack))
def column_stack(tup):
    """
    Stack 1-D arrays as columns into a 2-D array.

    Take a sequence of 1-D arrays and stack them as columns
    to make a single 2-D array. 2-D arrays are stacked as-is,
    just like with `hstack`.  1-D arrays are turned into 2-D columns
    first.

    Parameters
    ----------
    tup : sequence of 1-D or 2-D arrays.
        Arrays to stack. All of them must have the same first dimension.

    Returns
    -------
    stacked : 2-D array
        The array formed by stacking the given arrays.

    See Also
    --------
    hstack, vstack, concatenate

    Notes
    -----
    This function is equivalent to ``np.vstack(tup).T``.

    Examples
    --------
    >>> a = np.array((1,2,3))
    >>> b = np.array((2,3,4))
    >>> np.column_stack((a,b))
    array([[1, 2],
           [2, 3],
           [3, 4]])

    """
    arrays = []
    for v in tup:
        arr = array(v, copy=False, subok=True)
        if arr.ndim < 2:
            arr = array(arr, copy=False, subok=True, ndmin=2).T
        arrays.append(arr)
    return _nx.concatenate(arrays, 1)
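
To put a rough number on the memory question, one can compare the data buffer of the stacked array with the objects held by a list of zipped pairs. This is only a sketch under stated assumptions: it assumes CPython (where sys.getsizeof is meaningful), float64 data, and Python 3 (hence the list() around zip); the names stacked, pairs, array_bytes and list_bytes are made up for illustration, and the list figure is an estimate that ignores allocator overhead.

```python
import sys
import numpy as np

# Stand-ins for the two vectors of shape (10000,) (assumed float64 data).
x1 = np.arange(10000, dtype=np.float64)
x2 = np.arange(10000, dtype=np.float64)

stacked = np.column_stack((x1, x2))
pairs = list(zip(x1, x2))  # list of tuples of numpy scalar objects

# The array stores raw values contiguously:
# 10000 rows x 2 columns x 8 bytes = 160000 bytes.
array_bytes = stacked.nbytes

# The list stores a pointer per tuple, plus every tuple object,
# plus one boxed scalar object per value.
list_bytes = sys.getsizeof(pairs) + sum(
    sys.getsizeof(t) + sum(sys.getsizeof(v) for v in t) for t in pairs
)

print(array_bytes, list_bytes)
```

The list of tuples pays per-object overhead for every pair and every value, while the array pays only for the values themselves plus a small fixed header.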

Answer 2

A simple test:

python -m timeit "import numpy as np; x, y = np.array([range(100000), range(100000,200000)]); zip(x,y)"

10 loops, best of 3: 32.2 msec per loop

python -m timeit "import numpy as np; x, y = np.array([range(100000), range(100000,200000)]); np.column_stack((x, y))"

10 loops, best of 3: 14.4 msec per loop
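
Two caveats apply when repeating this test today. On Python 3, zip() returns a lazy iterator, so timing zip(x, y) by itself measures almost nothing, and the commands above also include the array creation in the timed statement. The following sketch, which assumes Python 3 and uses the hypothetical helper names with_zip and with_column_stack, builds the arrays once and times only the combining step. Timings depend on the machine, so no expected numbers are given.

```python
import timeit
import numpy as np

# Build the inputs once, outside the timed code.
x = np.arange(100000)
y = np.arange(100000, 200000)

def with_zip():
    # list() forces the lazy Python 3 iterator to produce every pair.
    return list(zip(x, y))

def with_column_stack():
    return np.column_stack((x, y))

n = 5  # calls per measurement
t_zip = min(timeit.repeat(with_zip, number=n, repeat=3)) / n
t_stack = min(timeit.repeat(with_column_stack, number=n, repeat=3)) / n

print("zip:          %.3f msec per call" % (t_zip * 1e3))
print("column_stack: %.3f msec per call" % (t_stack * 1e3))
```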
