Is zip() the most efficient way to combine arrays with respect to memory in numpy?
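I am using numpy and have two arrays that were read with genfromtxt. Both have the shape (10000,) according to np.shape(). I want these two vectors in a single array of shape (10000, 2). At the moment I use: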
x = zip(x1,x2)
but I am not sure whether there is a numpy function that does this better or more efficiently. I don't think concatenate does what I expect (or I am using it wrong).
There is numpy.column_stack:
>>> import numpy
>>> a = numpy.arange(10)
>>> b = numpy.arange(1, 11)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> numpy.column_stack((a, b))
array([[ 0,  1],
       [ 1,  2],
       [ 2,  3],
       [ 3,  4],
       [ 4,  5],
       [ 5,  6],
       [ 6,  7],
       [ 7,  8],
       [ 8,  9],
       [ 9, 10]])
>>> numpy.column_stack((a, b)).shape
(10, 2)
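As for why concatenate may not have behaved as expected: on two 1-D arrays it joins them end to end along axis 0. The sketch below is only an illustration (the arrays x1 and x2 are small stand-ins for the genfromtxt data, not the real inputs) and shows that giving each vector an explicit column axis, or using np.stack with axis=1, yields the two-column layout.

```python
import numpy as np

# Small stand-ins for the two vectors read with genfromtxt (assumed data).
x1 = np.arange(5)
x2 = np.arange(5, 10)

# Concatenating two 1-D arrays joins them end to end along axis 0.
flat = np.concatenate((x1, x2))

# Adding an explicit column axis lets concatenate place them side by side.
cols = np.concatenate((x1[:, np.newaxis], x2[:, np.newaxis]), axis=1)

# np.stack with axis=1 produces the same layout without manual reshaping.
stacked = np.stack((x1, x2), axis=1)

print(flat.shape)     # (10,)
print(cols.shape)     # (5, 2)
print(stacked.shape)  # (5, 2)
```

Both two-column results match what numpy.column_stack((x1, x2)) returns.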
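I can't guarantee that this is better than zip in terms of memory usage, but under the hood it appears to rely on numpy.concatenate (which is implemented in C), so that is at least encouraging: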
>>> import inspect
>>> print(inspect.getsource(numpy.column_stack))
def column_stack(tup):
    """
    Stack 1-D arrays as columns into a 2-D array.

    Take a sequence of 1-D arrays and stack them as columns
    to make a single 2-D array. 2-D arrays are stacked as-is,
    just like with `hstack`. 1-D arrays are turned into 2-D columns
    first.

    Parameters
    ----------
    tup : sequence of 1-D or 2-D arrays.
        Arrays to stack. All of them must have the same first dimension.

    Returns
    -------
    stacked : 2-D array
        The array formed by stacking the given arrays.

    See Also
    --------
    hstack, vstack, concatenate

    Notes
    -----
    This function is equivalent to ``np.vstack(tup).T``.

    Examples
    --------
    >>> a = np.array((1,2,3))
    >>> b = np.array((2,3,4))
    >>> np.column_stack((a,b))
    array([[1, 2],
           [2, 3],
           [3, 4]])
    """
    arrays = []
    for v in tup:
        arr = array(v, copy=False, subok=True)
        if arr.ndim < 2:
            arr = array(arr, copy=False, subok=True, ndmin=2).T
        arrays.append(arr)
    return _nx.concatenate(arrays, 1)
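The note in that docstring can be checked directly. A minimal sketch, reusing the arrays from the docstring example: it compares column_stack against np.vstack(tup).T and also confirms that the result is a fresh array rather than a view of the inputs, which matters when thinking about memory.

```python
import numpy as np

a = np.array((1, 2, 3))
b = np.array((2, 3, 4))

stacked = np.column_stack((a, b))

# The docstring states this is equivalent to column_stack.
via_vstack = np.vstack((a, b)).T

# True when both approaches produce the same values and shape.
same_values = np.array_equal(stacked, via_vstack)

# concatenate copies its inputs, so the result does not alias `a`.
shares_input = np.shares_memory(stacked, a)

print(same_values)   # True
print(shares_input)  # False
```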
A simple test:
python -m timeit "import numpy as np; x, y = np.array([range(100000), range(100000,200000)]); zip(x,y)"
10 loops, best of 3: 32.2 msec per loop
python -m timeit "import numpy as np; x, y = np.array([range(100000), range(100000,200000)]); np.column_stack((x, y))"
10 loops, best of 3: 14.4 msec per loop
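The timings above say nothing about memory, which is what the question asks about. The following is a rough sketch, not a rigorous benchmark. It assumes Python 3, where zip returns a lazy iterator and has to be wrapped in list() to get the same result as before. The byte count for the list is a deliberate underestimate: it counts the list and its tuples but not the scalar objects inside them.

```python
import sys
import numpy as np

n = 100000
x = np.arange(n)
y = np.arange(n, 2 * n)

# On Python 3, zip is lazy, so materialize it to compare like with like.
pairs = list(zip(x, y))
stacked = np.column_stack((x, y))

# Lower bound for the list: the list object plus one tuple per row
# (the boxed numpy scalars inside each tuple are not counted).
list_bytes = sys.getsizeof(pairs) + sum(sys.getsizeof(p) for p in pairs)

# The array stores its values in one contiguous buffer.
array_bytes = stacked.nbytes

print(list_bytes, array_bytes)
```

Even with the list undercounted, the 2-D array comes out smaller, because it holds raw values instead of one Python object per row.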