NumPy equivalent of itertools.product

Question

I know that itertools.product can iterate over every combination drawn from several lists of keywords. For example, if I have this:

categories = [
    [ 'A', 'B', 'C', 'D'],
    [ 'E', 'F', 'G', 'H'],
    [ 'I', 'J', 'K', 'L']
]

and I call itertools.product() on it, I get something like this:

>>> [ x for x in itertools.product(*categories) ]
[('A', 'E', 'I'),
('A', 'E', 'J'),
('A', 'E', 'K'),
('A', 'E', 'L'),
('A', 'F', 'I'),
('A', 'F', 'J'),
# and so on...

Is there an equivalent, straightforward way to do the same thing with NumPy arrays?

Answer 1

This question has been asked a few times already:

Using numpy to build an array of all combinations of two arrays

itertools product speed up

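The first link has a working NumPy solution that is claimed to be several times faster than itertools, although no benchmark is provided. The code was written by a user named pv. Please follow the link and upvote his answer if you find it useful: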

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

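    # Integer division: m is used as a slice bound, so it must be an int
    m = n // arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # Fill the first block recursively, then copy it into the remaining blocks
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]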
    return out

However, in the same post Alex Martelli, one of SO's great Python gurus, wrote that itertools is the fastest way to accomplish this task. So here is a quick benchmark that backs up Alex's words.

import numpy as np
import time
import itertools


def cartesian(arrays, out=None):
    ...


def test_numpy(arrays):
    for res in cartesian(arrays):
        pass


def test_itertools(arrays):
    for res in itertools.product(*arrays):
        pass


def main():
    arrays = [np.fromiter(range(100), dtype=int), np.fromiter(range(100, 200), dtype=int)]
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it
    start = time.perf_counter()
    for _ in range(100):
        test_numpy(arrays)
    print(time.perf_counter() - start)
    start = time.perf_counter()
    for _ in range(100):
        test_itertools(arrays)
    print(time.perf_counter() - start)

if __name__ == '__main__':
    main()

Output:

0.421036
0.06742

So you should definitely use itertools.
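
The benchmark above only iterates over the combinations. If you need them as a 2-D array anyway, a shorter NumPy-only version can probably be built on np.meshgrid. This is a minimal sketch, not code from either linked post: the name cartesian_meshgrid is made up for illustration, it has not been benchmarked here, and it assumes every input is 1-D.

```python
import itertools
import numpy as np


def cartesian_meshgrid(arrays):
    """Return the cartesian product of 1-D inputs as a 2-D array.

    Hypothetical helper for illustration; rows come out in the same
    order as itertools.product(*arrays).
    """
    # indexing="ij" makes the first input vary slowest, like itertools.product
    grids = np.meshgrid(*arrays, indexing="ij")
    # Stack the grids along a new last axis, then flatten to one row per combination
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))


categories = [
    ['A', 'B', 'C', 'D'],
    ['E', 'F', 'G', 'H'],
    ['I', 'J', 'K', 'L'],
]

result = cartesian_meshgrid(categories)
print(result.shape)   # one row per combination, one column per input list
print(result[:4])     # first few rows, for comparison with the question's output
```

Unlike itertools.product, this builds the whole result in memory at once, so it only makes sense when the full array is actually needed.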
