numba does not accept numpy arrays with dtype=object

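I have an empty array that I want to fill with a list of arbitrary length at each index [i,j]. So I initialize an empty array that is supposed to hold objects, like this: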

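import numpy as np
from numba import jit

length = 10

@jit(nopython=True, parallel=True)
def numba_function():
    # Object array meant to hold one Python list per cell
    values = np.empty((length, length), dtype=object)
    for i in range(length):
        for j in range(length):
            a_list_of_things = [1, 2, 3, 4]
            values[i, j] = a_list_of_things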

This fails with:

 TypingError: Failed in nopython mode pipeline (step: nopython frontend) Untyped global name 'object': cannot determine Numba type of <class 'type'>

If I turn off nopython mode by setting nopython=False, the code works fine. Setting dtype=list for the values array does not improve things.

Are there any clever tricks to get around this?

Numba in nopython mode (as of version 0.43.1) does not support object arrays.

The correct way to type an object array would be:

import numba as nb
import numpy as np

@nb.njit
def numba_function():
    values = np.empty((2, 2), np.object_)
    return values

But as already stated, this does not work:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Internal error at resolving type of attribute "object_" of "$0.4":
NotImplementedError: object

This is also mentioned in the numba documentation:

2.7.1. Scalar types

Numba supports the following Numpy scalar types:

  • Integers: all integers of either signedness, and any width up to 64 bits
  • Booleans
  • Real numbers: single-precision (32-bit) and double-precision (64-bit) reals
  • Complex numbers: single-precision (2x32-bit) and double-precision (2x64-bit) complex numbers
  • Datetimes and timestamps: of any unit
  • Character sequences (but no operations are available on them)
  • Structured scalars: structured scalars made of any of the types above and arrays of the types above

The following scalar types and features are not supported:

  • Arbitrary Python objects
  • Half-precision and extended-precision real and complex numbers
  • Nested structured scalars the fields of structured scalars may not contain other structured scalars

[...]

2.7.2. Array types

Numpy arrays of any of the scalar types above are supported, regardless of the shape or layout.

(Emphasis mine.)

Since dtype=object allows arbitrary Python objects, it is not supported. And dtype=list is simply equivalent to dtype=object (see the NumPy documentation):

Built-in Python types

Several python types are equivalent to a corresponding array scalar when used to generate a dtype object:

int           np.int_
bool          np.bool_
float         np.float_
complex       np.cfloat
bytes         np.bytes_
str           np.bytes_ (Python2) or np.unicode_ (Python3)
unicode       np.unicode_
buffer        np.void
(all others)  np.object_
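
The last row of that table can be checked directly in plain NumPy, outside of numba. The following is a small sketch; the variable names are only illustrative:

```python
import numpy as np

# Any Python type without a dedicated NumPy scalar maps to the object dtype.
list_dtype = np.dtype(list)
object_dtype = np.dtype(object)
print(list_dtype == object_dtype)

# An array created with dtype=list is therefore just an object array.
values = np.empty((2, 2), dtype=list)
print(values.dtype)
```

So switching from dtype=object to dtype=list does not change what numba sees.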

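All that said: object arrays are slow, and that holds for both plain NumPy and numba functions. Whenever you choose to use object arrays, you implicitly decide that you don't want high performance.

So if you want performance and want to use NumPy arrays, you first need to rewrite the code so that it does not use object arrays. If it is still too slow after that, you can think about applying numba to the non-object arrays.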
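
One possible way to do such a rewrite for "a list of arbitrary length per cell" is to store all items in one flat numeric array plus an offsets array that marks where each cell starts. This is only a sketch under assumptions: the names build_ragged and get_cell, the int64 item type, and the rule that decides each list's length are all made up for illustration, and the try/except fallback is there only so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def build_ragged(length):
    n_cells = length * length
    # offsets[c] .. offsets[c + 1] delimits the items of flattened cell c.
    offsets = np.zeros(n_cells + 1, dtype=np.int64)
    for i in range(length):
        for j in range(length):
            cell = i * length + j
            n_items = (i + j) % 3 + 1  # hypothetical per-cell list length
            offsets[cell + 1] = offsets[cell] + n_items
    # One flat array holds the contents of every "list".
    flat = np.empty(offsets[n_cells], dtype=np.int64)
    for cell in range(n_cells):
        start = offsets[cell]
        stop = offsets[cell + 1]
        for k in range(stop - start):
            flat[start + k] = k + 1  # placeholder values 1, 2, 3, ...
    return flat, offsets


@njit
def get_cell(flat, offsets, length, i, j):
    # Returns a view on the items that belong to cell [i, j].
    cell = i * length + j
    return flat[offsets[cell]:offsets[cell + 1]]


flat, offsets = build_ragged(3)
first = get_cell(flat, offsets, 3, 0, 0)
middle = get_cell(flat, offsets, 3, 1, 1)
```

If every list has the same length, a plain 3-D array of shape (length, length, n_items) with a numeric dtype is simpler and avoids the offsets bookkeeping.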