Why is numba performing 100x worse than native Python with a tuple of strings?

Question

I want to use numba to speed up my code. However, the numba function performs worse than the native Python function. Can anyone explain why?

from numba import jit
import timeit

@jit(nopython=True, fastmath=True)
def get_exact_score_with_numba(tokens_to_match, candidate_tokens):
    score = 0.
    for token in tokens_to_match:
        if token in candidate_tokens:
            score += 1.
    return score / len(tokens_to_match)


def get_exact_score_without_numba(tokens_to_match, candidate_tokens):
    score = 0.
    for token in tokens_to_match:
        if token in candidate_tokens:
            score += 1.
    return score / len(tokens_to_match)


tokens_to_match = ('a', 'b')
candidate_tokens = ('a', 'b', 'c', 'd', 'e')

timeit result without numba:

>>> number = 200000
>>> timeit.timeit(lambda: get_exact_score_without_numba(tokens_to_match, candidate_tokens), number=number)
0.0962326959999995

With numba:

>>> timeit.timeit(lambda: get_exact_score_with_numba(tokens_to_match, candidate_tokens), number=number)
9.441522490000011

So numba is about 100 times slower.

Answer 1

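The get_exact_score_without_numba function takes 0.275 us on my machine, which is a very short time for a function running in the CPython interpreter. An empty Numba function takes at least 0.25 us on my machine, because of the cost of switching from CPython to native code, performing some internal checks, and so on. Numba therefore cannot be significantly faster in this benchmark.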
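To compare the question's totals with per-call figures like these, divide each total by the number of calls. The sketch below only redoes that arithmetic on the numbers posted in the question; the helper name per_call_us is made up for illustration, and the results include the overhead of the lambda used in the timeit call.

```python
# Totals reported in the question, each measured over `number` calls.
number = 200_000
total_without_numba = 0.0962326959999995  # seconds
total_with_numba = 9.441522490000011      # seconds


def per_call_us(total_seconds, calls):
    """Average time per call, in microseconds (hypothetical helper)."""
    return total_seconds / calls * 1e6


python_us = per_call_us(total_without_numba, number)
numba_us = per_call_us(total_with_numba, number)
slowdown = total_with_numba / total_without_numba

print(f"pure Python: {python_us:.2f} us per call")
print(f"Numba:       {numba_us:.2f} us per call")
print(f"slowdown:    {slowdown:.0f}x")
```

On the asker's machine this works out to roughly half a microsecond per call for pure Python against tens of microseconds for Numba, so the gap is a fixed per-call cost rather than slow compiled code.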

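Beyond that, get_exact_score_with_numba is still unusually slow in this case, since it takes 25 us on my machine. This overhead comes from Numba itself, before the compiled function is even called. More specifically, it appears to come from the internal type conversion from CPython to Numba (mainly because of the strings). Strings (as well as bytearrays) are not yet well supported by Numba; only experimental support is available at the moment.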
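If the string conversion is indeed the main cost, one possible workaround is to keep strings out of the compiled function entirely and pass integer IDs in NumPy arrays instead. The following is a minimal sketch under that assumption, not a measured fix: the names encode, vocab, and get_exact_score_ids are hypothetical, the encoding step itself runs in plain Python, and for inputs this small the fixed call overhead described above would likely still dominate.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without Numba installed.
    def njit(func):
        return func


@njit
def get_exact_score_ids(ids_to_match, candidate_ids):
    # Same logic as the question's function, but on integer arrays.
    score = 0.0
    for token_id in ids_to_match:
        for candidate_id in candidate_ids:
            if token_id == candidate_id:
                score += 1.0
                break
    return score / len(ids_to_match)


def encode(tokens, vocab):
    # Hypothetical helper: map each distinct token to a small integer ID,
    # extending the shared vocabulary when a new token is seen.
    return np.array([vocab.setdefault(t, len(vocab)) for t in tokens],
                    dtype=np.int64)


vocab = {}
ids_to_match = encode(('a', 'b'), vocab)
candidate_ids = encode(('a', 'b', 'c', 'd', 'e'), vocab)

score = get_exact_score_ids(ids_to_match, candidate_ids)
print(score)
```

The encoding would need to be done once, outside the hot loop, for this to have any chance of paying off; whether it actually helps should be checked with timeit on the real data.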

python

numba