How can the numba implementation of str.find() be slower than the pure Python one?
How is it possible that the pure Python code for str.find() runs faster than its numba implementation?
numba==0.48.0 (0.49.0 could not be loaded; it seems to be broken)
from timeit import default_timer as timer
from numba import jit, njit

def search_match(a, search, n):
    for z in range(n):
        i = a.find(search)
    return i

@njit
def search_match_jit(a, search, n):
    for z in range(n):
        i = a.find(search)
    return i

n = 10000000
a = '.56485.36853.32153.65646.34763.23152.11321.65886.54975.12781.'
search = '2315'

print('Str.find:')
start = timer()
i = search_match(a, search, n)
print(timer() - start)

i = search_match_jit(a, search, 1)  # precompile

print('Jit:')
start = timer()
i = search_match_jit(a, search, n)
print(timer() - start)
The built-in CPython implementation of str.find is not "pure Python"; it is already written in C: https://github.com/python/cpython/blob/master/Objects/stringlib/find.h
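To make that distinction concrete, here is a minimal sketch that compares str.find with a search loop that really is written in Python. The helper naive_find is hypothetical (it is not part of CPython or Numba) and is only meant to show what "pure Python" would look like here; timings will vary by machine, so none are quoted.

```python
import timeit
import types


def naive_find(haystack, needle):
    """Hypothetical pure-Python substring search.

    Returns the first index of needle in haystack, or -1, like str.find.
    """
    n, m = len(haystack), len(needle)
    for start in range(n - m + 1):
        if haystack[start:start + m] == needle:
            return start
    return -1


a = '.56485.36853.32153.65646.34763.23152.11321.65886.54975.12781.'
search = '2315'

if __name__ == "__main__":
    # In CPython, str.find is a C-level method descriptor,
    # whereas naive_find is an ordinary Python function.
    print(isinstance(str.find, types.MethodDescriptorType))
    print(isinstance(naive_find, types.FunctionType))

    # Both approaches agree on the result.
    print(naive_find(a, search) == a.find(search))

    # Rough timing comparison; the C implementation is expected to win.
    print('naive_find:', timeit.timeit(lambda: naive_find(a, search), number=100_000))
    print('str.find:  ', timeit.timeit(lambda: a.find(search), number=100_000))
```

So the benchmark in the question is really comparing Numba-compiled code against hand-tuned C, not against interpreted Python.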
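Speeding up a routine that is already compiled C is not something we would expect from Numba. In fact, since Numba has other complications to deal with, it is not surprising that it is somewhat slower. See the following warning from the Numba documentation; the last sentence is the one I want to emphasize: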
The performance of some operations is known to be slower than the CPython implementation. These include substring search (in, .contains() and find()) and string creation (like .split()). Improving the string performance is an ongoing task, but the speed of CPython is unlikely to be surpassed for basic string operation in isolation. Numba is most successfully used for larger algorithms that happen to involve strings, where basic string operations are not the bottleneck.
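Basically, the Numba developers added string methods to nopython mode so that users who have a few lines of string handling mixed in with heavy numerical code can compile that code without redesigning it. But Numba is not meant to speed up string code: its target is heavy numerical work, and string support is there only as a convenience.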
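For contrast, here is a rough sketch of the kind of code Numba is aimed at: a tight numerical loop that the interpreter would otherwise execute one bytecode at a time. The function leibniz_pi is a made-up example, not taken from the question, and the try/except fallback to a no-op decorator is only an assumption added so the snippet still runs where Numba is not installed.

```python
import math
from timeit import default_timer as timer

try:
    from numba import njit
except ImportError:
    # Assumption for illustration only: fall back to a no-op decorator
    # so the example still runs (uncompiled) without Numba.
    def njit(func):
        return func


def leibniz_pi(terms):
    """Approximate pi with the Leibniz series using a plain numeric loop."""
    total = 0.0
    sign = 1.0
    for k in range(terms):
        total += sign / (2.0 * k + 1.0)
        sign = -sign
    return 4.0 * total


# Same function, compiled in nopython mode when Numba is available.
leibniz_pi_jit = njit(leibniz_pi)

if __name__ == "__main__":
    terms = 1_000_000

    start = timer()
    plain = leibniz_pi(terms)
    print('Python loop:', timer() - start)

    leibniz_pi_jit(1)  # precompile, as in the question

    start = timer()
    compiled = leibniz_pi_jit(terms)
    print('Jit:', timer() - start)

    print(abs(plain - math.pi), abs(compiled - math.pi))
```

Here the work is arithmetic done in the interpreter, with no C routine already doing it, so this is where a compiled loop would typically be expected to pay off.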