Numba @jit(nopython=True) function offers no speed improvement on heavy Numpy function
I'm currently running test_matrix_speed() to see how fast my search_and_book_availability function is. Using the PyCharm profiler I can see that each search_and_book_availability call takes about 0.001 ms on average. Adding the Numba @jit(nopython=True) decorator to the function makes no difference to its performance. Is that because there is nothing left to improve and Numpy already runs as fast as possible here? (I don't care about the speed of the generate_searches function.)

Here's the code I'm running:
import random
import numpy as np
from numba import jit
def generate_searches(number, sim_start, sim_end):
    searches = []
    for i in range(number):
        start_slot = random.randint(sim_start, sim_end - 1)
        end_slot = random.randint(start_slot + 1, sim_end)
        searches.append((start_slot, end_slot))
    return searches
@jit(nopython=True)
def search_and_book_availability(matrix, search_start, search_end):
    search_slice = matrix[:, search_start:search_end]
    output = np.where(np.sum(search_slice, axis=1) == 0)[0]
    number_of_bookable_vecs = output.size
    if number_of_bookable_vecs > 0:
        if number_of_bookable_vecs == 1:
            id_to_book = output[0]
        else:
            id_to_book = np.random.choice(output)
        matrix[id_to_book, search_start:search_end] = 1
        return True
    else:
        return False
def test_matrix_speed():
    shape = (10, 1440)
    matrix = np.zeros(shape)
    sim_start = 0
    sim_end = 1440
    searches = generate_searches(1000000, sim_start, sim_end)
    for i in searches:
        search_start = i[0]
        search_end = i[1]
        availability = search_and_book_availability(matrix, search_start, search_end)
Using your functions and the following code to profile the speed:
import time
shape = (10, 1440)
matrix = np.zeros(shape)
sim_start = 0
sim_end = 1440
searches = generate_searches(1000000, sim_start, sim_end)
def reset():
    matrix[:] = 0

def test_matrix_speed():
    for i in searches:
        search_start = i[0]
        search_end = i[1]
        availability = search_and_book_availability(matrix, search_start, search_end)

def timeit(func):
    # warmup
    reset()
    func()
    reset()
    start = time.time()
    func()
    end = time.time()
    return end - start
print(timeit(test_matrix_speed))
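A side note on the harness (my addition, not part of the measurements below): the extra func() call before the timed run matters for the jitted variant, because a nopython function is compiled the first time it is called with a given combination of argument types, and you don't want that one-off compilation cost inside the measurement. A rough way to see that cost in isolation:

reset()
start = time.time()
search_and_book_availability(matrix, 0, 10)  # first call triggers nopython compilation
print("first call:", time.time() - start)
start = time.time()
search_and_book_availability(matrix, 0, 10)  # later calls reuse the compiled code
print("second call:", time.time() - start)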
I find the jitted version to be around 11.5 s and the non-jitted one around 7.5 s. I'm no expert on numba, but what it is made for is optimizing numerical code written in a non-vectorized way, in particular explicit for loops. There are none in your code; you use only vectorized operations. Therefore I expected jit not to outperform the baseline solution, though I must admit I'm surprised to see it doing that much worse. If you're looking to optimize your solution, you can cut the execution time (at least on my PC) with the following code:
def search_and_book_availability_opt(matrix, search_start, search_end):
    search_slice = matrix[:, search_start:search_end]

    # we don't need to sum in order to check if all elements are 0,
    # ndarray.any() can use short-circuiting and is therefore faster.
    # Also, we don't need the selected values from np.where, only the
    # indexes, so np.nonzero is faster
    bookable, = np.nonzero(~search_slice.any(axis=1))

    # short circuit if there is nothing to book
    if bookable.size == 0:
        return False

    # we can perform random choice even if size is 1
    id_to_book = np.random.choice(bookable)
    matrix[id_to_book, search_start:search_end] = 1
    return True
and by initializing matrix as np.zeros(shape, dtype=np.bool) (plain dtype=bool on newer NumPy releases, where np.bool has been removed) instead of the default float64. I was able to get an execution time of around 3.8 s, an improvement of ~50% over your unjitted solution and ~70% over the jitted version. Hope that helps.
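For reference, a minimal sketch (my addition, not part of the numbers above) of how the optimized function and the bool matrix plug into the same timing harness; test_matrix_speed_opt is just a hypothetical wrapper name:

shape = (10, 1440)
matrix = np.zeros(shape, dtype=bool)  # bool instead of the default float64

def test_matrix_speed_opt():
    for search_start, search_end in searches:
        search_and_book_availability_opt(matrix, search_start, search_end)

print(timeit(test_matrix_speed_opt))

And to illustrate the point about explicit loops being what numba is built for, here is a rough, unbenchmarked sketch (my own, assuming the float matrix from the question) of what a loop-oriented nopython variant could look like; each row scan stops at the first occupied slot instead of summing the whole slice:

@jit(nopython=True)
def search_and_book_availability_loop(matrix, search_start, search_end):
    n_rows = matrix.shape[0]
    free_rows = np.empty(n_rows, dtype=np.int64)
    n_free = 0
    for row in range(n_rows):
        free = True
        for col in range(search_start, search_end):
            if matrix[row, col] != 0:
                free = False
                break  # stop scanning this row as soon as a booked slot is found
        if free:
            free_rows[n_free] = row
            n_free += 1
    if n_free == 0:
        return False
    # pick one of the free rows at random (np.random.randint works in nopython mode)
    id_to_book = free_rows[np.random.randint(0, n_free)]
    matrix[id_to_book, search_start:search_end] = 1
    return True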