Why is Numba taking longer to execute NumPy calculations than plain Python code?

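parallel.py is a Python file that uses Numba and NumPy to compute the sum of the primary diagonals of two matrices. The main purpose here is to measure how fast the code runs with Numba. parallel.py takes about 0.55 seconds to finish. The same code in another file (sequencial.py), written in pure Python, takes 0.00 seconds to solve the same problem, which is ironic. I'm not sure whether I'm using Numba properly. Can someone suggest what I need to do to achieve my objective?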

parallel.py

from numba import jit, njit
import numpy as np
import time

@jit(nopython=True)
def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr


print("FIND THE SUM OF PRIMARY DIAGONALS OF ANY TWO MATRICES: ")

start = time.perf_counter()

# calculate the sum of primary diagonals of matrix1
m1 = create_matrix(4, 4)  # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 1 : {m1}")
print(f"Matrix 1 diagonal: {np.diagonal(m1)}")
mat1_sum = np.trace(m1)
print(f"Matrix 1 sum of primary diagonal is : {mat1_sum}")

# calculate the sum of primary diagonals of matrix2
m2 = create_matrix(4, 4)  # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 2 : {m2}")
print(f"Matrix 2 diagonal : {np.diagonal(m2)}")
mat2_sum = np.trace(m2)
print(f"Matrix 2 sum of primary diagonal is : {mat2_sum}")

sum_of_two_diagonals = mat1_sum + mat2_sum
print(f"THE SUM IS :  {sum_of_two_diagonals}")

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} second(s)")

sequencial.py

import numpy as np
import time

def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr

print("FIND THE SUM OF PRIMARY DIAGONALS OF ANY TWO MATRICES: ")

start = time.perf_counter()

# calculate the sum of primary diagonals of matrix1
mat_1 = create_matrix(4, 4) # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 1 : {mat_1}")
mat1_sum_of_primary_diagonal = 0
for i in range(len(mat_1)):
    for j in range(len(mat_1[i])):
        if i == j:
            print(mat_1[i][j])
            mat1_sum_of_primary_diagonal = mat1_sum_of_primary_diagonal + mat_1[i][j]

print(f"Matrix 1 sum of diagonals is: {mat1_sum_of_primary_diagonal}")

# calculate the sum of primary diagonals of matrix2
mat_2 = create_matrix(4, 4) # you can adjust the size of the matrix by changing the row and column in brackets
print(f"Matrix 2 : {mat_2}")
mat2_sum_of_primary_diagonal = 0
for i in range(len(mat_2)):
    for j in range(len(mat_2[i])):
        if i == j:
            print(mat_2[i][j])
            mat2_sum_of_primary_diagonal = mat2_sum_of_primary_diagonal + mat_2[i][j]

print(f"Matrix 2 sum of diagonals is: {mat2_sum_of_primary_diagonal}")

diagonals_total = mat1_sum_of_primary_diagonal + mat2_sum_of_primary_diagonal
print(f"THE SUM IS :  {diagonals_total}")

finish = time.perf_counter()
print(f"Finished in {round(finish - start, 2)} second(s)")
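In sequencial.py, the nested loops with `if i == j` visit all n² entries but use only the n entries on the diagonal. A single loop over the diagonal indices gives the same sum. Below is a minimal sketch. `sum_primary_diagonal` is a hypothetical helper, and `np.arange(...).reshape(...)` is used only to build the same 4x4 matrix that `create_matrix(4, 4)` produces.

```python
import numpy as np


def sum_primary_diagonal(mat):
    """Sum mat[i][i] with one loop instead of scanning every (i, j) pair."""
    total = 0.0
    # min() keeps the index in range for non-square matrices as well
    for i in range(min(len(mat), len(mat[0]))):
        total += mat[i][i]
    return total


# Same values as create_matrix(4, 4) in the question: 1..16, row by row.
mat = np.arange(1, 17, dtype=float).reshape(4, 4)
print(sum_primary_diagonal(mat))  # 34.0
print(np.trace(mat))              # 34.0
```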

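Your benchmark includes the compilation time of the Numba function, because Numba uses lazy compilation: the function is compiled the first time it is called. You can compile it eagerly by specifying the types of the function arguments (a signature). Alternatively, you can run the benchmark twice and only consider the second run.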

Here is an example:

import numba as nb
import numpy as np

@nb.njit('float64[:,::1](int_, int_)')
def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr
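The other option is to run the benchmark twice and time only the second run. A minimal sketch follows. `timed_call` is a hypothetical helper. The `try`/`except` fallback is an addition so the snippet still runs where Numba is not installed, in which case there is no JIT and no speedup. With Numba, the first call should be noticeably slower than the second because it includes compilation. The exact numbers depend on the machine.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumed fallback (not part of Numba): run as plain Python if Numba is missing.
    def njit(*args, **kwargs):
        # Support both the bare @njit form and the @njit(...) form
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr


def timed_call(func, *args):
    """Return (result, elapsed seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# First call: with Numba, this includes the lazy (JIT) compilation time.
m, first = timed_call(create_matrix, 4, 4)
# Second call: the already-compiled code is reused, so only execution is measured.
m, second = timed_call(create_matrix, 4, 4)

print(f"first call:  {first:.6f} s")
print(f"second call: {second:.6f} s")
```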

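Also, note that it is better not to include print calls in the timed region of a benchmark. Their timing can be unstable, and it is probably not what you want to measure. Not to mention that printing is generally slow compared to basic computations.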
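Applied to the question's script, this means stopping the timer before any output is printed. The sketch below keeps the question's loop-based `create_matrix` (without Numba) and times only the matrix creation and the two traces. It is an illustration of the restructuring, not a rigorous benchmark. For stable numbers, repeating the measurement (for example with the standard `timeit` module) would still be advisable.

```python
import time

import numpy as np


def create_matrix(row, col):
    arr = np.zeros((row, col))
    for i in range(row):
        for j in range(1, col + 1):
            arr[i, j - 1] = j + (col * i)
    return arr


start = time.perf_counter()
# Only the computation is inside the timed region.
m1 = create_matrix(4, 4)
m2 = create_matrix(4, 4)
total = np.trace(m1) + np.trace(m2)
elapsed = time.perf_counter() - start

# Printing happens after the timer has been stopped.
print(f"THE SUM IS: {total}")
print(f"Computed in {elapsed:.6f} second(s)")
```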

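Finally, note that the script is named "parallel.py", but nothing in it runs in parallel. Numba does not parallelize code by default, and in your case parallel execution would be slower anyway because of the overhead of creating threads.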
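To opt in to parallelism, Numba provides `parallel=True` together with `numba.prange`. The sketch below assumes Numba is installed. `diagonal_sum` is a hypothetical helper, not code from the question. The `try`/`except` fallback is an addition that replaces `njit` with a no-op and `prange` with `range`, so the snippet also runs without Numba. For a 4x4 matrix, and likely for any plain diagonal sum, expect this to be slower than the sequential version, since the threading overhead outweighs the tiny amount of work.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumed fallback: sequential plain Python when Numba is not available.
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def diagonal_sum(mat):
    n = min(mat.shape[0], mat.shape[1])
    total = 0.0
    # prange marks the loop as parallelizable; Numba treats "total +=" as a reduction.
    for i in prange(n):
        total += mat[i, i]
    return total


mat = np.arange(1, 17, dtype=np.float64).reshape(4, 4)
print(diagonal_sum(mat))  # 34.0
```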