Python parallelisation of nested for loops with numba prange: why is it not working?
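While experimenting with parallelising three nested for loops using numba, I realised that the naive approach does not actually improve performance.
The following code produces the following timings (in seconds):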
0.154625177383 # no numba
0.420143127441 # numba first time (lazy initialisation)
0.196285963058 # numba second time
0.200047016144 # numba third time
0.199403047562 # numba fourth time
Any idea what I am doing wrong?
import numpy as np
from numba import jit, prange
import time

def run_1():
    # Plain Python version: fill a 100 x 100 x 100 array with ones
    dims = [100, 100, 100]
    a = np.zeros(dims)
    for x in range(100):
        for y in range(100):
            for z in range(100):
                a[x, y, z] = 1
    return a

@jit
def run_2():
    # Same loops, decorated with @jit and using prange at every level
    dims = [100, 100, 100]
    a = np.zeros(dims)
    for x in prange(100):
        for y in prange(100):
            for z in prange(100):
                a[x, y, z] = 1
    return a

if __name__ == '__main__':
    # No numba
    t = time.time()
    run_1()
    elapsed1 = time.time() - t
    print(elapsed1)

    # numba, first call (includes lazy compilation)
    t = time.time()
    run_2()
    elapsed2 = time.time() - t
    print(elapsed2)

    # numba, second call
    t = time.time()
    run_2()
    elapsed3 = time.time() - t
    print(elapsed3)

    # numba, third call
    t = time.time()
    run_2()
    elapsed4 = time.time() - t
    print(elapsed4)

    # numba, fourth call
    t = time.time()
    run_2()
    elapsed5 = time.time() - t
    print(elapsed5)
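I wonder whether there is anything in these loops for the JIT to work on: there is no non-trivial Python code to compile, only thin wrappers around C code (yes, range is C code). Possibly the JIT only adds the overhead of trying, unsuccessfully, to analyse the loops and generate more efficient code.
If you want a speed-up, consider parallelisation using scipy, or maybe direct access to NumPy arrays from Cython.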
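For what it's worth, as far as I know numba treats prange as a plain range unless the decorator is given parallel=True, so a variant along the following lines may be worth timing before moving to another tool. This is a minimal sketch, not a verified fix for the timings in the question: the name fill_parallel and the parameter n are made up for illustration, only the outermost loop uses prange, the shape is passed as a tuple (an assumption about what nopython mode accepts), and the try/except fallback is there only so the snippet still runs where numba is not installed. It uses time.perf_counter, so it assumes Python 3.

```python
import time

import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback (illustration only): without numba, run as plain Python.
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)  # prange is only parallelised when parallel=True is set
def fill_parallel(n):
    """Hypothetical variant of run_2: fill an n x n x n array with ones."""
    a = np.zeros((n, n, n))  # tuple shape, assumed to be what nopython mode expects
    for x in prange(n):  # parallel loop over the outermost index only
        for y in range(n):
            for z in range(n):
                a[x, y, z] = 1.0
    return a


if __name__ == '__main__':
    fill_parallel(100)  # warm-up call, pays the one-off compilation cost
    t = time.perf_counter()
    result = fill_parallel(100)
    print(time.perf_counter() - t)  # steady-state time, compilation excluded
```

No speed-up is claimed here. For an array this small the cost of starting threads may well dominate, so it is worth measuring with larger sizes as well.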