优化 python 子集的连通分量标记区域
Optimize python for Connected Component Labeling Area of Subsets
我有一个二元图,我在上面做连通分量标记并得到类似这样的 64x64 网格 - http://pastebin.com/bauas0NJ
现在我想按标签对它们进行分组,这样我就可以找到它们的面积和质心。这就是我所做的:
#ccl_np is the computed array from the previous step (see pastebin)
#I discard the label '1' as its the background
unique, count = np.unique(ccl_np, return_counts = True)
xcm_array = []
ycm_array = []
for i in range(1,len(unique)):
subarray = np.where(ccl_np == unique[i])
xcm_array.append("{0:.5f}".format((sum(subarray[0]))/(count[i]*1.)))
ycm_array.append("{0:.5f}".format((sum(subarray[1]))/(count[i]*1.)))
final_array = zip(xcm_array,ycm_array,count[1:])
我想要一个快速代码(因为我将对大小为 4096x4096 的网格执行此操作)并被告知检查 numba。这是我天真的尝试:
unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)
@numba.autojit
def mysolver(xcm_array, ycm_array, inverse, count):
for i in range(64):
for j in range(64):
pos = inverse[i][j]
local_count = count[pos]
xcm_array[pos] += i/(local_count*1.)
ycm_array[pos] += j/(local_count*1.)
mysolver(xcm_array, ycm_array, inverse, count)
final_array = zip(xcm_array,ycm_array,count)
令我惊讶的是,使用 numba 的速度较慢,或者至多等于以前的速度。我究竟做错了什么 ?
另外,这可以在 Cython 中完成吗?那样会更快吗?
我正在使用最新的 Anaconda python 2.7 发行版中包含的软件包。
我认为问题可能是您对 jit 代码的计时不正确。第一次 运行 代码时,您的时间包括 numba 编译代码所需的时间。这称为预热 jit。如果您再次调用它,该成本就没有了。
import numpy as np
import numba as nb
unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)
def mysolver(xcm_array, ycm_array, inverse, count):
for i in range(64):
for j in range(64):
pos = inverse[i][j]
local_count = count[pos]
xcm_array[pos] += i/(local_count*1.)
ycm_array[pos] += j/(local_count*1.)
@nb.jit(nopython=True)
def mysolver_nb(xcm_array, ycm_array, inverse, count):
for i in range(64):
for j in range(64):
pos = inverse[i,j]
local_count = count[pos]
xcm_array[pos] += i/(local_count*1.)
ycm_array[pos] += j/(local_count*1.)
然后是 timeit
的计时,其中 运行 代码多次。首先是普通的 python 版本:
In [4]:%timeit mysolver(xcm_array, ycm_array, inverse, count)
10 loops, best of 3: 25.8 ms per loop
然后用 numba:
In [5]: %timeit mysolver_nb(xcm_array, ycm_array, inverse, count)
The slowest run took 3630.44 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 33.1 µs per loop
numba 代码快约 1000 倍。
我有一个二元图,我在上面做连通分量标记并得到类似这样的 64x64 网格 - http://pastebin.com/bauas0NJ
现在我想按标签对它们进行分组,这样我就可以找到它们的面积和质心。这就是我所做的:
#ccl_np is the computed array from the previous step (see pastebin)
#I discard the label '1' as its the background
unique, count = np.unique(ccl_np, return_counts = True)
xcm_array = []
ycm_array = []
for i in range(1,len(unique)):
subarray = np.where(ccl_np == unique[i])
xcm_array.append("{0:.5f}".format((sum(subarray[0]))/(count[i]*1.)))
ycm_array.append("{0:.5f}".format((sum(subarray[1]))/(count[i]*1.)))
final_array = zip(xcm_array,ycm_array,count[1:])
我想要一个快速代码(因为我将对大小为 4096x4096 的网格执行此操作)并被告知检查 numba。这是我天真的尝试:
unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)
@numba.autojit
def mysolver(xcm_array, ycm_array, inverse, count):
for i in range(64):
for j in range(64):
pos = inverse[i][j]
local_count = count[pos]
xcm_array[pos] += i/(local_count*1.)
ycm_array[pos] += j/(local_count*1.)
mysolver(xcm_array, ycm_array, inverse, count)
final_array = zip(xcm_array,ycm_array,count)
令我惊讶的是,使用 numba 的速度较慢,或者至多等于以前的速度。我究竟做错了什么 ? 另外,这可以在 Cython 中完成吗?那样会更快吗?
我正在使用最新的 Anaconda python 2.7 发行版中包含的软件包。
我认为问题可能是您对 jit 代码的计时不正确。第一次 运行 代码时,您的时间包括 numba 编译代码所需的时间。这称为预热 jit。如果您再次调用它,该成本就没有了。
import numpy as np
import numba as nb
unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)
def mysolver(xcm_array, ycm_array, inverse, count):
for i in range(64):
for j in range(64):
pos = inverse[i][j]
local_count = count[pos]
xcm_array[pos] += i/(local_count*1.)
ycm_array[pos] += j/(local_count*1.)
@nb.jit(nopython=True)
def mysolver_nb(xcm_array, ycm_array, inverse, count):
for i in range(64):
for j in range(64):
pos = inverse[i,j]
local_count = count[pos]
xcm_array[pos] += i/(local_count*1.)
ycm_array[pos] += j/(local_count*1.)
然后是 timeit
的计时,其中 运行 代码多次。首先是普通的 python 版本:
In [4]:%timeit mysolver(xcm_array, ycm_array, inverse, count)
10 loops, best of 3: 25.8 ms per loop
然后用 numba:
In [5]: %timeit mysolver_nb(xcm_array, ycm_array, inverse, count)
The slowest run took 3630.44 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 33.1 µs per loop
numba 代码快约 1000 倍。