Why does Numba pycc compilation abort?
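First, some context: I am using scipy.integrate.odeint to integrate coupled ODEs many times, with different initial conditions x_ and parameters r_ and d_.
I am trying to speed up the integration by compiling the right-hand side of the ODE ahead of time and by reducing the number of calls to odeint.
I am trying to compile a Python function ahead of time with numba.pycc.CC. This works for simple functions such as: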
import numpy
from numba.pycc import CC

cc = CC('x_test')
cc.verbose = True

@cc.export('x_test', 'f8(f8[:])')
def x_test(y):
    return numpy.sum(numpy.log(y) * .5)  # just a random combination of numpy functions I used to test the compilation

cc.compile()
The actual function I am trying to compile is the following:
# code_generation3.py
import numpy
from numba.pycc import CC

"""
N = 94
input for x_dot_all could look like:
x_ = numpy.ones(N * 5)
x_[4::5] = 5e13
t_ := some float from a numpy linspace. it is passed by odeint.
r_ = numpy.random.random(N * 4)
d_ = numpy.random.random(N * 4) * .8
In practice the size of x_ is 470 and of r_ and d_ is 376.
"""

cc = CC('x_temp_dot1')
cc.verbose = True

@cc.export('x_temp_dot1', 'f8[:](f8[:], f8, f8[:], f8[:], f8[:])')
def x_dot_all(x_, t_, r_, d_, h):
    """
    rhs of the lotka volterra equation for all "patients"
    :param x_: initial conditions, always in groupings of 5: the first 4 is the bacteria count, the 5th entry is the carrying capacity
    :param t_: placeholder required by odeint
    :param r_: growth rates of the types of bacteria
    :param d_: death rates of the types of bacteria
    :param h: placeholder for the return value
    returns the right hand side of the competitive lotka-volterra equation with finite and shared carrying capacity in the same ordering as the initial conditions
    """
    def x_dot(x, t, r, d, j):
        """
        rhs of the differential equation describing the intrahost evolution of the bacteria
        :param x: initial conditions i.e. bacteria sizes and environmental carrying capacity
        :param t: placeholder required by odeint
        :param r: growth rates of the types of bacteria
        :param d: death rates of the bacteria
        :param j: placeholder for the return value
        returns the right hand side of the competitive lotka-volterra equation with finite and shared carrying capacity
        """
        j[:-1] = x[:-1] * r * (1 - numpy.sum(x[:-1]) / x[-1]) - d * x[:-1]
        j[-1] = -numpy.sum(x[:-1])
        return j

    N = r_.shape[0]
    j = numpy.zeros(5)
    g = [x_dot(x_[5 * i : 5 * (i+1)], t_, r_[4 * i : 4 * (i+1)], d_[4 * i : 4 * (i+1)], j) for i in numpy.arange(int(N / 4))]
    for index, value in enumerate(g):
        h[5 * index : 5 * (index + 1)] = value
    return h

cc.compile()
Here I get the following error message:
[xxxxxx@xxxxxx ~]$ python code_generation3.py
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
generating LLVM code for 'x_temp_dot1' into /tmp/pycc-build-x_temp_dot1-wyamkfsy/x_temp_dot1.cpython-36m-x86_64-linux-gnu.o
python: /root/miniconda3/conda-bld/llvmdev_1531160641630/work/include/llvm/IR/GlobalValue.h:233: void llvm::GlobalValue::setVisibility(llvm::GlobalValue::VisibilityTypes): Assertion `(!hasLocalLinkage() || V == DefaultVisibility) && "local linkage requires default visibility"' failed.
Aborted
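What am I doing wrong?
Both functions work with the @jit(nopython=True) decorator.
To my shame, I also tried hard-coding the list comprehension (to avoid any for loops and further function calls), but that fails in the same way.
I know that the way I handle and create the return values h and j is neither efficient nor elegant, but I had trouble getting the return value into the right shape for odeint, because numba does not handle numpy.reshape well.
I searched the numba documentation for help, but it did not help me understand my problem.
I also searched for the error message, but only found this link, which might be similar. However, downgrading numba to 0.38.0 did not work for me.
Thank you all!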
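I think it would work if you compiled x_dot first and then x_dot_all.
In any case, I would recommend combining the two functions.
Loops are generally not a problem in Numba, but list comprehensions definitely are. Also try to avoid many small loops. (Vectorized commands such as numpy.sum(x[:-1]) are each a separate loop.) Sometimes Numba can fuse these loops into efficient code, but not always.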
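To illustrate the point about small loops, here is a minimal sketch in plain NumPy (no Numba required) for a single group of five entries. The function names rhs_vectorized and rhs_explicit are hypothetical and only serve the illustration. The slice-based form makes several passes over the data and allocates temporaries, while the explicit form does one reduction and one update pass.

```python
import numpy as np

def rhs_vectorized(x, r, d):
    """Slice-based form: each operator and each np.sum is its own pass."""
    out = np.empty(5)
    out[:-1] = x[:-1] * r * (1 - np.sum(x[:-1]) / x[-1]) - d * x[:-1]
    out[-1] = -np.sum(x[:-1])  # second reduction over the same four values
    return out

def rhs_explicit(x, r, d):
    """Explicit form: the sum is computed once and reused, no temporaries."""
    out = np.empty(5)
    total = 0.0
    for k in range(4):
        total += x[k]
    factor = 1.0 - total / x[4]
    for k in range(4):
        out[k] = x[k] * r[k] * factor - d[k] * x[k]
    out[4] = -total
    return out

# Illustrative inputs: four bacteria counts plus a carrying capacity.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
r = np.array([0.1, 0.2, 0.3, 0.4])
d = np.array([0.01, 0.02, 0.03, 0.04])
same = np.allclose(rhs_vectorized(x, r, d), rhs_explicit(x, r, d))
```

Both functions should return the same values; the explicit form is the shape of code that a compiler can most easily turn into a tight loop.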
Example
# code_generation3.py
import numpy
import numba as nb
from numba.pycc import CC

cc = CC('x_dot_all')
cc.verbose = True

@cc.export('x_dot_all_mod', 'f8[:](f8[:], f8, f8[:], f8[:], f8[:])')
def x_dot_all(x_, t_, r_, d_, h):
    N = r_.shape[0]
    # one iteration per group: 4 bacteria counts plus 1 carrying capacity
    for i in range(N // 4):
        sum_x = x_[5*i+0] + x_[5*i+1] + x_[5*i+2] + x_[5*i+3]
        TMP = 1. - sum_x / x_[5*i+4]
        h[i*5+0] = x_[i*5+0] * r_[4*i+0] * TMP - d_[4*i+0] * x_[i*5+0]
        h[i*5+1] = x_[i*5+1] * r_[4*i+1] * TMP - d_[4*i+1] * x_[i*5+1]
        h[i*5+2] = x_[i*5+2] * r_[4*i+2] * TMP - d_[4*i+2] * x_[i*5+2]
        h[i*5+3] = x_[i*5+3] * r_[4*i+3] * TMP - d_[4*i+3] * x_[i*5+3]
        h[i*5+4] = -sum_x
    return h

if __name__ == "__main__":
    cc.compile()
Performance
import numpy as np

N = 94
x_ = np.ones(N * 5)
x_[4::5] = 5e13
t_ = 15
r_ = np.random.random(N * 4)
d_ = np.random.random(N * 4) * .8
h = np.zeros(N * 5)
# your version: 38 µs
# new version: 1.8 µs
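Before relying on timings like these, it may be worth checking that the rewritten loop still computes the same right-hand side. The following is a rough sketch of such a check in plain NumPy, under the assumption that the inputs are laid out as in the question (groups of 5 in x_, groups of 4 in r_ and d_). The names x_dot_all_loop and x_dot_all_reference are hypothetical; the first is a pure-Python copy of the loop above, the second is an independent reshape-based reference.

```python
import numpy as np

def x_dot_all_loop(x_, t_, r_, d_, h):
    # Pure-Python copy of the combined loop (runs without Numba).
    for i in range(r_.shape[0] // 4):
        sum_x = x_[5*i] + x_[5*i+1] + x_[5*i+2] + x_[5*i+3]
        tmp = 1.0 - sum_x / x_[5*i+4]
        for k in range(4):
            h[5*i+k] = x_[5*i+k] * r_[4*i+k] * tmp - d_[4*i+k] * x_[5*i+k]
        h[5*i+4] = -sum_x
    return h

def x_dot_all_reference(x_, r_, d_):
    # Independent reference: view the flat arrays as one row per group.
    x = x_.reshape(-1, 5)
    r = r_.reshape(-1, 4)
    d = d_.reshape(-1, 4)
    counts = x[:, :4]
    total = counts.sum(axis=1)
    out = np.empty_like(x)
    out[:, :4] = counts * r * (1.0 - total / x[:, 4])[:, None] - d * counts
    out[:, 4] = -total
    return out.ravel()

# Inputs shaped like the ones in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
N = 94
x_ = np.ones(N * 5)
x_[4::5] = 5e13
r_ = rng.random(N * 4)
d_ = rng.random(N * 4) * 0.8
h = np.zeros(N * 5)

result = x_dot_all_loop(x_, 15.0, r_, d_, h)
matches = np.allclose(result, x_dot_all_reference(x_, r_, d_))
```

If matches is True, the same comparison can be repeated against the compiled function once the module has been built.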