python 中的矢量化球贝塞尔函数?
Vectorized spherical bessel functions in python?
我注意到 scipy.special
n 阶和参数 x jv(n,x)
的贝塞尔函数在 x:
中被向量化
In [14]: import scipy.special as sp
In [16]: sp.jv(1, range(3)) # n=1, [x=0,1,2]
Out[16]: array([ 0., 0.44005059, 0.57672481])
但是球贝塞尔函数没有相应的矢量化形式,sp.sph_jn
:
In [19]: sp.sph_jn(1,range(3))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-ea59d2f45497> in <module>()
----> 1 sp.sph_jn(1,range(3)) #n=1, 3 value array
/home/glue/anaconda/envs/fibersim/lib/python2.7/site-packages/scipy/special/basic.pyc in sph_jn(n, z)
262 """
263 if not (isscalar(n) and isscalar(z)):
--> 264 raise ValueError("arguments must be scalars.")
265 if (n != floor(n)) or (n < 0):
266 raise ValueError("n must be a non-negative integer.")
ValueError: arguments must be scalars.
此外,球形贝塞尔函数一次计算 N 的所有阶数。因此,如果我想要参数 x=10
的 n=5
Bessel 函数,它 returns n=1,2,3,4,5。它实际上 returns jn 及其导数一次通过:
In [21]: sp.sph_jn(5,10)
Out[21]:
(array([-0.05440211, 0.07846694, 0.07794219, -0.03949584, -0.10558929,
-0.05553451]),
array([-0.07846694, -0.0700955 , 0.05508428, 0.09374053, 0.0132988 ,
-0.07226858]))
为什么 API 中存在这种不对称性,有谁知道 return 球贝塞尔函数矢量化的库,或者至少更快(即在 cython 中)?
你可以写一个cython函数来加速计算,你必须做的第一件事就是获取fortran函数的地址SPHJ
,这里是如何做到这一点Python :
from scipy import special as sp
sphj = sp.specfun.sphj
import ctypes
addr = ctypes.pythonapi.PyCObject_AsVoidPtr(ctypes.py_object(sphj._cpointer))
然后就可以在Cython中直接调用fortran函数了,注意我用prange()
是为了使用多核加速计算:
%%cython -c-Ofast -c-fopenmp --link-args=-fopenmp
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from cython.parallel import prange
import numpy as np
import cython
from cpython cimport PyCObject_AsVoidPtr
from scipy import special
ctypedef void (*sphj_ptr) (const int *n, const double *x,
const int *nm, const double *sj, const double *dj) nogil
cdef sphj_ptr _sphj=<sphj_ptr>PyCObject_AsVoidPtr(special.specfun.sphj._cpointer)
@cython.wraparound(False)
@cython.boundscheck(False)
def cython_sphj2(int n, double[::1] x):
cdef int count = x.shape[0]
cdef double * sj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
cdef double * dj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
cdef int * mn = <int *>PyMem_Malloc(count * sizeof(int))
cdef double[::1] res = np.empty(count)
cdef int i
if count < 100:
for i in range(x.shape[0]):
_sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
res[i] = sj[i*(n+1) + n] #choose the element you want here
else:
for i in prange(count, nogil=True):
_sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
res[i] = sj[i*(n+1) + n] #choose the element you want here
PyMem_Free(sj)
PyMem_Free(dj)
PyMem_Free(mn)
return res.base
为了比较,这里是在 forloop 中调用 sphj()
的 Python 函数:
import numpy as np
def python_sphj(n, x):
sphj = special.specfun.sphj
res = np.array([sphj(n, v)[1][n] for v in x])
return res
这是 10 个元素的 %timit 结果:
x = np.linspace(1, 2, 10)
r1 = cython_sphj2(4, x)
r2 = python_sphj(4, x)
assert np.allclose(r1, r2)
%timeit cython_sphj2(4, x)
%timeit python_sphj(4, x)
结果:
10000 loops, best of 3: 21.5 µs per loop
10000 loops, best of 3: 28.1 µs per loop
这是 100000 个元素的结果:
x = np.linspace(1, 2, 100000)
r1 = cython_sphj2(4, x)
r2 = python_sphj(4, x)
assert np.allclose(r1, r2)
%timeit cython_sphj2(4, x)
%timeit python_sphj(4, x)
结果:
10 loops, best of 3: 44.7 ms per loop
1 loops, best of 3: 231 ms per loop
有一个 pull request 将矢量化球形贝塞尔函数例程合并到 SciPy 中,如 scipy.special.spherical_x
,x = jn, yn, in, kn
。运气好的话,他们应该会进入 0.18.0 版。
相对于 np.vectorize
(即 for-loop)的性能改进取决于函数,但可以是数量级。
import numpy as np
from scipy import special
@np.vectorize
def sphj_vectorize(n, z):
return special.sph_jn(n, z)[0][-1]
x = np.linspace(1, 2, 10**5)
%timeit sphj_vectorize(4, x)
1 loops, best of 3: 1.47 s per loop
%timeit special.spherical_jn(4, x)
100 loops, best of 3: 8.07 ms per loop
如果有人仍然感兴趣,我发现一个解决方案比 Ted Pudlik 的解决方案快近 17 倍。我使用了这样一个事实,即 n 阶球形贝塞尔函数本质上是 n+1/2 阶标准贝塞尔函数的 1/sqrt(x) 倍,它已经被向量化了:
import numpy as np
from scipy import special
sphj_bessel = lambda n, z: special.jv(n+1/2,z)*np.sqrt(np.pi/2)/(np.sqrt(z))
我得到以下时间:
%timeit sphj_vectorize(2, x) # x = np.linspace(1, 2, 10**5)
1 loops, best of 3: 759 ms per loop
%timeit sphj_bessel(2,x) # x = np.linspace(1, 2, 10**5)
10 loops, best of 3: 44.6 ms per loop
我注意到 scipy.special
n 阶和参数 x jv(n,x)
的贝塞尔函数在 x:
In [14]: import scipy.special as sp
In [16]: sp.jv(1, range(3)) # n=1, [x=0,1,2]
Out[16]: array([ 0., 0.44005059, 0.57672481])
但是球贝塞尔函数没有相应的矢量化形式,sp.sph_jn
:
In [19]: sp.sph_jn(1,range(3))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-ea59d2f45497> in <module>()
----> 1 sp.sph_jn(1,range(3)) #n=1, 3 value array
/home/glue/anaconda/envs/fibersim/lib/python2.7/site-packages/scipy/special/basic.pyc in sph_jn(n, z)
262 """
263 if not (isscalar(n) and isscalar(z)):
--> 264 raise ValueError("arguments must be scalars.")
265 if (n != floor(n)) or (n < 0):
266 raise ValueError("n must be a non-negative integer.")
ValueError: arguments must be scalars.
此外,球形贝塞尔函数一次计算 N 的所有阶数。因此,如果我想要参数 x=10
的 n=5
Bessel 函数,它 returns n=1,2,3,4,5。它实际上 returns jn 及其导数一次通过:
In [21]: sp.sph_jn(5,10)
Out[21]:
(array([-0.05440211, 0.07846694, 0.07794219, -0.03949584, -0.10558929,
-0.05553451]),
array([-0.07846694, -0.0700955 , 0.05508428, 0.09374053, 0.0132988 ,
-0.07226858]))
为什么 API 中存在这种不对称性,有谁知道 return 球贝塞尔函数矢量化的库,或者至少更快(即在 cython 中)?
你可以写一个cython函数来加速计算,你必须做的第一件事就是获取fortran函数的地址SPHJ
,这里是如何做到这一点Python :
from scipy import special as sp
sphj = sp.specfun.sphj
import ctypes
addr = ctypes.pythonapi.PyCObject_AsVoidPtr(ctypes.py_object(sphj._cpointer))
然后就可以在Cython中直接调用fortran函数了,注意我用prange()
是为了使用多核加速计算:
%%cython -c-Ofast -c-fopenmp --link-args=-fopenmp
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from cython.parallel import prange
import numpy as np
import cython
from cpython cimport PyCObject_AsVoidPtr
from scipy import special
ctypedef void (*sphj_ptr) (const int *n, const double *x,
const int *nm, const double *sj, const double *dj) nogil
cdef sphj_ptr _sphj=<sphj_ptr>PyCObject_AsVoidPtr(special.specfun.sphj._cpointer)
@cython.wraparound(False)
@cython.boundscheck(False)
def cython_sphj2(int n, double[::1] x):
cdef int count = x.shape[0]
cdef double * sj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
cdef double * dj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
cdef int * mn = <int *>PyMem_Malloc(count * sizeof(int))
cdef double[::1] res = np.empty(count)
cdef int i
if count < 100:
for i in range(x.shape[0]):
_sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
res[i] = sj[i*(n+1) + n] #choose the element you want here
else:
for i in prange(count, nogil=True):
_sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
res[i] = sj[i*(n+1) + n] #choose the element you want here
PyMem_Free(sj)
PyMem_Free(dj)
PyMem_Free(mn)
return res.base
为了比较,这里是在 forloop 中调用 sphj()
的 Python 函数:
import numpy as np
def python_sphj(n, x):
sphj = special.specfun.sphj
res = np.array([sphj(n, v)[1][n] for v in x])
return res
这是 10 个元素的 %timit 结果:
x = np.linspace(1, 2, 10)
r1 = cython_sphj2(4, x)
r2 = python_sphj(4, x)
assert np.allclose(r1, r2)
%timeit cython_sphj2(4, x)
%timeit python_sphj(4, x)
结果:
10000 loops, best of 3: 21.5 µs per loop
10000 loops, best of 3: 28.1 µs per loop
这是 100000 个元素的结果:
x = np.linspace(1, 2, 100000)
r1 = cython_sphj2(4, x)
r2 = python_sphj(4, x)
assert np.allclose(r1, r2)
%timeit cython_sphj2(4, x)
%timeit python_sphj(4, x)
结果:
10 loops, best of 3: 44.7 ms per loop
1 loops, best of 3: 231 ms per loop
有一个 pull request 将矢量化球形贝塞尔函数例程合并到 SciPy 中,如 scipy.special.spherical_x
,x = jn, yn, in, kn
。运气好的话,他们应该会进入 0.18.0 版。
相对于 np.vectorize
(即 for-loop)的性能改进取决于函数,但可以是数量级。
import numpy as np
from scipy import special
@np.vectorize
def sphj_vectorize(n, z):
return special.sph_jn(n, z)[0][-1]
x = np.linspace(1, 2, 10**5)
%timeit sphj_vectorize(4, x)
1 loops, best of 3: 1.47 s per loop
%timeit special.spherical_jn(4, x)
100 loops, best of 3: 8.07 ms per loop
如果有人仍然感兴趣,我发现一个解决方案比 Ted Pudlik 的解决方案快近 17 倍。我使用了这样一个事实,即 n 阶球形贝塞尔函数本质上是 n+1/2 阶标准贝塞尔函数的 1/sqrt(x) 倍,它已经被向量化了:
import numpy as np
from scipy import special
sphj_bessel = lambda n, z: special.jv(n+1/2,z)*np.sqrt(np.pi/2)/(np.sqrt(z))
我得到以下时间:
%timeit sphj_vectorize(2, x) # x = np.linspace(1, 2, 10**5)
1 loops, best of 3: 759 ms per loop
%timeit sphj_bessel(2,x) # x = np.linspace(1, 2, 10**5)
10 loops, best of 3: 44.6 ms per loop