How do I efficiently and correctly apply the numba jit decorator, or use vectorization instead of a for loop, to speed up program execution?
I am trying to use the jit decorator to speed up my code, but I am not getting correct results. It fails with various errors (KeyError, TypeError, etc.).
The original code without numba works fine.
# The code without numba:
import pandas as pd

df = pd.DataFrame()
df['Serial'] = [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value'] = [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref'] = [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]
df['A'] = 0.0
df['B'] = 0.0
df['Counter'] = 0
# Use .loc instead of chained indexing so the write reaches df itself
df.loc[0, 'Counter'] = df.loc[0, 'Serial']

for i in range(0, len(df)-1):
    # Filling Column 'A'
    if (df.iloc[1+i,2] > df.iloc[1+i,1]) & (df.iloc[i,5] > df.iloc[1+i,1]) & (df.iloc[1+i,3] > 0):
        df.iloc[1+i,4] = round((df.iloc[1+i,1]*1.02),2)
    elif (df.iloc[1+i,2] < df.iloc[1+i,1]) & (df.iloc[i,5] < df.iloc[1+i,1]) & (df.iloc[1+i,3] < 0):
        df.iloc[1+i,4] = round((df.iloc[1+i,1]*0.98),2)
    else:
        df.iloc[1+i,4] = df.iloc[i,4]
    # Filling Column 'B'
    df.iloc[1+i,5] = round(((df.iloc[1+i,1] + df.iloc[1+i,2])/2),2)
    # Filling Column 'Counter'
    if (df.iloc[1+i,5] > df.iloc[1+i,1]):
        df.iloc[1+i,6] = df.iloc[1+i,0]
    else:
        df.iloc[1+i,6] = df.iloc[i,6]
df
The code below raises an error. It is my attempt to use the numba jit decorator to speed up the original Python code.
# The code with numba jit, which throws an error:
import numpy as np
import pandas as pd
from numba import jit

df = pd.DataFrame()
df['Serial']=[865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value']=[586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref']=[586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]

@jit(nopython=True)
def Calcs(Serial,Value,Ref,Base):
    n = Base.size
    A = np.empty(n, dtype='f8')
    B = np.empty(n, dtype='f8')
    Counter = np.empty(n, dtype='f8')
    A[0] = 0.0
    B[0] = 0.0
    Counter[0] = Serial[0]
    for i in range(0,n-1):
        # Filling Column 'A'
        if (Ref[i+1] > Value[i+1]) & (B[i] > Value[i+1]) & (Base[i+1] > 0):
            A[i+1] = round((Value[i+1]*1.02),2)
        elif (Ref[i+1] < Value[i+1]) & (B[i] < Value[i+1]) & (Base[i+1] < 0):
            A[i+1] = round((Value[i+1]*0.98),2)
        else:
            A[i+1] = A[i]
        # Filling Column 'B'
        B[i+1] = round(((Value[i+1] + Ref[i+1])/2),2)
        # Filling Column 'Counter'
        if (B[i+1] > Value[i+1]):
            Counter[i+1] = Serial[i+1]
        else:
            Counter[i+1] = Counter[i]
    List = [A,B,Counter]
    return List

Serial = df['Serial'].values.astype(np.float64)
Value = df['Value'].values.astype(np.float64)
Ref = df['Ref'].values.astype(np.float64)
Base = df['Base'].values.astype(np.float64)
VCal = Calcs(Serial,Value,Ref,Base)
# These lines raise KeyError: columns 'A', 'B' and 'Counter' do not exist in df yet
df['A'].values[:] = VCal[0].astype(object)
df['B'].values[:] = VCal[1].astype(object)
df['Counter'].values[:] = VCal[2].astype(object)
df
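I tried to modify the code following the guidance that @Jérôme Richard gave for another question.
It still fails, and I have not been able to correct it. I am looking for help from the community to fix and improve the code above, or for a better approach that speeds up execution.
The expected result of the code is shown in the figure below.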
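You can only use df['A'].values[:] if the column A already exists in the dataframe. Otherwise you need to create a new one, for example with df['A'] = ...
Moreover, the astype(object) trick applies to strings, not to numbers. String-based dataframe columns apparently do not use NumPy string arrays but NumPy object arrays containing CPython strings. For numbers, Pandas properly uses numeric arrays, so converting numbers back to objects is inefficient. The same applies to astype(np.float64): it is not needed if the type is already right, which is the case here (Serial and Base are integer columns, so they only need the cast if the function is compiled for float64 inputs only). If you are not sure about the input types, you can keep the casts, since they are not very expensive.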
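As a minimal sketch of the two points above, the snippet below uses a hypothetical three-row frame that reuses the question's column names; the values written to A are placeholders, not computed results. It shows that numeric columns are already backed by numeric arrays and that a missing column has to be created by assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, reusing the question's column names.
df = pd.DataFrame({'Serial': [865, 866, 867],
                   'Value': [586.0, 586.45, 585.95]})

# Numeric columns are already backed by numeric NumPy arrays.
serial = df['Serial'].to_numpy()
value = df['Value'].to_numpy()
print(serial.dtype, value.dtype)

# Reading a column that does not exist yet raises KeyError ...
try:
    df['A'].values[:] = 0.0
    missing = False
except KeyError:
    missing = True

# ... so create it by plain assignment; no astype(object) is needed.
df['A'] = np.array([0.0, 574.72, 597.67])  # placeholder values
print(df.dtypes)
```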
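The Numba function itself is fine (at least with a recent version of Numba). Note that you can specify a signature to compile the function eagerly. This also helps you catch typing errors sooner and makes them clearer. The downside is that it makes the function less generic, since only specific types are supported (though you can specify multiple signatures).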
from numba import njit

@njit('List(float64[:])(float64[:], float64[:], float64[:], float64[:])')
def Calcs(Serial,Value,Ref,Base):
    [...]

Serial = df['Serial'].values
Value = df['Value'].values
Ref = df['Ref'].values
Base = df['Base'].values
VCal = Calcs(Serial, Value, Ref, Base)
df['A'] = VCal[0]
df['B'] = VCal[1]
df['Counter'] = VCal[2]
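The remark about specifying multiple signatures can be sketched as follows. The helper `total` is hypothetical and only illustrates the decorator form. The `ImportError` fallback is an assumption added so the snippet still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


# Hypothetical helper, compiled eagerly for two input types.
@njit(['float64(float64[:])', 'float64(int64[:])'])
def total(values):
    acc = 0.0
    for x in values:
        acc += x
    return acc


float_sum = total(np.array([586.0, 586.45, 585.95]))
int_sum = total(np.array([865, 866, 867], dtype=np.int64))
print(float_sum, int_sum)
```

With Numba installed, both specializations are compiled when the decorator runs, so the first call does not pay the compilation cost.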
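Note that you can use the fastmath=True option to speed up the code if you are sure that the input arrays never contain special values such as NaN, Inf or -0, and you do not rely on floating-point associativity.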
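The question title also asks about vectorization. As a rough sketch (not part of the original answer): each row only needs the previous row's B, and the "else" branches simply carry the previous value forward, so the loop can be expressed with `shift`, `np.select` and `ffill`. One assumption to keep in mind is that `Series.round` may differ from Python's `round()` in the last digit for values that sit exactly on a rounding tie.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Serial': [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880],
    'Value': [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585],
    'Ref': [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585],
    'Base': [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1],
})
value, ref, base, serial = df['Value'], df['Ref'], df['Base'], df['Serial']

# Column 'B': depends only on the current row (row 0 is fixed to 0.0).
b = ((value + ref) / 2).round(2)
b.iloc[0] = 0.0
prev_b = b.shift(1)  # B[i] as seen from row i+1; NaN on row 0

# Column 'A': two conditional updates, otherwise carry the previous value.
up = (ref > value) & (prev_b > value) & (base > 0)
down = (ref < value) & (prev_b < value) & (base < 0)
a = pd.Series(
    np.select([up, down],
              [(value * 1.02).round(2), (value * 0.98).round(2)],
              default=np.nan),
    index=df.index,
)
a.iloc[0] = 0.0
a = a.ffill()

# Column 'Counter': take Serial where B > Value, otherwise carry forward.
counter = serial.where(b > value)
counter.iloc[0] = serial.iloc[0]
counter = counter.ffill()

df['A'], df['B'], df['Counter'] = a, b, counter
print(df)
```

Counter comes out as a float column here because `where` introduces NaN before the forward fill; it can be cast back with `astype('int64')` if integers are wanted.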
This code raises a list-to-list conversion typing error.
import numpy as np
import pandas as pd
from numba import njit

df = pd.DataFrame()
df['Serial'] = [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value'] = [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref'] = [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]

@njit('List(float64[:])(float64[:], float64[:], float64[:], float64[:])',fastmath=True)
def Calcs(Serial,Value,Ref,Base):
    n = Base.size
    A = np.empty(n)
    B = np.empty(n)
    Counter = np.empty(n)
    A[0] = 0.0
    B[0] = 0.0
    Counter[0] = Serial[0]
    for i in range(0,n-1):
        # Filling Column 'A'
        if (Ref[i+1] > Value[i+1]) & (B[i] > Value[i+1]) & (Base[i+1] > 0):
            A[i+1] = round((Value[i+1]*1.02),2)
        elif (Ref[i+1] < Value[i+1]) & (B[i] < Value[i+1]) & (Base[i+1] < 0):
            A[i+1] = round((Value[i+1]*0.98),2)
        else:
            A[i+1] = A[i]
        # Filling Column 'B'
        B[i+1] = round(((Value[i+1] + Ref[i+1])/2),2)
        # Filling Column 'Counter'
        if (B[i+1] > Value[i+1]):
            Counter[i+1] = Serial[i+1]
        else:
            Counter[i+1] = Counter[i]
    List = [A,B,Counter]
    return List

Serial = df['Serial'].values
Value = df['Value'].values
Ref = df['Ref'].values
Base = df['Base'].values
VCal = Calcs(Serial, Value, Ref, Base)
df['A'] = VCal[0]
df['B'] = VCal[1]
df['Counter'] = VCal[2]
df
The following error occurs.
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No conversion from list(array(float64, 1d, C))<iv=None> to
list(array(float64, 1d, A))<iv=None> for '8return_value.5', defined at None
File "<ipython-input-9-3c9e0fe02b75>", line 33:
def Calcs(Serial,Value,Ref,Base):
<source elided>
List = [A,B,Counter]
return List
^
During: typing of assignment at <ipython-input-9-3c9e0fe02b75> (33)
File "<ipython-input-9-3c9e0fe02b75>", line 33:
def Calcs(Serial,Value,Ref,Base):
<source elided>
List = [A,B,Counter]
return List
^
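The traceback points at the declared return type. `float64[:]` describes an array of any layout ('A'), while `np.empty` produces C-contiguous arrays ('C'), and Numba does not convert a list of one into a list of the other. One way around this, sketched below, is to declare only the argument types and let Numba infer the return type (a tuple is returned here instead of a list). Declaring the return type as `List(float64[::1])` should also match but is not tested here. Because the signature accepts only float64 arrays, the integer columns Serial and Base are cast explicitly. The lowercase names and the `ImportError` fallback are assumptions of this sketch, not part of the original thread.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

df = pd.DataFrame({
    'Serial': [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880],
    'Value': [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585],
    'Ref': [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585],
    'Base': [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1],
})


# Only the argument types are declared; the return type is inferred.
@njit('(float64[:], float64[:], float64[:], float64[:])')
def calcs(serial, value, ref, base):
    n = base.size
    a = np.empty(n)
    b = np.empty(n)
    counter = np.empty(n)
    a[0] = 0.0
    b[0] = 0.0
    counter[0] = serial[0]
    for i in range(n - 1):
        v = value[i + 1]
        # Column 'A'
        if ref[i + 1] > v and b[i] > v and base[i + 1] > 0:
            a[i + 1] = round(v * 1.02, 2)
        elif ref[i + 1] < v and b[i] < v and base[i + 1] < 0:
            a[i + 1] = round(v * 0.98, 2)
        else:
            a[i + 1] = a[i]
        # Column 'B'
        b[i + 1] = round((v + ref[i + 1]) / 2, 2)
        # Column 'Counter'
        if b[i + 1] > v:
            counter[i + 1] = serial[i + 1]
        else:
            counter[i + 1] = counter[i]
    return a, b, counter


# Cast every input to float64 so it matches the declared signature.
cols = [df[name].to_numpy(dtype=np.float64)
        for name in ('Serial', 'Value', 'Ref', 'Base')]
df['A'], df['B'], df['Counter'] = calcs(*cols)
print(df)
```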