How can I correctly and efficiently use the numba jit decorator, or vectorization instead of a for loop, to speed up program execution?

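I am trying to use the jit decorator to speed up my code, but I am not getting correct results. It fails with various errors (KeyError, TypeError, etc.). The original code without numba works fine.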

# The Code without numba is:
import pandas as pd

df = pd.DataFrame()
df['Serial'] = [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value'] = [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref'] = [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]

df['A'] = 0.0
df['B'] = 0.0
df['Counter'] = 0
# Use .loc rather than chained indexing so the assignment reliably reaches df
df.loc[0, 'Counter'] = df['Serial'][0]

for i in range(0,len(df)-1):
    # Filling Column 'A'
    if (df.iloc[1+i,2] > df.iloc[1+i,1]) & (df.iloc[i,5] > df.iloc[1+i,1]) & (df.iloc[1+i,3] >0):
        df.iloc[1+i,4] = round((df.iloc[1+i,1]*1.02),2)
    elif (df.iloc[1+i,2] < df.iloc[1+i,1]) & (df.iloc[i,5] < df.iloc[1+i,1]) & (df.iloc[1+i,3] <0):
        df.iloc[1+i,4] = round((df.iloc[1+i,1]*0.98),2)
    else:
        df.iloc[1+i,4] = df.iloc[i,4]
    # Filling Column 'B'
    df.iloc[1+i,5] = round(((df.iloc[1+i,1] + df.iloc[1+i,2])/2),2) 
    # Filling Column 'Counter'
    if (df.iloc[1+i,5] > df.iloc[1+i,1]):
        df.iloc[1+i,6] = df.iloc[1+i,0]
    else:
        df.iloc[1+i,6] = df.iloc[i,6]
df

The code below raises errors. In it I tried to apply the numba jit decorator to speed up the original Python code.

#The code with numba jit which is throwing error is:
df = pd.DataFrame()
df['Serial']=[865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value']=[586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref']=[586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]
import numpy as np
from numba import jit
@jit(nopython=True)
def Calcs(Serial,Value,Ref,Base):
    n = Base.size
    A = np.empty(n, dtype='f8')
    B = np.empty(n, dtype='f8')
    Counter = np.empty(n, dtype='f8')
    A[0] = 0.0
    B[0] = 0.0
    Counter[0] = Serial[0]
    for i in range(0,n-1):
        # Filling Column 'A'
        if (Ref[i+1] > Value[i+1]) & (B[i] > Value[i+1]) & (Base[i+1] > 0):
            A[i+1] = round((Value[i+1]*1.02),2)
        elif (Ref[i+1] < Value[i+1]) & (B[i] < Value[i+1]) & (Base[i+1] < 0):  
            A[i+1] = round((Value[i+1]*0.98),2)
        else:
            A[i+1] = A[i]
        # Filling Column 'B'
        B[i+1] = round(((Value[i+1] + Ref[i+1])/2),2)
        # Filling Column 'Counter'
        if (B[i+1] > Value[i+1]):
            Counter[i+1] = Serial[i+1]
        else:
            Counter[i+1] = Counter[i]   
    List = [A,B,Counter]        
    return List

Serial = df['Serial'].values.astype(np.float64)
Value = df['Value'].values.astype(np.float64)
Ref = df['Ref'].values.astype(np.float64)
Base = df['Base'].values.astype(np.float64)

VCal = Calcs(Serial,Value,Ref,Base)

df['A'].values[:] = VCal[0].astype(object)
df['B'].values[:] = VCal[1].astype(object)
df['Counter'].values[:] = VCal[2].astype(object)
df

I tried to modify the code following the guidance that @Jérôme Richard provided on another question.

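However, I still get errors and cannot correct the code. I am asking the community for help to fix and improve the code above, or for a better approach that speeds up execution. The expected result of the code is shown in the image below.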

You can only use df['A'].values[:] if column A already exists in the dataframe. Otherwise you need to create a new column, for example with df['A'] = ....
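As a small illustration of that point, the sketch below uses a hypothetical three-row frame and made-up result values. Looking up a column that does not exist raises a KeyError before anything can be written into it, whereas plain assignment creates the column.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; 'A' has not been created yet.
df = pd.DataFrame({'Serial': [865, 866, 867]})
result = np.array([0.0, 574.72, 597.67])  # illustrative values only

try:
    # The lookup df['A'] fails before the slice assignment can happen.
    df['A'].values[:] = result
except KeyError as exc:
    missing = exc.args[0]  # name of the missing column

# Plain assignment creates the column and stores the array.
df['A'] = result
```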

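Moreover, the astype(object) trick is useful for strings, not for numbers. String-based dataframe columns apparently do not use Numpy string arrays but Numpy object arrays containing CPython strings. For numbers, Pandas properly uses numeric arrays, so converting numbers back to objects is inefficient. The same applies to astype(np.float64): it is not needed if the type is already right. That is the case for the Value and Ref columns here. Serial and Base hold integers, so they still need the cast when the function is compiled for float64 arrays only. If you are not sure about the input types, you can keep the casts, since they are not very expensive.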
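To check which columns actually need a cast, you can inspect the dtypes first. The following is a minimal sketch on a hypothetical two-column frame that mirrors the question's data: a list mixing integers and floats becomes a float64 column, while a list of integers becomes an integer column.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame mirroring the question's columns.
df = pd.DataFrame({
    'Serial': [865, 866, 867],        # integers only
    'Value': [586, 586.45, 585.95],   # mixed ints and floats
})

serial_dtype = df['Serial'].dtype  # an integer dtype
value_dtype = df['Value'].dtype    # float64

# Already float64: no cast needed.
value = df['Value'].to_numpy()
# Integer column: cast once when extracting the array.
serial = df['Serial'].to_numpy(dtype=np.float64)
```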

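The Numba function itself is fine (at least with recent versions of Numba). Note that you can specify a signature to compile the function eagerly. This also helps you catch typing errors sooner and makes them clearer. The drawback is that it makes the function less generic, since only the specified types are supported (though you can specify multiple signatures).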

from numba import njit

@njit('List(float64[:])(float64[:], float64[:], float64[:], float64[:])')
def Calcs(Serial,Value,Ref,Base):
    [...]

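# Serial and Base are integer columns, so cast them to match the float64 signature
Serial = df['Serial'].values.astype(np.float64)
Value = df['Value'].values
Ref = df['Ref'].values
Base = df['Base'].values.astype(np.float64)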

VCal = Calcs(Serial, Value, Ref, Base)

df['A'] = VCal[0]
df['B'] = VCal[1]
df['Counter'] = VCal[2]

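Note that you can also pass fastmath=True to Numba to speed the code up further, provided you are sure that the input arrays never contain special values such as NaN, Inf or -0, and that you do not rely on floating-point associativity.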
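Since the question also asks about vectorization, here is one possible sketch without Numba. It relies on an observation about the loop: each row of A and Counter either receives a freshly computed value or carries the previous one forward, and B depends only on the current row. That lets the loop be expressed with a shifted copy of B and a forward fill. The variable names are my own. The sketch assumes the inputs contain no NaN. np.round can differ from Python's round in the last digit for values that sit exactly halfway.

```python
import numpy as np
import pandas as pd

serial = np.arange(865, 881, dtype=np.float64)
value = np.array([586, 586.45, 585.95, 585.85, 585.45, 585.5, 586, 585.7,
                  585.7, 585.5, 585.5, 585.45, 585.3, 584, 584, 585])
ref = np.array([586.35, 586.1, 586.01, 586.44, 586.04, 585.91, 585.47, 585.99,
                585.35, 585.27, 585.32, 584.86, 585.36, 584.18, 583.53, 585])
base = np.array([0, -1, 1, 1, 1, 1, -1, 0, 1, 1, 1, 0, 1, 1, 0, -1],
                dtype=np.float64)

# Column B depends only on the current row (row 0 is fixed to 0.0).
b = np.round((value + ref) / 2, 2)
b[0] = 0.0
# B of the previous row, aligned with the current row.
b_prev = np.concatenate(([0.0], b[:-1]))

# Column A: compute a value where a condition holds, NaN where the
# previous value must be carried forward, then forward-fill.
up = (ref > value) & (b_prev > value) & (base > 0)
down = (ref < value) & (b_prev < value) & (base < 0)
a_new = np.where(up, np.round(value * 1.02, 2),
                 np.where(down, np.round(value * 0.98, 2), np.nan))
a_new[0] = 0.0
a = pd.Series(a_new).ffill().to_numpy()

# Column Counter: same carry-forward pattern.
c_new = np.where(b > value, serial, np.nan)
c_new[0] = serial[0]
counter = pd.Series(c_new).ffill().to_numpy()
```

Whether this beats the Numba version depends on the data size, so it is worth timing both on realistic inputs.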

This code gives a list-to-list conversion typing error.

from numba import njit
df = pd.DataFrame()
df['Serial'] = [865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880]
df['Value'] = [586,586.45,585.95,585.85,585.45,585.5,586,585.7,585.7,585.5,585.5,585.45,585.3,584,584,585]
df['Ref'] = [586.35,586.1,586.01,586.44,586.04,585.91,585.47,585.99,585.35,585.27,585.32,584.86,585.36,584.18,583.53,585]
df['Base'] = [0,-1,1,1,1,1,-1,0,1,1,1,0,1,1,0,-1]

@njit('List(float64[:])(float64[:], float64[:], float64[:], float64[:])',fastmath=True)
def Calcs(Serial,Value,Ref,Base):
    n = Base.size
    A = np.empty(n)
    B = np.empty(n)
    Counter = np.empty(n)
    A[0] = 0.0
    B[0] = 0.0
    Counter[0] = Serial[0]
    for i in range(0,n-1):
        # Filling Column 'A'
        if (Ref[i+1] > Value[i+1]) & (B[i] > Value[i+1]) & (Base[i+1] > 0):
            A[i+1] = round((Value[i+1]*1.02),2)
        elif (Ref[i+1] < Value[i+1]) & (B[i] < Value[i+1]) & (Base[i+1] < 0):
            A[i+1] = round((Value[i+1]*0.98),2)
        else:
            A[i+1] = A[i]
        # Filling Column 'B'
        B[i+1] = round(((Value[i+1] + Ref[i+1])/2),2)
        # Filling Column 'Counter'
        if (B[i+1] > Value[i+1]):
            Counter[i+1] = Serial[i+1]
        else:
            Counter[i+1] = Counter[i]
    List = [A,B,Counter]        
    return List

Serial = df['Serial'].values
Value = df['Value'].values
Ref = df['Ref'].values
Base = df['Base'].values
VCal = Calcs(Serial, Value, Ref, Base)
df['A'] = VCal[0]
df['B'] = VCal[1]
df['Counter'] = VCal[2]
df

The following error occurs.

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No conversion from list(array(float64, 1d, C))<iv=None> to 
list(array(float64, 1d, A))<iv=None> for '8return_value.5', defined at None

File "<ipython-input-9-3c9e0fe02b75>", line 33:
def Calcs(Serial,Value,Ref,Base):
    <source elided>
    List = [A,B,Counter]        
    return List
    ^

During: typing of assignment at <ipython-input-9-3c9e0fe02b75> (33)

File "<ipython-input-9-3c9e0fe02b75>", line 33:
def Calcs(Serial,Value,Ref,Base):
    <source elided>
    List = [A,B,Counter]        
    return List
    ^