python/pandas time series: fast attack/slow decay; peak detection with decay
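I want to implement a "fast attack / slow decay" (peak detection with exponential decay) filter on a time series ts (a column in a pandas DataFrame), defined as: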
fasd[t] = max(ts[t], 0.9 * fasd[t-1]), with fasd[0] = ts[0]
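Each value depends only on the previous output and the current input, so the recurrence is a running fold. Below is a minimal sketch that uses the standard library's itertools.accumulate. It seeds the running value with the first element, which matches fasd[0] = ts[0]. The function name fasd_reference is hypothetical.

```python
from itertools import accumulate

def fasd_reference(ts, decay=0.9):
    """Fast attack / slow decay: out[t] = max(ts[t], decay * out[t-1]), out[0] = ts[0]."""
    # accumulate() starts from ts[0], then calls the lambda with
    # (previous output, current input) for every later element.
    return list(accumulate(ts, lambda prev, x: max(x, decay * prev)))

print(fasd_reference([1, 0, 0, 0, 0, 1]))
```

On [1, 0, 0, 0, 0, 1] this yields approximately 1, 0.9, 0.81, 0.729, 0.6561 and then 1 again, because the second peak is larger than the decayed first one.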
The "basic" code below works, but is there a pythonic and efficient way to do this, using rolling() or a vectorized approach? Thanks
import pandas as pd
ts = [1,0,0,0,0,1,0,0,0,1,0.95,1,1,1,1,1,0,0,1,1,1,1,1,1]
df = pd.DataFrame({'ts':ts})
df['fasd'] = 0.0  # float column, so assigning the decayed values needs no dtype upcast
df.loc[0,'fasd'] = df.iloc[0]['ts']
for i in range(1, len(df)):
    df.loc[i, 'fasd'] = max(df.loc[i,'ts'], 0.9*df.loc[i-1, 'fasd'])
Using NumPy is much more efficient:
from time import time
import pandas as pd
ts = [1,0,0,0,0,1,0,0,0,1,0.95,1,1,1,1,1,0,0,1,1,1,1,1,1] * 1000 # artificially increasing the input size
df = pd.DataFrame({'ts':ts})
df['fasd'] = 0.0
df.loc[0,'fasd'] = df.iloc[0]['ts']
df2 = df.copy()
t0 = time()
for i in range(1, len(df)):
    df.loc[i, 'fasd'] = max(df.loc[i,'ts'], 0.9*df.loc[i-1, 'fasd'])
t1 = time()
print(f'Pandas version executed in {t1-t0} sec.')
def fasd(array):
    # column 0 holds ts, column 1 holds fasd; column 1 is filled in place
    for i in range(1, len(array)):
        array[i,1] = max(array[i,0], 0.9*array[i-1,1])
    return array
t0 = time()
# copy=True guarantees a writable array that does not share memory with df2
df2 = pd.DataFrame(fasd(df2.to_numpy(copy=True)), columns=df2.columns)
t1 = time()
print(f'Numpy version executed in {t1-t0} sec.')
Output:
Pandas version executed in 3.0636708736419678 sec.
Numpy version executed in 0.011569976806640625 sec.
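A fixed-size rolling() window is not a natural fit here, because each output depends on the previous output and not on a bounded window of inputs. Unrolling the recurrence does give a form that NumPy can evaluate without a per-element loop. Multiplying by 0.9 distributes over max, so fasd[t] = max over s <= t of 0.9^(t-s) * ts[s] = 0.9^t * cummax(ts[s] * 0.9^(-s)), and the cumulative maximum is np.maximum.accumulate. What follows is a minimal sketch that assumes a decay factor strictly between 0 and 1. The function name fasd_vectorized and its chunk parameter are hypothetical. Because 0.9^(-s) overflows float64 after roughly 6,700 samples, the series is handled in blocks, and the last output of each block is carried into the next.

```python
import numpy as np
import pandas as pd

def fasd_vectorized(ts, decay=0.9, chunk=500):
    """Closed-form fast attack / slow decay, evaluated block by block.

    Assumes 0 < decay < 1 and that decay ** -chunk fits in float64.
    """
    x = np.asarray(ts, dtype=float)
    out = np.empty_like(x)
    carry = -np.inf  # last output of the previous block (none before the first block)
    for start in range(0, len(x), chunk):
        seg = x[start:start + chunk]
        k = np.arange(len(seg))
        # Within the block: max over j <= k of seg[j] * decay**(k - j),
        # computed as decay**k * cummax(seg[j] * decay**(-j)).
        local = np.maximum.accumulate(seg * decay ** (-k)) * decay ** k
        # Everything before the block enters through carry, decayed k + 1 steps.
        out[start:start + len(seg)] = np.maximum(local, carry * decay ** (k + 1))
        carry = out[start + len(seg) - 1]
    return out

ts = [1,0,0,0,0,1,0,0,0,1,0.95,1,1,1,1,1,0,0,1,1,1,1,1,1]
df = pd.DataFrame({'ts': ts})
df['fasd'] = fasd_vectorized(df['ts'])
print(df.head(10))
```

Scaling by 0.9^(-k) and then 0.9^k can change the last few bits, so compare against the loop with np.allclose rather than ==. A larger chunk means fewer Python-level iterations, but it must stay small enough that decay ** -chunk does not overflow. 500 leaves plenty of headroom for 0.9.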