Perform calculations based on signals in array

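I have two columns in a pandas DataFrame: 'close' and 'signals'. I want to perform calculations on the data in 'close' based on the categorical data in 'signals'. If the same signal appears several times in a row (ignoring NaN), nothing should happen. A calculation should only be performed when the signal at index n+t is the opposite of the signal at an earlier index n.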

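This is basic backtesting code meant to prove out an algorithm I have worked out logically. I know a for loop may be needed to apply it properly, but I'm not sure how to do that when the calculation has to target specific index points in the data.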

Pseudocode:

for n in signals:
    if signals[n] == 1:
        while signals[n+t] == 1 (or NaN): keep 'close' at index n
        when signals[n+t] == 2:
            calculations[n+t] = close[n+t] - close[n]

This is the output I would like to get programmatically:

   close  signals  calculations
0    100      NaN           NaN
1    105        1           NaN
2    110      NaN           NaN
3    107        1           NaN
4    115      NaN           NaN
5    120        2            15
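For comparison with the pseudocode, here is a minimal explicit-loop sketch. It assumes that a 1 opens a position, a 2 closes it, repeated signals of the same kind are ignored, and NaN rows are skipped. The helper name `loop_calculations` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def loop_calculations(df: pd.DataFrame) -> pd.Series:
    """Explicit-loop version of the pseudocode (hypothetical helper).

    Assumes 1 opens a position and 2 closes it; NaN rows are skipped.
    """
    calculations = pd.Series(np.nan, index=df.index)
    entry = None  # index n of the first 1 in the current run of 1s
    for idx, signal in df["signals"].items():
        if pd.isna(signal):
            continue  # ignore rows without a signal
        if signal == 1 and entry is None:
            entry = idx  # keep close at n; later repeated 1s are ignored
        elif signal == 2 and entry is not None:
            # close(n+t) - close(n), stored at index n+t
            calculations.at[idx] = df.at[idx, "close"] - df.at[entry, "close"]
            entry = None  # wait for the next 1
    return calculations


df = pd.DataFrame({
    "close": [100, 105, 110, 107, 115, 120],
    "signals": [np.nan, 1, np.nan, 1, np.nan, 2],
})
df["calculations"] = loop_calculations(df)
print(df)
```

With the sample data this should reproduce the `calculations` column of the table above (NaN everywhere except 15 at index 5). A loop is easy to follow but gets slow on long price series, which is why a vectorised approach is usually preferred in pandas.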

Thanks for your help, and let me know if anything needs clarifying!

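One way to do it:

  1. Use dropna to keep only the rows where "signals" is not null.
  2. Use shift to remove consecutive duplicate signals.
  3. Set the output column with np.where(): if the signal is 2, set the difference between its close and the close of the previous remaining signal row; otherwise, set NaN.
  4. Use join to merge this column back into the input dataframe.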

Here is the code:

# Import modules
import pandas as pd
import numpy as np

# Build dataset (np.nan marks rows without a signal)
data = [[100, np.nan],
        [105, 1],
        [110, np.nan],
        [107, 1],
        [115, np.nan],
        [120, 2]]
df = pd.DataFrame(data, columns=["close", "signals"])


# Select rows where "signals" is not null
sub_df = df.dropna(subset=['signals'])

# Remove consecutive duplicates; .copy() avoids a SettingWithCopyWarning below
sub_df = sub_df.loc[sub_df.signals.shift() != sub_df.signals].copy()

# If signal == 2, set the diff between its close and the previous remaining close
# Else: set NaN
sub_df['output'] = np.where(sub_df.signals == 2, sub_df.close - sub_df.close.shift(), np.nan)
print(sub_df)
#    close  signals  output
# 1    105      1.0     NaN
# 5    120      2.0    15.0

# Update the input dataframe with the new column
df = df.join(sub_df['output'])
print(df)
#    close  signals  output
# 0    100      NaN     NaN
# 1    105      1.0     NaN
# 2    110      NaN     NaN
# 3    107      1.0     NaN
# 4    115      NaN     NaN
# 5    120      2.0    15.0
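The same steps can also be packaged as a reusable function. This is a sketch under the same assumption as the answer (only a 2 triggers a calculation). It uses `Series.diff()` in place of `close - close.shift()`, and `Series.where` plus `reindex` in place of `np.where` and `join`. The function name `signal_diffs` is hypothetical.

```python
import numpy as np
import pandas as pd


def signal_diffs(df: pd.DataFrame) -> pd.Series:
    """Vectorised version of the steps above (hypothetical helper name).

    At each row whose signal is 2, returns close minus the close of the
    previous remaining signal row; NaN everywhere else.
    """
    # Keep only rows that carry a signal, then drop consecutive repeats
    sig = df.dropna(subset=["signals"])
    sig = sig.loc[sig["signals"].ne(sig["signals"].shift())]
    # diff() is close - close.shift(); keep it only where the signal is 2
    out = sig["close"].diff().where(sig["signals"].eq(2))
    # Align back to the full frame; rows without a result become NaN
    return out.reindex(df.index)


df = pd.DataFrame({
    "close": [100, 105, 110, 107, 115, 120],
    "signals": [np.nan, 1, np.nan, 1, np.nan, 2],
})
df["output"] = signal_diffs(df)
print(df)
```

Because the result is a Series aligned to the original index, it can be assigned directly as a new column without a separate join.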