Filling in a column in a pandas DataFrame based on multiple conditions in another column

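I'm trying to fill a column in a DataFrame (signal) conditional on another column in the same DataFrame (diff) being compared against 2 variables. The column to be filled has 3 possible outcomes: 1, -1, 0 for buy, sell, hold (cover). Here is the code and output so far.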

import numpy as np
import Quandl
tlm = Quandl.get("GOOG/NYSE_TLM", trim_start="2014-12-01", trim_end="2015-01-01")

tlm['diff'] = (tlm.Open - tlm.Close.shift(1))/tlm.Close.shift(1)  # lags data

lowerbound = -0.08
upperbound = 0.08

tlm['signal'] = np.where(tlm['diff'] >= upperbound, 1.0, 0.0)
tlm['signal'] = np.where(tlm['diff'] <= lowerbound, -1.0, 0.0)

print(tlm.head(20))  # is dataframe

            Open  High   Low  Close     Volume      diff  signal
Date                                                            
2014-12-01  4.91  4.93  4.53   4.53   12999427       NaN       0
2014-12-02  4.62  4.82  4.47   4.64    8015450  0.019868       0
2014-12-03  4.51  4.83  4.48   4.63    9175510 -0.028017       0
2014-12-04  4.59  4.62  4.04   4.05   16065766 -0.008639       0
2014-12-05  4.05  4.09  3.86   3.94    8783581  0.000000       0
2014-12-08  3.88  4.04  3.46   3.74   17497626 -0.015228       0
2014-12-09  4.09  4.36  4.04   4.22   12559347  0.093583       0
2014-12-10  4.20  4.20  3.67   3.79   12403674 -0.004739       0
2014-12-11  3.74  3.95  3.67   3.69    9396960 -0.013193       0
2014-12-12  5.05  5.24  4.17   4.29   75949020  0.368564       0
2014-12-15  5.33  5.35  4.99   5.12   38834129  0.242424       0
2014-12-16  7.47  7.60  7.46   7.58  282795097  0.458984       0
2014-12-17  7.59  7.66  7.55   7.64   73152687  0.001319       0
2014-12-18  7.68  7.82  7.66   7.78   55387941  0.005236       0
2014-12-19  7.77  7.89  7.77   7.85   31330786 -0.001285       0
2014-12-22  7.82  7.85  7.78   7.79   22758351 -0.003822       0
2014-12-23  7.79  7.88  7.79   7.84   19068732  0.000000       0
2014-12-24  7.83  7.86  7.82   7.84    9174813 -0.001276       0
2014-12-26  7.84  7.86  7.82   7.85    9717732  0.000000       0
2014-12-29  7.84  7.86  7.81   7.83   12035787 -0.001274       0

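The problem with the code above is that the second np.where line overwrites the result of the first. The first line works fine on its own, and you would see 1s in the appropriate rows of the signal column. So I tried moving the conditional statements into a for loop, but I get a ValueError in the loop. I sort of understand the boolean comparison issue with NumPy arrays, but if I can't compare conditions, how would I generate the 3 outcomes (1, -1, 0)?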

for index, row in tlm.iterrows():
    if tlm['diff'] >= upperbound:  # value error here
        tlm['signal'] = 1.0
        if tlm['diff'] <= lowerbound:
            tlm['signal'] = -1.0
        else:
            tlm['signal'] = 0.0

I'm new to coding with pandas. Thanks in advance!

You can use np.select, which evaluates a list of conditions in order and picks the matching choice for each row:

conditions = [tlm['diff'] >= upperbound,
              tlm['diff'] <= lowerbound]
choices = [1, -1]

tlm['signal'] = np.select(conditions, choices, default=0)
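
To try this without the Quandl download, here is a minimal self-contained sketch. The DataFrame `df` and its `diff` values are made up for illustration (a few are loosely rounded from the output in the question, and the -0.09 row is invented so the sell branch is exercised). Note that comparisons against NaN evaluate to False, so the first row falls through to the default of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Quandl price series.
df = pd.DataFrame({"diff": [np.nan, 0.0199, -0.0280, 0.0936, -0.0900, 0.3686]})

lowerbound = -0.08
upperbound = 0.08

# Conditions are checked in order; the first one that is True wins.
conditions = [df["diff"] >= upperbound,
              df["diff"] <= lowerbound]
choices = [1, -1]

# Rows matching neither condition (including NaN) get the default.
df["signal"] = np.select(conditions, choices, default=0)
print(df)
```

Because all conditions are evaluated before any value is assigned, nothing gets overwritten the way it did with two separate np.where calls.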

Or, equivalently but less readably, with nested np.where calls:

tlm['signal'] = np.where(tlm['diff'] >= upperbound, 1.0, 
                         np.where(tlm['diff'] <= lowerbound, -1.0, 0.0))
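
As for the ValueError in the loop: `tlm['diff'] >= upperbound` compares the whole column at once and returns a boolean Series, which `if` cannot reduce to a single True or False. The sketch below, again on made-up values in a hypothetical Series named `diff`, reproduces that error and checks that the nested np.where form agrees with np.select.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; not the actual TLM data.
diff = pd.Series([np.nan, 0.0199, -0.0280, 0.0936, -0.0900, 0.3686])
lowerbound, upperbound = -0.08, 0.08

# A comparison on a Series yields one boolean per row, not a single bool.
mask = diff >= upperbound

try:
    if mask:  # ambiguous truth value, same failure as in the loop
        pass
    raised = False
except ValueError:
    raised = True

# Nested np.where: the inner call fills in whatever the outer one rejects.
nested = np.where(diff >= upperbound, 1.0,
                  np.where(diff <= lowerbound, -1.0, 0.0))

# The np.select form of the same logic.
selected = np.select([diff >= upperbound, diff <= lowerbound], [1, -1], default=0)

same = bool((nested == selected).all())
print(raised, same)
```

Both vectorized forms avoid iterating over rows, so no explicit loop is needed for this kind of three-way labelling.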