Filling in a column in a pandas DataFrame based on multiple conditions in another column
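I am trying to fill in a column (signal) in a DataFrame based on comparing another column (diff) against two variables. The column to fill has three possible outcomes, 1, -1, and 0, meaning buy, sell, and hold (cover). Here is the code and output so far.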
import numpy as np
import Quandl
tlm = Quandl.get("GOOG/NYSE_TLM", trim_start="2014-12-01", trim_end="2015-01-01")
tlm['diff'] = (tlm.Open - tlm.Close.shift(1))/tlm.Close.shift(1) # lags data
lowerbound = -0.08
upperbound = 0.08
tlm['signal'] = np.where(tlm['diff'] >= upperbound, 1.0, 0.0)
tlm['signal'] = np.where(tlm['diff'] <= lowerbound, -1.0, 0.0)
print(tlm.head(20)) # is dataframe
Open High Low Close Volume diff signal
Date
2014-12-01 4.91 4.93 4.53 4.53 12999427 NaN 0
2014-12-02 4.62 4.82 4.47 4.64 8015450 0.019868 0
2014-12-03 4.51 4.83 4.48 4.63 9175510 -0.028017 0
2014-12-04 4.59 4.62 4.04 4.05 16065766 -0.008639 0
2014-12-05 4.05 4.09 3.86 3.94 8783581 0.000000 0
2014-12-08 3.88 4.04 3.46 3.74 17497626 -0.015228 0
2014-12-09 4.09 4.36 4.04 4.22 12559347 0.093583 0
2014-12-10 4.20 4.20 3.67 3.79 12403674 -0.004739 0
2014-12-11 3.74 3.95 3.67 3.69 9396960 -0.013193 0
2014-12-12 5.05 5.24 4.17 4.29 75949020 0.368564 0
2014-12-15 5.33 5.35 4.99 5.12 38834129 0.242424 0
2014-12-16 7.47 7.60 7.46 7.58 282795097 0.458984 0
2014-12-17 7.59 7.66 7.55 7.64 73152687 0.001319 0
2014-12-18 7.68 7.82 7.66 7.78 55387941 0.005236 0
2014-12-19 7.77 7.89 7.77 7.85 31330786 -0.001285 0
2014-12-22 7.82 7.85 7.78 7.79 22758351 -0.003822 0
2014-12-23 7.79 7.88 7.79 7.84 19068732 0.000000 0
2014-12-24 7.83 7.86 7.82 7.84 9174813 -0.001276 0
2014-12-26 7.84 7.86 7.82 7.85 9717732 0.000000 0
2014-12-29 7.84 7.86 7.81 7.83 12035787 -0.001274 0
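The problem with the code above is that the second np.where line (the one just before print) overwrites the result of the line before it. That earlier line works fine on its own, and you would see 1s in the appropriate rows of the signal column. So I tried moving the conditions into a for loop, but I get a ValueError inside the loop. I sort of understand the boolean comparison issue with NumPy arrays, but if I can't compare against the conditions, how do I generate the three outcomes (1, -1, 0)?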
for index, row in tlm.iterrows():
    if tlm['diff'] >= upperbound: # value error here
        tlm['signal'] = 1.0
    if tlm['diff'] <= lowerbound:
        tlm['signal'] = -1.0
    else:
        tlm['signal'] = 0.0
I'm new to coding with pandas. Thanks in advance!
You can use np.select:
conditions = [tlm['diff'] >= upperbound,
tlm['diff'] <= lowerbound]
choices = [1, -1]
tlm['signal'] = np.select(conditions, choices, default=0)
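For anyone who wants to try this without the Quandl download, here is a minimal self-contained sketch. The DataFrame `df` and its `diff` values are made up for illustration and are not the TLM data. It shows that np.select checks the conditions in order and takes the first match, and that rows matching neither condition (including the NaN row, where both comparisons are False) fall through to the default.

```python
import numpy as np
import pandas as pd

# Made-up 'diff' values standing in for the Quandl data (assumption for illustration).
df = pd.DataFrame(
    {"diff": [np.nan, 0.019868, -0.028017, 0.093583, -0.12, 0.08, -0.08]}
)

lowerbound = -0.08
upperbound = 0.08

# Conditions are evaluated in order; the first True one selects the choice.
conditions = [df["diff"] >= upperbound,
              df["diff"] <= lowerbound]
choices = [1, -1]

# Rows where no condition holds (NaN included) receive the default value.
df["signal"] = np.select(conditions, choices, default=0)
print(df)
```

With these values the signal column should come out as 0, 0, 0, 1, -1, 1, -1. The two boundary rows (exactly 0.08 and -0.08) are included because the comparisons use >= and <=.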
Or, equivalently but less readably:
tlm['signal'] = np.where(tlm['diff'] >= upperbound, 1.0,
np.where(tlm['diff'] <= lowerbound, -1.0, 0.0))
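As a rough check, the sketch below compares the two forms on a small made-up `diff` column (an assumption, not the real data). It also reproduces the ValueError from the question: `tlm['diff'] >= upperbound` is a whole boolean Series, and using a Series directly in an `if` statement is ambiguous, so pandas raises. The vectorized forms avoid the `if` entirely.

```python
import numpy as np
import pandas as pd

# Made-up values (assumption); only the 'diff' column matters here.
df = pd.DataFrame({"diff": [np.nan, 0.02, 0.09, -0.10, 0.0]})
lowerbound, upperbound = -0.08, 0.08

# Putting a whole boolean Series in `if` raises ValueError,
# which is what happens inside the loop in the question.
try:
    if df["diff"] >= upperbound:
        pass
    raised = False
except ValueError:
    raised = True

# np.select form: first matching condition wins, otherwise the default.
via_select = np.select(
    [df["diff"] >= upperbound, df["diff"] <= lowerbound], [1, -1], default=0
)

# Nested np.where form: the inner call supplies the "else" branch of the outer one.
via_where = np.where(df["diff"] >= upperbound, 1.0,
                     np.where(df["diff"] <= lowerbound, -1.0, 0.0))

print(raised)                                 # whether the `if` raised ValueError
print(np.array_equal(via_select, via_where))  # whether both forms give equal values
print(via_select.dtype, via_where.dtype)      # integer vs float result
```

The values agree, but the dtypes differ: the np.select call above uses integer choices and so returns an integer array, while the nested np.where uses 1.0, -1.0, and 0.0 and returns floats. Pass float choices to np.select if the column needs to be float.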