Conditional creation of a dataframe column based on numeric values

I have a pandas DataFrame time series (about 1,000 rows and the four columns below) that looks like this:

Date          Values  Avg    +1 Stdev
01/01/2010    1.01    1.00   1.05
02/01/2010    1.02    1.00   1.05
03/01/2010    1.04    1.00   1.05
04/01/2010    -0.97   1.00   1.05
05/01/2010    1.12    1.00   1.05
06/01/2010    1.08    1.00   1.05
....
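For anyone who wants to reproduce the problem, the visible rows can be built into a small frame along these lines. This is only a sketch of the sample shown above: the dates are kept as strings, as in the table, and the real data has many more rows.

```python
import pandas as pd

# Only the six rows shown in the question; the full frame has ~1000 rows.
df = pd.DataFrame({
    "Date": ["01/01/2010", "02/01/2010", "03/01/2010",
             "04/01/2010", "05/01/2010", "06/01/2010"],
    "Values": [1.01, 1.02, 1.04, -0.97, 1.12, 1.08],
    "Avg": [1.00] * 6,
    "+1 Stdev": [1.05] * 6,
})
print(df)
```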

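What I would like to do is create a fifth column (called 'Trigger Date') that returns the date (from the index column) when the value in column 2 exceeds the threshold set in column 4, and returns nothing otherwise. The additional constraint is that the fifth column should also not return a date if the previous value was already above the threshold in column 4.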

In other words, pseudocode for the problem would be:

If df['Values'] > df['+1 Stdev']
AND
If df['Values'] (for the row above) < df['+1 Stdev']
THEN
Return df['Date'] in new column df['Trigger Date']
ELSE
Leave row in df['Trigger Date'] blank

Any help with this problem would be much appreciated.

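Edit: follow-up question. Is there any way to add a third constraint, so that no trigger date is returned if a trigger date has already occurred within the past XX days (e.g. the past 30 days)? The expected output would look like: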

         Date  Values  Avg  +1 Stdev Trigger Date
0  01/01/2010    1.01  1.0      1.05          NaN
1  02/01/2010    1.02  1.0      1.05          NaN
2  03/01/2010    1.04  1.0      1.05          NaN
3  04/01/2010   -0.97  1.0      1.05          NaN
4  05/01/2010    1.12  1.0      1.05   05/01/2010
5  06/01/2010    1.08  1.0      1.05          NaN
6  07/01/2010    1.03  1.0      1.05          NaN
7  08/01/2010    1.07  1.0      1.05          NaN <- above threshold, but trigger occurred within last 30 days so don't return date
...
50 20/02/2010    1.12  1.0      1.05          20/02/2010 <- more than 30 days later, no trigger dates in between, so return date

Use numpy.where, with shift to get the value from the row above:

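import numpy as np

# m1: the current value is above the threshold
m1 = df['Values'] > df['+1 Stdev']
# m2: the previous row's value is below the threshold (False for the first row)
m2 = df['Values'].shift() < df['+1 Stdev']

df['Trigger Date'] = np.where(m1 & m2, df['Date'], np.nan)
print(df)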
         Date  Values  Avg  +1 Stdev Trigger Date
0  01/01/2010    1.01  1.0      1.05          NaN
1  02/01/2010    1.02  1.0      1.05          NaN
2  03/01/2010    1.04  1.0      1.05          NaN
3  04/01/2010   -0.97  1.0      1.05          NaN
4  05/01/2010    1.12  1.0      1.05   05/01/2010
5  06/01/2010    1.08  1.0      1.05          NaN
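The same result can probably also be written without numpy, using `Series.where`, which keeps a value where the condition is True and fills in NaN elsewhere. A minimal sketch on the six sample rows from the question:

```python
import pandas as pd

# Sample rows from the question, dates kept as strings.
df = pd.DataFrame({
    "Date": ["01/01/2010", "02/01/2010", "03/01/2010",
             "04/01/2010", "05/01/2010", "06/01/2010"],
    "Values": [1.01, 1.02, 1.04, -0.97, 1.12, 1.08],
    "Avg": [1.00] * 6,
    "+1 Stdev": [1.05] * 6,
})

m1 = df["Values"] > df["+1 Stdev"]          # current value above threshold
m2 = df["Values"].shift() < df["+1 Stdev"]  # previous value below threshold

# Keep the date only on rows where the value has just crossed the threshold.
df["Trigger Date"] = df["Date"].where(m1 & m2)
print(df)
```

Only row 4 (05/01/2010) is kept here: row 5 is above the threshold too, but the value before it was already above, so `m2` is False.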

Edit:

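import numpy as np
import pandas as pd

# Parse the day-first date strings into datetimes
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

m1 = df['Values'] > df['+1 Stdev']
m2 = df['Values'].shift() < df['+1 Stdev']
# Start of the 30-day window that ends at each row's date
a = df['Date'] - pd.Timedelta(days=30)
# One boolean Series per window: is the next row's date inside it?
L = [df['Date'].shift(-1).isin(pd.date_range(x, y, freq='D')) for x, y in zip(a, df['Date'])]
m3 = np.logical_or.reduce(L)

# Note: ~m3 is True only where the next row's date falls in no window,
# which in practice is the last row (its shifted date is NaT)
mask = (m1 & m2) | ~m3

df.loc[mask, 'Trigger Date'] = df['Date']
print(df)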
        Date  Values  Avg  +1 Stdev Trigger Date
0 2010-01-01    1.01  1.0      1.05          NaT
1 2010-01-02    1.02  1.0      1.05          NaT
2 2010-01-03    1.04  1.0      1.05          NaT
3 2010-01-04   -0.97  1.0      1.05          NaT
4 2010-01-05    1.12  1.0      1.05   2010-01-05
5 2010-01-06    1.08  1.0      1.05          NaT
6 2010-02-20    1.12  1.0      1.05   2010-02-20
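The 30-day rule depends on when the previous trigger was actually returned, so each row's decision depends on earlier decisions. One straightforward way to express that is a single sequential pass over the crossings. The following is a minimal sketch, not a definitive implementation: `add_trigger_dates` and `cooldown_days` are hypothetical names, the rows are assumed to be in chronological order, and the 19/02/2010 row is a made-up filler so that 20/02/2010 is a genuine upward crossing.

```python
import pandas as pd

def add_trigger_dates(df, cooldown_days=30):
    """Hypothetical helper: mark upward threshold crossings, skipping any
    that occur within `cooldown_days` of the last returned trigger."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], format="%d/%m/%Y")

    above = out["Values"] > out["+1 Stdev"]
    prev_below = out["Values"].shift() < out["+1 Stdev"]
    crossing = above & prev_below

    cooldown = pd.Timedelta(days=cooldown_days)
    last_trigger = None
    keep = []
    # Assumes rows are already sorted by date.
    for date, is_cross in zip(out["Date"], crossing):
        ok = bool(is_cross) and (
            last_trigger is None or date - last_trigger > cooldown
        )
        if ok:
            last_trigger = date
        keep.append(ok)

    out["Trigger Date"] = out["Date"].where(pd.Series(keep, index=out.index))
    return out

sample = pd.DataFrame({
    "Date": ["01/01/2010", "02/01/2010", "03/01/2010", "04/01/2010",
             "05/01/2010", "06/01/2010", "07/01/2010", "08/01/2010",
             "19/02/2010",  # illustrative filler row, not from the question
             "20/02/2010"],
    "Values": [1.01, 1.02, 1.04, -0.97, 1.12, 1.08, 1.03, 1.07, 1.02, 1.12],
    "Avg": [1.0] * 10,
    "+1 Stdev": [1.05] * 10,
})
result = add_trigger_dates(sample)
print(result)
```

On this sample, 08/01/2010 is a crossing but falls three days after the 05/01/2010 trigger, so it is left as NaT, while 20/02/2010 is 46 days later and is returned. For about 1,000 rows a Python loop like this should be fast enough.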