In Python Pandas, find where there are 4 consecutive rows with rising values

Question

I'm trying to work out how to flag the rows whose price is part of a run of 4 consecutive price rises. "is_consecutive" is the flag column I want.

I managed to compute the difference between consecutive rows:

df['diff1'] = df['Price'].diff()

But I still can't work out which rows are part of a run of 4 price rises.

I'd like to use df.rolling().

Example df:

Rows 0-3 should get 'True' in the ["is_consecutive"] column, because the price rises at every step across these 4 consecutive rows (['diff1'] is positive on rows 1-3).

Rows 8-11 should get 'False', because the price stays at 100 across these rows (['diff1'] is zero on rows 9-11).

       Date  Price  diff1  is_consecutive
0   1/22/20      0      0            True
1   1/23/20    130    130            True
2   1/24/20    144     14            True
3   1/25/20    150      6            True
4   1/27/20     60    -90           False
5   1/28/20     95     35           False
6   1/29/20    100      5           False
7   1/30/20     50    -50           False
8   2/01/20    100     50           False
9   1/02/20    100      0           False
10  1/03/20    100      0           False
11  1/04/20    100      0           False
12  1/05/20     50    -50           False

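General example:

If prices = [30,55,60,65,25]

then the differences between consecutive values in the list are:

diff1 = [0,25,5,5,-40]

So a positive diff1 means the price rose from one row to the next.

I need to flag (in df) the rows that form a run of 4 consecutively rising prices.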
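The general example can be reproduced directly with pandas. A minimal sketch: diff() leaves the first element as NaN, so fillna(0) is assumed here to match the 0 shown for the first position, and gt(0) marks each strict rise.

```python
import pandas as pd

prices = pd.Series([30, 55, 60, 65, 25])

# Difference to the previous value; the first one has no predecessor, so fill it with 0
diff1 = prices.diff().fillna(0)
print(diff1.tolist())  # [0.0, 25.0, 5.0, 5.0, -40.0]

# True wherever the price went up compared with the previous value
rising = diff1.gt(0)
print(rising.tolist())  # [False, True, True, True, False]
```

The three True values (positions 1-3) are the rises that make 30, 55, 60, 65 a run of four rising prices.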

Thanks for your help (-:

Answer 1

Try .rolling with a window size of 4 and min_periods=1:

df["is_consecutive"] = (
    df["Price"]
    .rolling(4, min_periods=1)
    .apply(lambda x: (x.diff().fillna(0) >= 0).all())
    .astype(bool)
)
print(df)

Prints:

      Date  Price  is_consecutive
0  1/22/20      0            True
1  1/23/20    130            True
2  1/24/20    144            True
3  1/25/20    150            True
4  1/26/20     60           False
5  1/26/20     95           False
6  1/26/20    100           False
7  1/26/20     50           False
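Two things to watch with this version: the test is >= 0, so flat stretches count as rising (on the question's table, the windows ending at rows 10 and 11, e.g. 100, 100, 100, 100, would come out True), and because each row only looks at the window ending on it, a run that starts later in the frame is flagged only from its fourth row onward. A minimal sketch that keeps the rolling idea but uses a strict > 0 test, then also marks the three rows before each qualifying window end; the shift-and-OR step is an addition of this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Prices from the question's example table
df = pd.DataFrame({"Price": [0, 130, 144, 150, 60, 95, 100, 50, 100, 100, 100, 100, 50]})

# True where the 4-row window ending at this row is strictly increasing
# (rolling(4) yields NaN for the first 3 rows, which fillna(0) turns into False)
window_end = (
    df["Price"]
    .rolling(4)
    .apply(lambda w: float((np.diff(w) > 0).all()), raw=True)
    .fillna(0)
    .astype(bool)
)

# A row is part of a rising run if a qualifying window ends at it or up to 3 rows later
df["is_consecutive"] = (
    window_end
    | window_end.shift(-1, fill_value=False)
    | window_end.shift(-2, fill_value=False)
    | window_end.shift(-3, fill_value=False)
)
print(df["is_consecutive"].tolist())
# [True, True, True, True, False, False, False, False, False, False, False, False, False]
```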

Answer 2

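This assumes the DataFrame is sorted by date. One approach uses cumulative sums over the price changes to flag each row where the price has just risen for the third time in a row (i.e., the fourth or later price of an uptrend).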

import numpy as np
up = df['Price'].diff().apply(np.sign) == 1   # True where the price rose vs. the previous row
quant1 = up.cumsum()
quant2 = quant1.where(~up).ffill().fillna(0).astype(int)
df['is_consecutive'] = (quant1 - quant2) >= 3

Note that the above only counts strictly increasing prices (a price equal to the previous one does not count as a rise).
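To see what quant1 - quant2 measures, here is the same cumsum idea on the question's prices, written with diff().gt(0) instead of apply(np.sign) == 1 (an equivalent strict test). The difference is the length of the current run of consecutive increases ending at each row, so >= 3 marks the fourth (or later) price of a rising run; it does not yet mark the three rows before it.

```python
import pandas as pd

prices = pd.Series([0, 130, 144, 150, 60, 95, 100, 50, 100, 100, 100, 100, 50])

up = prices.diff().gt(0)          # strict increase vs. previous row (first row: False)
total_ups = up.cumsum()           # running count of all increases (quant1)
# Running count frozen at the most recent non-increase, carried forward (quant2)
at_last_break = total_ups.where(~up).ffill().fillna(0).astype(int)

run_length = total_ups - at_last_break
print(run_length.tolist())  # [0, 1, 2, 3, 0, 1, 2, 0, 1, 0, 0, 0, 0]
print((run_length >= 3).tolist())  # True only at index 3
```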

Then, so that the three rows before each flagged row are marked too, we overwrite the is_consecutive labels with a custom win_view function:

import numpy as np
import pandas as pd

def win_view(x, size):
    # Accept a list, a pandas Series or a NumPy array
    if isinstance(x, list):
        x = np.array(x)
    if isinstance(x, pd.Series):
        x = x.values
    if not isinstance(x, np.ndarray):
        raise TypeError('wrong type')
    # Overlapping windows of length `size` that share memory with x
    return np.lib.stride_tricks.as_strided(
        x,
        shape=(x.size - size + 1, size),
        strides=(x.strides[0], x.strides[0])
    )


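flags = df['is_consecutive'].to_numpy().copy()   # writable copy of the flags
arr = win_view(flags, 4)
arr[arr[:, 3]] = True   # writes through the strided view into flags
df['is_consecutive'] = flags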

Note that the values are replaced with True in place: every row of a window whose last row is flagged becomes True.
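To make the strided trick concrete: win_view returns overlapping windows that share memory with their input, so assigning True to a whole window row writes into the original buffer and spreads each flagged window end back over the three rows before it. A minimal illustration on a plain, writable NumPy array; sliding_window_view (NumPy 1.20 or later) is shown only as a bounds-checked, read-only-by-default way to build the same windows, a suggested alternative rather than part of the answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

# Window-end flags as the cumsum step would produce them: only index 3 ends a 4-price rise
flags = np.array([False, False, False, True, False, False, False])
size = 4

# Same construction as win_view: each row starts one element after the previous one
arr = as_strided(flags, shape=(flags.size - size + 1, size),
                 strides=(flags.strides[0], flags.strides[0]))

# sliding_window_view builds identical windows, but as a read-only view by default
assert np.array_equal(arr, sliding_window_view(flags, size))

# Set every window whose last element is True; the writes land in `flags` itself
arr[arr[:, 3]] = True
print(flags.tolist())  # [True, True, True, True, False, False, False]
```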

Edit 1: Inspired by the custom win_view function, I realized the whole thing can be done with win_view alone (no cumsums needed), as follows:

df['is_consecutive'] = False
arr = win_view(df['Price'].diff(), 4)
arr_ind = win_view(list(df['Price'].index), 4)
mask = arr_ind[np.all(arr[:, 1:] > 0, axis=1)].flatten()
df.loc[mask, 'is_consecutive'] = True

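We keep 2 arrays, one for the returns (price differences) and one for the index. We collect the index labels of every window with 3 consecutive positive returns, np.all(arr[:, 1:] > 0, axis=1), i.e., 4 rising prices, and set those rows to True in the original df.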
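A self-contained version of the Edit 1 approach on the question's prices, assuming NumPy 1.20 or later so that numpy.lib.stride_tricks.sliding_window_view can stand in for win_view (the windows are only read here, so its read-only view is enough):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({"Price": [0, 130, 144, 150, 60, 95, 100, 50, 100, 100, 100, 100, 50]})
df["is_consecutive"] = False

returns = sliding_window_view(df["Price"].diff().to_numpy(), 4)  # windows of 4 diffs
indices = sliding_window_view(df.index.to_numpy(), 4)            # matching row labels

# Columns 1..3 are the three changes inside each 4-price window; column 0 is the change
# from the row before the window (NaN for the first window), so it is ignored
rising = np.all(returns[:, 1:] > 0, axis=1)

df.loc[indices[rising].ravel(), "is_consecutive"] = True
print(df[df["is_consecutive"]].index.tolist())  # [0, 1, 2, 3]
```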

Answer 3

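This function returns the DataFrame with a column "consecutive_up" marking all rows that belong to a run of at least 5 increases, and a column "consecutive_down" marking all rows that belong to a run of at least 4 decreases.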

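def c_func(temp_df):
    # Work on a copy so the caller's DataFrame is left unchanged
    temp_df = temp_df.copy()

    temp_df['increase'] = temp_df['Price'] > temp_df['Price'].shift()
    temp_df['decrease'] = temp_df['Price'] < temp_df['Price'].shift()

    temp_df['consecutive_up'] = False
    temp_df['consecutive_down'] = False
    # Look up the column positions instead of hard-coding 4 and 5
    up_col = temp_df.columns.get_loc('consecutive_up')
    down_col = temp_df.columns.get_loc('consecutive_down')

    # enumerate yields positions, which .iloc needs even with a non-default index
    count = 0
    for pos, increased in enumerate(temp_df['increase']):
        count = count + 1 if increased else 0
        if count == 5:
            temp_df.iloc[pos - 5:pos + 1, up_col] = True
        elif count > 5:
            temp_df.iloc[pos, up_col] = True

    count = 0  # reset before scanning for decreases
    for pos, decreased in enumerate(temp_df['decrease']):
        count = count + 1 if decreased else 0
        if count == 4:
            temp_df.iloc[pos - 4:pos + 1, down_col] = True
        elif count > 4:
            temp_df.iloc[pos, down_col] = True
    return temp_df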
