np.where based on the values in the next rows

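I have the following sample dataframe.

I want to flag only the rows where id == 11 that come before a 2. If id == 11 and the run is not immediately followed by a 2, I want to flag it as 0.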

Home    Date       Time      id  Appliance   Banana expected_output output_from the code
1      1/21/2017    1:30:00  11   Apple       0       1                 1
2      1/21/2017    1:45:00  11  Apple        0       1                 1
3      1/21/2017    2:00:00  11  Apple        0       1                 1
4      1/21/2017    2:15:00  2   Banana       1       1                 0
5      1/21/2017    2:30:00  2   Banana       0       0                 0
6      1/21/2017    2:45:00  0   Orange       0       0                 0
7      1/21/2017    3:00:00  1   Peach        0       0                 0
8      1/21/2017    3:15:00  1   Peach        0       0                 0
9      1/21/2017    3:30:00  3   Pineapple    0       0                 0
10     1/21/2017    3:45:00  3   Pineapple    0       0                 0
11     1/21/2017    4:00:00  11  Apple        0       0                 1
12     1/21/2017    4:15:00  11  Apple        0       0                 1
13     1/21/2017    4:30:00  11  Apple        0       0                 1
14     1/21/2017    4:45:00  0   Orange       0       0                 0
15     1/22/2017    3:30:00  1   Peach        0       0                 0
16     1/22/2017    3:45:00  1   Peach        0       0                 0
17     1/22/2017    4:00:00  3   Pineapple    0       0                 0
18     1/22/2017    4:15:00  3   Pineapple    0       0                 0
19     1/22/2017    4:30:00  11  Apple        0       1                 1
20     1/22/2017    4:45:00  11  Apple        0       1                 1
21     1/22/2017    5:00:00  11  Apple        0       1                 1
22     1/22/2017    5:15:00  2   Banana       1       1                 0
23     1/22/2017    5:30:00  2   Banana       1       0                 0

What I have achieved so far:

# Flag the first 2 that directly follows an 11
df['Banana'] = np.where((df['id'] == 2) & (df['id'].shift(+1) == 11), 1, 0)

# Flag an 11 whose next row is a 2 or another 11
df['output_from the code'] = np.where((df['id'] == 11) & (df['id'].shift(-1) == 2), 1,
                             np.where((df['id'] == 11) & (df['id'].shift(-1) == 11), 1,
                                      0))

Is there a way to write this with np.where based on the previous row?

I maintain a utility library called haggis. It has two functions, haggis.math.mask2runs and haggis.math.runs2mask, that convert an array of booleans to an array of run markers and vice versa. These functions are inverses of each other. mask2runs works like this:

>>> mask2runs([0, 1, 0, 1, 1, 1, 0, 1, 1])
array([[1, 2],
       [3, 6],
       [7, 9]], dtype=int64)

The first column is the start index of each run of True elements, and the second column is the exclusive end. You can use this directly:

# Grab locations of 11s
runs = mask2runs((df.id == 11).to_numpy())
# Remove the runs that aren't followed by a 2
runs = runs[df.id.iloc[np.clip(runs[:, 1], 0, len(df) - 1)].to_numpy() == 2]
# Convert the remaining runs back to a mask
df['output'] = runs2mask(runs, n=len(df))
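
To make the filtering step concrete, here is a minimal sketch on a toy array. The `ids` values and the hand-written `runs` array are made up for illustration; only NumPy calls are used. It shows that the exclusive end of a run is also the index of the element right after it, which is what gets compared against 2.

```python
import numpy as np

# Hypothetical toy data: three runs of 11, only the first is followed by a 2
ids = np.array([11, 11, 2, 0, 11, 11, 0, 11])
# Runs of 11 as [start, exclusive_end) pairs, written out by hand here
runs = np.array([[0, 2], [4, 6], [7, 8]])

# The exclusive end indexes the element right after each run.
# The last run ends at len(ids), so clip keeps the lookup in bounds.
follow_idx = np.clip(runs[:, 1], 0, len(ids) - 1)
followers = ids[follow_idx]   # array([ 2,  0, 11])
keep = followers == 2         # array([ True, False, False])
kept_runs = runs[keep]        # array([[0, 2]])
```

For a run that reaches the end of the data, the clipped index lands on the run's own last element, which is an 11 and so never equals 2. That run is dropped, which matches the rule that an 11 not followed by a 2 gets 0.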

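You do not need to import haggis to use these functions. They are actually quite simple, especially for this limited use case. The source is here, but they can be summarized for this specific problem as follows: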

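def mask2runs(mask):
    # mask must be a NumPy boolean array; pad with zeros so runs touching an edge still get a start and an end
    padded = np.r_[np.int8(0), mask.view(np.int8), np.int8(0)]
    return np.flatnonzero(np.diff(padded)).reshape(-1, 2)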

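def runs2mask(runs, n):
    mask = np.zeros(n, dtype=bool)
    if len(runs) == 0:
        # No runs survived the filter, so nothing is flagged
        return mask
    view = mask.view(np.int8)
    # +1 at each run start, -1 at each exclusive end
    view[runs[:, 0]] = 1
    # An end equal to n falls outside the array, so skip it
    view[runs[:-1, 1] if runs[-1, 1] == n else runs[:, 1]] = -1
    # The running sum is 1 inside a run and 0 outside
    np.cumsum(view, out=view)
    return mask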
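
As a rough end-to-end check, the sketch below runs the three steps on the id column from the question. It is a self-contained restatement under stated assumptions, not the haggis implementation: `mask2runs` follows the summary above, while `runs2mask` is rewritten in an out-of-place form (a delta array one element longer than the mask) so that empty input and a run ending at `n` need no special case.

```python
import numpy as np

def mask2runs(mask):
    # Restated helper: [start, exclusive_end) pairs for each run of True
    mask = np.asarray(mask, dtype=bool)
    padded = np.r_[np.int8(0), mask.view(np.int8), np.int8(0)]
    return np.flatnonzero(np.diff(padded)).reshape(-1, 2)

def runs2mask(runs, n):
    # Variant for illustration: +1 at starts, -1 at ends, then a running sum.
    # The extra slot at index n absorbs an end that equals n.
    delta = np.zeros(n + 1, dtype=np.int8)
    delta[runs[:, 0]] += 1
    delta[runs[:, 1]] -= 1
    return np.cumsum(delta[:-1]).astype(bool)

# The id column from the question's sample data (rows 1 to 23)
ids = np.array([11, 11, 11, 2, 2, 0, 1, 1, 3, 3, 11, 11, 11,
                0, 1, 1, 3, 3, 11, 11, 11, 2, 2])

runs = mask2runs(ids == 11)                            # all runs of 11
followers = ids[np.clip(runs[:, 1], 0, len(ids) - 1)]  # value right after each run
runs = runs[followers == 2]                            # keep runs followed by a 2
flags = runs2mask(runs, n=len(ids)).astype(int)

flagged_positions = np.flatnonzero(flags).tolist()     # zero-based positions
```

The resulting `flags` array can be assigned to a dataframe column of the same length. Note that this marks only the 11 rows themselves; the expected_output column in the question also marks the first 2 after each run, which the Banana column from the question's own code already identifies.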