np.where based on the values in the next rows
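I have a sample DataFrame like the one below.
I only want to flag the rows where id == 11 when that run of 11s is immediately followed by a 2.
If id == 11 and the run is not immediately followed by a 2, I want to mark it as 0.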
Home Date Time id Appliance Banana expected_output output_from the code
1 1/21/2017 1:30:00 11 Apple 0 1 1
2 1/21/2017 1:45:00 11 Apple 0 1 1
3 1/21/2017 2:00:00 11 Apple 0 1 1
4 1/21/2017 2:15:00 2 Banana 1 1 0
5 1/21/2017 2:30:00 2 Banana 0 0 0
6 1/21/2017 2:45:00 0 Orange 0 0 0
7 1/21/2017 3:00:00 1 Peach 0 0 0
8 1/21/2017 3:15:00 1 Peach 0 0 0
9 1/21/2017 3:30:00 3 Pineapple 0 0 0
10 1/21/2017 3:45:00 3 Pineapple 0 0 0
11 1/21/2017 4:00:00 11 Apple 0 0 1
12 1/21/2017 4:15:00 11 Apple 0 0 1
13 1/21/2017 4:30:00 11 Apple 0 0 1
14 1/21/2017 4:45:00 0 Orange 0 0 0
15 1/22/2017 3:30:00 1 Peach 0 0 0
16 1/22/2017 3:45:00 1 Peach 0 0 0
17 1/22/2017 4:00:00 3 Pineapple 0 0 0
18 1/22/2017 4:15:00 3 Pineapple 0 0 0
19 1/22/2017 4:30:00 11 Apple 0 1 1
20 1/22/2017 4:45:00 11 Apple 0 1 1
21 1/22/2017 5:00:00 11 Apple 0 1 1
22 1/22/2017 5:15:00 2 Banana 1 1 0
23 1/22/2017 5:30:00 2 Banana 1 0 0
What I have so far:
df['Banana'] = np.where((df['id']==2) & (df['id'].shift(+1)==11), 1, 0)
formatted_df['output_from the code'] = np.where((df['id']==11) & (df['id'].shift(-1)==2), 1,
    np.where((df['id']==11) & (df['id'].shift(-1)==11), 1, 0))
Is there a way to write np.where based on the previous row?
I maintain a utility library called haggis. It has two functions that convert an array of booleans to an array of run markers and vice versa: haggis.math.mask2runs and haggis.math.runs2mask. The functions are inverses of each other. mask2runs works like this:
>>> mask2runs([0, 1, 0, 1, 1, 1, 0, 1, 1])
array([[1, 2],
[3, 6],
[7, 9]], dtype=int64)
The first column is the start index of each run of True elements, and the second column is the exclusive end. You can use this directly:
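# Grab locations of 11s
runs = mask2runs((df['id'] == 11).to_numpy())
# Index of the row right after each run, clipped so a run ending on the last row stays in bounds
after = np.clip(runs[:, 1], 0, len(df) - 1)
# Remove the runs that aren't followed by 2
runs = runs[df['id'].to_numpy()[after] == 2]
# Convert remaining runs back to a mask (item assignment, since df.output = ... would not create a column)
df['output'] = runs2mask(runs, n=len(df))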
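For comparison, the same flag can be sketched with pandas alone. This is not the run-marker approach described above but a swapped-in alternative: it looks up the first value that differs from each row's own id and back-fills it over the run. The variable names (ids, nxt, next_different, flag) are made up for illustration, and the id values are copied from the sample table in the question. Like the run-based code, it flags only the 11 rows, not the 2 that follows them.

```python
import pandas as pd

# id column copied from the sample data in the question
ids = pd.Series([11, 11, 11, 2, 2, 0, 1, 1, 3, 3, 11, 11, 11, 0,
                 1, 1, 3, 3, 11, 11, 11, 2, 2])

# Value in the following row (NaN for the last row)
nxt = ids.shift(-1)

# Keep the following value only where it differs from the current one,
# then back-fill so every row of a run sees the value that ends the run
next_different = nxt.where(nxt != ids).bfill()

# Flag 11s whose run is immediately followed by a 2
flag = ((ids == 11) & (next_different == 2)).astype(int)
print(flag.tolist())
```

The back-fill is what lets the first rows of a run "see" what comes after the run; a single shift(-1) only looks one row ahead.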
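You don't need to import haggis to use mask2runs and runs2mask. They are actually quite simple, especially for this limited use case. The source is here, but both can be condensed for this problem as follows: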
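import numpy as np

def mask2runs(mask):
    # Accept lists, arrays, or a pandas Series of booleans
    mask = np.asarray(mask, dtype=bool)
    # Pad with a zero on each side so runs touching either end get both edges
    padded = np.r_[np.int8(0), mask.view(np.int8), np.int8(0)]
    return np.flatnonzero(np.diff(padded)).reshape(-1, 2)

def runs2mask(runs, n):
    mask = np.zeros(n, dtype=bool)
    view = mask.view(np.int8)
    view[runs[:, 0]] = 1
    # Skip an end marker that would fall outside the array; also tolerate empty runs
    ends = runs[:-1, 1] if len(runs) and runs[-1, 1] == n else runs[:, 1]
    view[ends] = -1
    np.cumsum(view, out=view)
    return mask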
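To make the one-liner less opaque, here is a step-by-step sketch of the same idea on the example mask from earlier. The intermediate names (padded, edges, delta, rebuilt) are only for illustration. The reverse direction below is a simplified variant, not the function above: it assumes one extra slot in the buffer so that a run ending at the last element needs no special case.

```python
import numpy as np

mask = np.array([0, 1, 0, 1, 1, 1, 0, 1, 1], dtype=bool)

# Pad with a zero on each side so runs touching the ends get both edges
padded = np.r_[np.int8(0), mask.view(np.int8), np.int8(0)]

# +1 marks a 0 -> 1 transition (run start), -1 marks 1 -> 0 (exclusive end)
edges = np.diff(padded)

# Nonzero positions alternate start, end, start, end, ...
runs = np.flatnonzero(edges).reshape(-1, 2)
print(runs.tolist())  # [[1, 2], [3, 6], [7, 9]]

# Reverse direction: put +1 at each start and -1 at each end, then accumulate.
# The buffer has one extra slot so an end index equal to len(mask) is valid.
delta = np.zeros(mask.size + 1, dtype=np.int8)
delta[runs[:, 0]] = 1
delta[runs[:, 1]] = -1
rebuilt = np.cumsum(delta[:-1]).astype(bool)
print(np.array_equal(rebuilt, mask))  # True
```

The extra slot trades a little memory for simpler indexing; the condensed runs2mask avoids the allocation by dropping the final end marker when it equals n.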