How to set a flag column to 1 when 5 consecutive rows in another column are 1
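I have a DataFrame with two columns, flag and flag1. I want to check whether the flag column has been 1 for 5 or more consecutive rows; when it has, the flag1 value should be changed to 1.
Here is an example: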
df=pd.DataFrame({'flag':[0,0,1,1,1,1,1,1,1,0,0,0],'flag1':[0,0,0,0,0,0,1,0,0,0,0,0]})
Here are two versions of a solution, one slow and one fast, timed with len(df) = 3300000.
Slow:
%%time
d = 1
for i,v in df.iterrows():
    if (v.flag == 1) and (d<5) :
        df.at[i,'flag1'] = 0
        d+=1
    elif (v.flag == 1):
        df.at[i,'flag1'] = 1
        d=1
    else:
        df.at[i,'flag1'] = 0
        d=1
df['flag2']=df['flag1'].astype(int)
Wall time: 4min 27s
Fast:
%%time
from math import floor
d = 1
df['flag1'] = (
    [(0,(d:=1))[0] if df.at[i,'flag']==0
     else (0, (d := d+1))[0] if (d%5)!=0
     else (1, (d :=1 ))[0]
     for i in range(len(df))
    ] )
Wall time: 1min 1s
Ignore the `new` column.
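index | flag | flag1 | flag2 | new
---|---|---|---|---
0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 0 | 0
2 | 1 | 0 | 0 | 0
3 | 1 | 0 | 0 | 0
4 | 1 | 0 | 0 | 0
5 | 1 | 0 | 0 | 0
6 | 1 | 1 | 1 | 1
7 | 1 | 0 | 0 | 0
8 | 1 | 0 | 0 | 0
9 | 0 | 0 | 0 | 0
10 | 0 | 0 | 0 | 0
11 | 0 | 0 | 0 | 0
12 | 1 | 0 | 0 | 0
13 | 1 | 0 | 0 | 0
14 | 1 | 0 | 0 | 0
15 | 1 | 0 | 0 | 0
16 | 1 | 1 | 1 | 1
17 | 1 | 0 | 0 | 0
18 | 1 | 0 | 0 | 0
19 | 1 | 0 | 0 | 0
20 | 1 | 0 | 0 | 0
21 | 1 | 1 | 1 | 0
22 | 1 | 0 | 0 | 0
23 | 1 | 0 | 0 | 0
24 | 1 | 0 | 0 | 0
25 | 0 | 0 | 0 | 0
26 | 0 | 0 | 0 | 0
27 | 1 | 0 | 0 | 0
28 | 0 | 0 | 0 | 0
29 | 1 | 0 | 0 | 0
30 | 1 | 0 | 0 | 0
31 | 1 | 0 | 0 | 0
32 | 1 | 0 | 0 | 0
33 | 0 | 0 | 0 | 0
34 | 0 | 0 | 0 | 0
35 | 1 | 0 | 0 | 0
36 | 1 | 0 | 0 | 0
37 | 1 | 0 | 0 | 0
38 | 1 | 0 | 0 | 0
39 | 1 | 1 | 1 | 1
40 | 1 | 0 | 0 | 0
41 | 1 | 0 | 0 | 0
42 | 0 | 0 | 0 | 0
43 | 0 | 0 | 0 | 0
44 | 0 | 0 | 0 | 0
45 | 1 | 0 | 0 | 0
46 | 1 | 0 | 0 | 0
47 | 1 | 0 | 0 | 0
48 | 1 | 0 | 0 | 0
49 | 1 | 1 | 1 | 1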
For testing purposes, I generated the data like this:
A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]
A = A * 100000
df=pd.DataFrame({'flag':A})
The idea is to build a counter of consecutive 1s and then test where it equals 5:
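# True on rows where flag is 1
a = df['flag'].eq(1)
# running total of 1s seen so far
b = a.cumsum()
# subtract the running total at the most recent 0 row to get the length
# of the current run of 1s, then mark the row where that length is 5
df['new'] = b.sub(b.mask(a).ffill().fillna(0)).eq(5).astype(int)
print (df)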
    flag  flag1  new
0      0      0    0
1      0      0    0
2      1      0    0
3      1      0    0
4      1      0    0
5      1      0    0
6      1      1    1
7      1      0    0
8      1      0    0
9      0      0    0
10     0      0    0
11     0      0    0
Details:
print (b.sub(b.mask(a).ffill().fillna(0)))
0 0.0
1 0.0
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 6.0
8 7.0
9 0.0
10 0.0
11 0.0
Name: flag, dtype: float64
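Note that testing the counter with `.eq(5)` marks only the first time a run reaches 5, whereas the loops in the question also mark the 10th, 15th, ... consecutive 1 (row 21 in the question's table). If that repeating behaviour is what you want, one possible sketch is to test the same counter for a positive multiple of 5. The function name `every_nth_in_run` and the parameter `n` are made up for this illustration, and the code assumes `flag` holds only 0 and 1.

```python
import pandas as pd

def every_nth_in_run(flag: pd.Series, n: int = 5) -> pd.Series:
    """Return 1 on every n-th consecutive 1 of `flag`, else 0 (hypothetical helper)."""
    a = flag.eq(1)
    b = a.cumsum()
    # Length of the current run of 1s; 0 on rows where flag is not 1.
    run_len = b.sub(b.mask(a).ffill().fillna(0)).astype(int)
    # Mark rows whose position inside the run is a positive multiple of n.
    return (run_len.gt(0) & run_len.mod(n).eq(0)).astype(int)

# One period of the test pattern from the question.
A = [0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1]
df = pd.DataFrame({'flag': A})
df['flag1'] = every_nth_in_run(df['flag'])
print(df.index[df['flag1'].eq(1)].tolist())  # [6, 16, 21]
```

On this pattern the marked rows are the same ones the question's loops produce in flag1, without iterating over rows in Python.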
Setup
import pandas as pd
df=pd.DataFrame({'flag':[0,0,1,1,1,1,1,1,1,0,0,0],'flag1':[0,0,0,0,0,0,1,0,0,0,0,0]})
Solution
rolling_sum = df["flag"].rolling(5).sum()
df["check"] = ((rolling_sum == 5) & (rolling_sum.diff() == 1)).astype(int)
    flag  flag1  check
0      0      0      0
1      0      0      0
2      1      0      0
3      1      0      0
4      1      0      0
5      1      0      0
6      1      1      1
7      1      0      0
8      1      0      0
9      0      0      0
10     0      0      0
11     0      0      0
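One edge case worth checking with the rolling approach: if the data starts with a run of 1s, the first four rolling sums are NaN, so `diff()` on the fifth row is NaN too and that run is never marked. A possible workaround, sketched here under the assumption that `flag` holds only 0 and 1, is to pass `min_periods=1` so the early windows produce partial sums. The function name `first_nth_in_run` and the parameter `n` are hypothetical.

```python
import pandas as pd

def first_nth_in_run(flag: pd.Series, n: int = 5) -> pd.Series:
    """Return 1 on the row where a run of 1s first reaches length n (hypothetical helper)."""
    # min_periods=1 gives partial sums for the first n-1 rows instead of NaN.
    rolling_sum = flag.rolling(n, min_periods=1).sum()
    # sum == n: the last n rows are all 1.
    # diff == 1: the previous window was not full of 1s, so the run has just reached n.
    return ((rolling_sum == n) & (rolling_sum.diff() == 1)).astype(int)

# A run of 1s that starts on the very first row.
leading = pd.Series([1, 1, 1, 1, 1, 1, 1, 0])
print(first_nth_in_run(leading).tolist())  # [0, 0, 0, 0, 1, 0, 0, 0]

# The example from the question still marks only row 6.
example = pd.Series([0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
print(first_nth_in_run(example).tolist())  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Like the original, this marks a run only once, at the row where it first reaches length `n`.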