Assign labels for previous n days based on condition
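Hi, I have a question about labelling the previous 11 days of dates, whether or not dates are duplicated or missing.
When I find a 1 in the 'Mark' column, I need to assign labels in the 'Day_mark' column for the previous 11 days (or a dynamic n days).
Below is my dataset; the required output column is 'Day_mark'.
For example: in row 18 I found a 1, so I need to assign labels for the previous 11 days.
In row 27 I found a 1 and need to assign labels for the previous 11 days, but there are not 11 days available, so the labels are limited to 7 days.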
import numpy as np
import pandas as pd

data = {'Date':['2021-10-4','2021-10-7','2021-10-9','2021-10-10','2021-10-11','2021-10-11','2021-10-12',
                '2021-10-12','2021-10-13','2021-10-14','2021-10-15','2021-10-15','2021-10-16','2021-10-16',
                '2021-10-17','2021-10-18','2021-10-19','2021-10-20','2021-10-21','2021-11-1','2021-11-2',
                '2021-11-3','2021-11-3','2021-11-3','2021-11-5','2021-11-6','2021-11-7','2021-11-8','2021-11-9'],
        'Hour':[ 9,11,12,13,5,7,2,20,21,23,1,2,5,7,15,16,17,1,12,13,5,7,2,20,21,23,16,17,13],
        'Mark':[ '','','','','','','','','','','','','','','','','',1,'','','','','','','','',1,'',''],
        'Day_mark':['','','d11','d10','d9','d9','d8','d8','d7','d6','d5','d5','d4','d4','d3','d2','d1','d7',
                    'd6','d5','d4','d3','d3','d3','d2','d1',' ',' ',' ']
       }
df = pd.DataFrame(data)
Thanks in advance.
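First create a group column: compare 'Mark' with 1, shift the mask up by one row, reverse the order with iloc and take the cumulative sum. Then remove duplicate dates with DataFrame.drop_duplicates and add a counter with GroupBy.cumcount, use GroupBy.ffill to fill in the rows with duplicated dates, and set an empty string where the counter is not between 1 and 11 or where no later 1 follows the row: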
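# Group id: rows up to the row before each 1 share an id; rows from the last 1 onward get 0
df['g'] = df['Mark'].eq(1).shift(-1, fill_value=False).iloc[::-1].cumsum().iloc[::-1]
# Rank the distinct dates backwards inside each group (1 = the date just before the next 1)
df['new'] = df.drop_duplicates('Date').groupby('g').cumcount(ascending=False).add(1)
# Copy the rank to rows with a repeated date; 0 marks rows with no rank
s = df.groupby('g')['new'].ffill().fillna(0).astype(int)
# Keep ranks 1..11 in groups that precede a 1, otherwise an empty string
df['new'] = np.where(df['g'].gt(0) & s.between(1, 11), 'd' + s.astype(str), '')
df = df.drop('g', axis=1)
print(df)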
          Date  Hour Mark Day_mark  new
0    2021-10-4     9
1    2021-10-7    11
2    2021-10-9    12           d11  d11
3   2021-10-10    13           d10  d10
4   2021-10-11     5            d9   d9
5   2021-10-11     7            d9   d9
6   2021-10-12     2            d8   d8
7   2021-10-12    20            d8   d8
8   2021-10-13    21            d7   d7
9   2021-10-14    23            d6   d6
10  2021-10-15     1            d5   d5
11  2021-10-15     2            d5   d5
12  2021-10-16     5            d4   d4
13  2021-10-16     7            d4   d4
14  2021-10-17    15            d3   d3
15  2021-10-18    16            d2   d2
16  2021-10-19    17            d1   d1
17  2021-10-20     1    1       d7   d7
18  2021-10-21    12            d6   d6
19   2021-11-1    13            d5   d5
20   2021-11-2     5            d4   d4
21   2021-11-3     7            d3   d3
22   2021-11-3     2            d3   d3
23   2021-11-3    20            d3   d3
24   2021-11-5    21            d2   d2
25   2021-11-6    23            d1   d1
26   2021-11-7    16    1
27   2021-11-8    17
28   2021-11-9    13
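The grouping step is the least obvious part of the solution above, so here is a minimal sketch of it on its own. The toy `mark` series is made up for illustration and is not part of the question's data; `fill_value=False` is used so the shifted mask stays boolean.

```python
import pandas as pd

# Toy marker column (hypothetical): 1 flags an event row, '' means no event.
mark = pd.Series(['', '', 1, '', '', 1, ''])

is_event = mark.eq(1)                           # True on the event rows
shifted = is_event.shift(-1, fill_value=False)  # True on the row just before each event
# Reverse, take the cumulative sum, reverse back: each row gets the number
# of "row before an event" flags found at or below it.
g = shifted.iloc[::-1].cumsum().iloc[::-1]

print(g.tolist())  # [2, 2, 1, 1, 1, 0, 0]
```

Rows 0-1 lead up to the first event (group 2), rows 2-4 lead up to the second event (group 1), and rows from the last event onward land in group 0, which is why the solution filters on `g > 0`.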
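Another idea is to subtract each date from the last date of its group, but the output is different because this counts calendar days rather than distinct dates, so gaps in the dates shift the labels: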
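# Parse the strings so dates can be subtracted
df['Date'] = pd.to_datetime(df['Date'])
# Same group id as before: rows up to the row before each 1 share an id
df['g'] = df['Mark'].eq(1).shift(-1, fill_value=False).iloc[::-1].cumsum().iloc[::-1]
# Calendar days between each row and the last date of its group, counted from 1
df['new'] = (df.groupby('g')['Date']
               .transform('last')
               .sub(df['Date'])
               .dt.days
               .add(1)
               .fillna(0)
               .astype(int))
# Keep distances up to 11 days in groups that precede a 1, otherwise an empty string
df['new'] = np.where(df['g'].gt(0) & df['new'].le(11), 'd' + df['new'].astype(str), '')
df = df.drop('g', axis=1)
print(df)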
          Date  Hour Mark Day_mark  new
0   2021-10-04     9
1   2021-10-07    11
2   2021-10-09    12           d11  d11
3   2021-10-10    13           d10  d10
4   2021-10-11     5            d9   d9
5   2021-10-11     7            d9   d9
6   2021-10-12     2            d8   d8
7   2021-10-12    20            d8   d8
8   2021-10-13    21            d7   d7
9   2021-10-14    23            d6   d6
10  2021-10-15     1            d5   d5
11  2021-10-15     2            d5   d5
12  2021-10-16     5            d4   d4
13  2021-10-16     7            d4   d4
14  2021-10-17    15            d3   d3
15  2021-10-18    16            d2   d2
16  2021-10-19    17            d1   d1
17  2021-10-20     1    1       d7
18  2021-10-21    12            d6
19  2021-11-01    13            d5   d6
20  2021-11-02     5            d4   d5
21  2021-11-03     7            d3   d4
22  2021-11-03     2            d3   d4
23  2021-11-03    20            d3   d4
24  2021-11-05    21            d2   d2
25  2021-11-06    23            d1   d1
26  2021-11-07    16    1
27  2021-11-08    17
28  2021-11-09    13
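Since the question also asks for a dynamic n, the first approach (ranking distinct dates) could be wrapped in a function, roughly as sketched below. The name `label_previous_days`, its parameters and the toy frame are hypothetical, and the sketch assumes the rows are already sorted by date and the index is unique.

```python
import numpy as np
import pandas as pd


def label_previous_days(df, n=11, date_col='Date', mark_col='Mark'):
    """Label the n distinct dates before each marked row as 'dn' ... 'd1'.

    Hypothetical helper; assumes rows are sorted by date and the index is unique.
    """
    # Group id: rows up to the row before each 1 share an id, later rows get 0.
    g = df[mark_col].eq(1).shift(-1, fill_value=False).iloc[::-1].cumsum().iloc[::-1]
    # Rank distinct dates backwards inside each group (1 = closest to the next 1).
    firsts = df.assign(g=g).drop_duplicates(date_col)
    rank = firsts.groupby('g').cumcount(ascending=False).add(1)
    # Copy the rank to rows that repeat a date, then keep ranks 1..n only.
    s = rank.reindex(df.index).groupby(g).ffill().fillna(0).astype(int)
    labels = np.where(g.gt(0) & s.between(1, n), 'd' + s.astype(str), '')
    return pd.Series(labels, index=df.index)


# Toy data (made up): one marked row, one repeated date.
toy = pd.DataFrame({
    'Date': ['2021-10-1', '2021-10-2', '2021-10-2', '2021-10-3', '2021-10-4', '2021-10-5'],
    'Mark': ['', '', '', '', 1, ''],
})
toy['Day_mark'] = label_previous_days(toy, n=2)
print(toy['Day_mark'].tolist())  # ['', 'd2', 'd2', 'd1', '', '']
```

With n=2 only the two distinct dates before the marked row are labelled, and both rows for 2021-10-2 share the label d2. Calling it with n=11 on the question's data should correspond to the first solution.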