根据条件分配前 n 天的标签

Assign lables for previous n days based on condition

嗨,我对 calculating/labelling 前 11 天的日期(无论是否重复或丢失)有疑问。
当我在 'day'
列中找到“1”时,我需要在前 11 天(或动态 n 天)的“Day_mark”列中分配标签 下面是我的数据集,必填列是 'Day_mark'

例如:在第 18 行,我找到了“1”,需要分配前 11 天的标签。
在第 27 行中,我找到了“1”,需要分配前 11 天的标签,但我们没有 11 天,因此仅限于第 7 天

data = {'Date':['2021-10-4','2021-10-7','2021-10-9','2021-10-10','2021-10-11','2021-10-11','2021-10-12',
                '2021-10-12','2021-10-13','2021-10-14','2021-10-15','2021-10-15','2021-10-16','2021-10-16',
                '2021-10-17','2021-10-18','2021-10-19','2021-10-20','2021-10-21','2021-11-1','2021-11-2',
                '2021-11-3','2021-11-3','2021-11-3','2021-11-5','2021-11-6','2021-11-7','2021-11-8','2021-11-9'],
           'Hour':[ 9,11,12,13,5,7,2,20,21,23,1,2,5,7,15,16,17,1,12,13,5,7,2,20,21,23,16,17,13],
'Mark':[ '','','','','','','','','','','','','','','','','',1,'','','','','','','','',1,'',''],
'Day_mark':['','','d11','d10','d9','d9','d8','d8','d7','d6','d5','d5','d4','d4','d3','d2','d1','d7',
            'd6','d5','d4','d3','d3','d3','d2','d1',' ',' ',' ']
}

提前致谢

首先通过比较 1 通过移位掩码创建组列,通过 iloc 更改顺序并添加累积和然后通过 DataFrame.drop_duplicates and add counter by GroupBy.cumcount, then use GroupBy.ffill 删除重复项,如果两者之间不匹配则设置空字符串1,11 删除最后一行 1:

df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]

df['new'] = df.drop_duplicates('Date').groupby('g').cumcount(ascending=False).add(1)

s = df.groupby('g')['new'].ffill().fillna(0).astype(int)
df['new'] = np.where(df['g'].gt(0) & s.between(1,11), 'd' + s.astype(str), '')
df = df.drop('g', axis=1)

print (df)
          Date  Hour Mark Day_mark  new
0    2021-10-4     9                   
1    2021-10-7    11                   
2    2021-10-9    12           d11  d11
3   2021-10-10    13           d10  d10
4   2021-10-11     5            d9   d9
5   2021-10-11     7            d9   d9
6   2021-10-12     2            d8   d8
7   2021-10-12    20            d8   d8
8   2021-10-13    21            d7   d7
9   2021-10-14    23            d6   d6
10  2021-10-15     1            d5   d5
11  2021-10-15     2            d5   d5
12  2021-10-16     5            d4   d4
13  2021-10-16     7            d4   d4
14  2021-10-17    15            d3   d3
15  2021-10-18    16            d2   d2
16  2021-10-19    17            d1   d1
17  2021-10-20     1    1       d7   d7
18  2021-10-21    12            d6   d6
19   2021-11-1    13            d5   d5
20   2021-11-2     5            d4   d4
21   2021-11-3     7            d3   d3
22   2021-11-3     2            d3   d3
23   2021-11-3    20            d3   d3
24   2021-11-5    21            d2   d2
25   2021-11-6    23            d1   d1
26   2021-11-7    16    1              
27   2021-11-8    17                   
28   2021-11-9    13         

另一个 dea 是比较每个组的最后日期并减去,但输出不同:

df['Date'] = pd.to_datetime(df['Date'])

df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]

df['new'] = (df.groupby('g')['Date']
               .transform('last')
               .sub(df['Date'])
               .dt.days
               .add(1)
               .fillna(0)
               .astype(int))

df['new'] = np.where(df['g'].gt(0) & df['new'].le(11), 'd' + df['new'].astype(str), '')
df = df.drop('g', axis=1)

print (df)
         Date  Hour Mark Day_mark  new
0  2021-10-04     9                   
1  2021-10-07    11                   
2  2021-10-09    12           d11  d11
3  2021-10-10    13           d10  d10
4  2021-10-11     5            d9   d9
5  2021-10-11     7            d9   d9
6  2021-10-12     2            d8   d8
7  2021-10-12    20            d8   d8
8  2021-10-13    21            d7   d7
9  2021-10-14    23            d6   d6
10 2021-10-15     1            d5   d5
11 2021-10-15     2            d5   d5
12 2021-10-16     5            d4   d4
13 2021-10-16     7            d4   d4
14 2021-10-17    15            d3   d3
15 2021-10-18    16            d2   d2
16 2021-10-19    17            d1   d1
17 2021-10-20     1    1       d7     
18 2021-10-21    12            d6     
19 2021-11-01    13            d5   d6
20 2021-11-02     5            d4   d5
21 2021-11-03     7            d3   d4
22 2021-11-03     2            d3   d4
23 2021-11-03    20            d3   d4
24 2021-11-05    21            d2   d2
25 2021-11-06    23            d1   d1
26 2021-11-07    16    1              
27 2021-11-08    17                   
28 2021-11-09    13