Iterate over a df and delete all rows with date before a given date point

I have a pandas DataFrame that looks like this:

id  projet_id  date_cod    date        year  month  p50      p90
1   DCLT       30-03-2022  01-01-2022  2022  1      5313.79  4571.03
2   DCLT                   01-02-2022  2022  2      2350.25  1880.70
3   DCLT                   01-03-2022  2022  3      2450.25  1763.90
4   DCLT                   01-01-2023  2023  1      2180.25  1280.70
5   DCLT                   01-02-2023  2023  2      4871.03  5224.03
6   MADD       01-01-2023  01-01-2022  2022  1      4575.03  1280.70
7   MADD                   01-02-2022  2022  2      4331.03  5718.03
8   MADD                   01-03-2022  2022  3      4331.03  1235.75
9   MADD                   01-04-2023  2023  4      1224.00  1280.70
10  MADD                   01-05-2023  2023  5      1480.70  1330.70
11  PEYRS      01-03-2024  01-01-2024  2024  1      1280.70  1280.70
12  PEYRS                  01-05-2024  2024  5      1200.70  1235.75

Based on date_cod, for each projet_id (DCLT, MADD, PEYRS), I want to delete the p50 and p90 values in every row whose date falls before that project's date_cod.

The output df should look like this:

id  projet_id  date_cod    date        year  month  p50      p90
1   DCLT       30-03-2022  01-01-2022  2022  1
2   DCLT                   01-02-2022  2022  2
3   DCLT                   01-03-2022  2022  3
4   DCLT                   01-01-2023  2023  1      2180.25  1280.70
5   DCLT                   01-02-2023  2023  2      4871.03  5224.03
6   MADD       01-01-2023  01-01-2022  2022  1
7   MADD                   01-02-2022  2022  2
8   MADD                   01-03-2022  2022  3
9   MADD                   01-04-2023  2023  4      1224.00  1280.70
10  MADD                   01-05-2023  2023  5      1480.70  1330.70
11  PEYRS      01-03-2024  01-01-2024  2024  1
12  PEYRS                  01-05-2024  2024  5      1200.70  1235.75

Something like this should work:

import numpy as np
import pandas as pd

# Parse both columns as datetimes; the strings are day-first (DD-MM-YYYY),
# so state the format explicitly or 01-02-2022 is read as 2 January
df['date_cod'] = pd.to_datetime(df['date_cod'], format='%d-%m-%Y')
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

# Condition: date is earlier than the first date_cod of the row's projet_id
first_cod = df.groupby('projet_id')['date_cod'].transform('first')
cond = df['date'] < first_cod

# Replace with '' where the condition defined above is true
# (mixing '' with numbers makes these columns non-numeric)
df['p50'] = np.where(cond, '', df['p50'])
df['p90'] = np.where(cond, '', df['p90'])
df
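
If you would rather keep p50 and p90 numeric, a variant of the same idea is to write NaN instead of an empty string. The following is only a minimal sketch under some assumptions: the small frame is a hypothetical subset of the data in the question, date_cod is taken to be filled in only on the first row of each projet_id, and the names cutoff and before_cutoff are made up for illustration.

```python
import pandas as pd

# Hypothetical subset of the question's data; date_cod is only set on the
# first row of each projet_id, as in the original table.
df = pd.DataFrame({
    "projet_id": ["DCLT", "DCLT", "DCLT", "MADD", "MADD", "PEYRS", "PEYRS"],
    "date_cod": ["30-03-2022", None, None, "01-01-2023", None, "01-03-2024", None],
    "date": ["01-01-2022", "01-03-2022", "01-01-2023", "01-03-2022",
             "01-04-2023", "01-01-2024", "01-05-2024"],
    "p50": [5313.79, 2450.25, 2180.25, 4331.03, 1224.00, 1280.70, 1200.70],
    "p90": [4571.03, 1763.90, 1280.70, 1235.75, 1280.70, 1280.70, 1235.75],
})

# Parse the day-first strings explicitly; missing values become NaT.
for col in ["date_cod", "date"]:
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# Broadcast each project's first non-null date_cod to all of its rows.
cutoff = df.groupby("projet_id")["date_cod"].transform("first")

# Rows dated before their project's cutoff lose p50/p90 (set to NaN),
# which keeps both columns as floats.
before_cutoff = df["date"] < cutoff
df.loc[before_cutoff, ["p50", "p90"]] = float("nan")

print(df)
```

The NaN cells can still be shown as blanks at display or export time, for example with df.to_string(na_rep="").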