Iterate over a df and delete all rows with date before a given date point
I have a pd dataframe that looks like this:

id | projet_id | date_cod | date | year | month | p50 | p90 |
---|---|---|---|---|---|---|---|
1 | DCLT | 30-03-2022 | 01-01-2022 | 2022 | 1 | 5313.79 | 4571.03 |
2 | DCLT | | 01-02-2022 | 2022 | 2 | 2350.25 | 1880.70 |
3 | DCLT | | 01-03-2022 | 2022 | 3 | 2450.25 | 1763.90 |
4 | DCLT | | 01-01-2023 | 2023 | 1 | 2180.25 | 1280.70 |
5 | DCLT | | 01-02-2023 | 2023 | 2 | 4871.03 | 5224.03 |
6 | MADD | 01-01-2023 | 01-01-2022 | 2022 | 1 | 4575.03 | 1280.70 |
7 | MADD | | 01-02-2022 | 2022 | 2 | 4331.03 | 5718.03 |
8 | MADD | | 01-03-2022 | 2022 | 3 | 4331.03 | 1235.75 |
9 | MADD | | 01-04-2023 | 2023 | 4 | 1224.00 | 1280.70 |
10 | MADD | | 01-05-2023 | 2023 | 5 | 1480.70 | 1330.70 |
11 | PEYRS | 01-03-2024 | 01-01-2024 | 2024 | 1 | 1280.70 | 1280.70 |
12 | PEYRS | | 01-05-2024 | 2024 | 5 | 1200.70 | 1235.75 |

Based on date_cod, for each projet_id (DCLT, MADD, PEYRS), I want to delete the p50 and p90 values of every row whose date is before that project's date_cod.

The output df should look like this:

id | projet_id | date_cod | date | year | month | p50 | p90 |
---|---|---|---|---|---|---|---|
1 | DCLT | 30-03-2022 | 01-01-2022 | 2022 | 1 | | |
2 | DCLT | | 01-02-2022 | 2022 | 2 | | |
3 | DCLT | | 01-03-2022 | 2022 | 3 | | |
4 | DCLT | | 01-01-2023 | 2023 | 1 | 2180.25 | 1280.70 |
5 | DCLT | | 01-02-2023 | 2023 | 2 | 4871.03 | 5224.03 |
6 | MADD | 01-01-2023 | 01-01-2022 | 2022 | 1 | | |
7 | MADD | | 01-02-2022 | 2022 | 2 | | |
8 | MADD | | 01-03-2022 | 2022 | 3 | | |
9 | MADD | | 01-04-2023 | 2023 | 4 | 1224.00 | 1280.70 |
10 | MADD | | 01-05-2023 | 2023 | 5 | 1480.70 | 1330.70 |
11 | PEYRS | 01-03-2024 | 01-01-2024 | 2024 | 1 | | |
12 | PEYRS | | 01-05-2024 | 2024 | 5 | 1200.70 | 1235.75 |

Something like this should work:

    import numpy as np
    import pandas as pd

    # Parse both columns as day-first dates (dd-mm-yyyy); without an explicit
    # format, a value such as 01-02-2022 may be read as January 2nd
    df['date_cod'] = pd.to_datetime(df['date_cod'], format='%d-%m-%Y')
    df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

    # Condition: date is earlier than the first date_cod of its projet_id
    cond = df['date'] < df.groupby('projet_id')['date_cod'].transform('first')

    # Replace p50 and p90 with '' where the condition above is true
    df['p50'] = np.where(cond, '', df['p50'])
    df['p90'] = np.where(cond, '', df['p90'])
    df
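For anyone who wants to try this end to end, here is a self-contained sketch of the same idea. It rests on a few assumptions that are not in the original post: the sample frame is rebuilt by hand from the question's table (year and month are left out for brevity), the project code is taken to be DCLT on all of the first five rows, the dates are read as dd-mm-yyyy, and NaN is stored instead of '' so that p50 and p90 stay numeric. The names cutoff, before, masked and dropped are purely illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: date_cod is filled only
# on the first row of each project, and the first project is DCLT throughout).
df = pd.DataFrame({
    "id": range(1, 13),
    "projet_id": ["DCLT"] * 5 + ["MADD"] * 5 + ["PEYRS"] * 2,
    "date_cod": ["30-03-2022", None, None, None, None,
                 "01-01-2023", None, None, None, None,
                 "01-03-2024", None],
    "date": ["01-01-2022", "01-02-2022", "01-03-2022", "01-01-2023", "01-02-2023",
             "01-01-2022", "01-02-2022", "01-03-2022", "01-04-2023", "01-05-2023",
             "01-01-2024", "01-05-2024"],
    "p50": [5313.79, 2350.25, 2450.25, 2180.25, 4871.03,
            4575.03, 4331.03, 4331.03, 1224.00, 1480.70,
            1280.70, 1200.70],
    "p90": [4571.03, 1880.70, 1763.90, 1280.70, 5224.03,
            1280.70, 5718.03, 1235.75, 1280.70, 1330.70,
            1280.70, 1235.75],
})

# Parse both columns as day-first dates; missing date_cod values become NaT.
for col in ("date_cod", "date"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# Broadcast each project's first non-null date_cod to all of its rows.
cutoff = df.groupby("projet_id")["date_cod"].transform("first")

# True for rows dated strictly before their project's cutoff.
before = df["date"] < cutoff

# Variant 1: blank out p50/p90 with NaN, keeping the rows.
masked = df.copy()
masked.loc[before, ["p50", "p90"]] = np.nan

# Variant 2: drop those rows entirely, which is what the title literally asks for.
dropped = df.loc[~before].reset_index(drop=True)

print(masked)
print(dropped)
```

If the blank cells are only needed for display, one option is to keep NaN in the frame and call `masked.fillna('')` at the point of output, so the stored columns remain usable for arithmetic.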