Pandas: New value based on time interval

Imagine a situation where a customer should pay $100 on every visit, unless fewer than 30 days have passed since their last payment.

Given the appointment dates for each customer, is it possible to determine which appointments should be paid for?

Take the following dataframe as an example:

import pandas as pd

df = pd.DataFrame({
  'dd_mm_aa': {
    0: '01/12/21',
    1: '01/12/21',
    2: '10/12/21',
    3: '10/12/21',
    4: '03/01/22',
    5: '03/01/22',
    6: '03/01/22',
    7: '15/01/22',
    8: '15/01/22',
    9: '06/02/22'},
  'name': {0: 'John',
    1: 'Mary',
    2: 'John',
    3: 'Peter',
    4: 'John',
    5: 'Mary',
    6: 'Peter',
    7: 'Mary',
    8: 'John',
    9: 'John'}
    })

I was able to add the value to be paid at each customer's first appointment using the following code.

# Add 100 at the first appearance of each customer
df['dd_mm_aa'] = pd.to_datetime(df['dd_mm_aa'], dayfirst=True)
df.loc[df.groupby('name')["dd_mm_aa"].rank() == 1, 'value'] = 100

The resulting dataframe looks like this:

dd_mm_aa     name    value
01/12/21    John    100
01/12/21    Mary    100
10/12/21    John    
10/12/21    Peter   100
03/01/22    John    
03/01/22    Mary    
03/01/22    Peter   
15/01/22    Mary    
15/01/22    John    
06/02/22    John    

However, taking the >30-day interval into account, the final output should be:

dd_mm_aa    name    value
01/12/21    John    100
01/12/21    Mary    100
10/12/21    John    
10/12/21    Peter   100
03/01/22    John    100
03/01/22    Mary    100
03/01/22    Peter   
15/01/22    Mary    
15/01/22    John    
06/02/22    John    100
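
The expected values for John can be checked by hand with plain date arithmetic. The sketch below is only an illustration (the variable names are not from the original code); it measures each visit against the last paid visit rather than against the previous visit, which is why 06/02/22 is charged even though it is only 22 days after 15/01/22.

```python
from datetime import date

# John's visits from the table above (day/month/year)
first_paid = date(2021, 12, 1)
visits = [date(2021, 12, 10), date(2022, 1, 3), date(2022, 1, 15), date(2022, 2, 6)]

last_paid = first_paid
gaps = []
for visit in visits:
    # Days elapsed since the last visit that was actually paid for
    gap = (visit - last_paid).days
    gaps.append(gap)
    if gap > 30:
        # This visit is charged, so it becomes the new reference date
        last_paid = visit

print(gaps)  # [9, 33, 12, 34]
```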

Using an iterative approach:

from datetime import timedelta

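# Date of the most recent paid visit for each customer.
# Assumes the rows are already in chronological order.
last_paid = {}
def check_paid(r):
    if r['name'] not in last_paid:
        # First visit for this customer: always charged
        last_paid[r['name']] = r['dd_mm_aa']
        r['value'] = 100
    elif last_paid[r['name']] + timedelta(days=30) < r['dd_mm_aa']:
        # More than 30 days since the last paid visit: charge again
        last_paid[r['name']] = r['dd_mm_aa']
        r['value'] = 100
    return r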

df['dd_mm_aa'] = pd.to_datetime(df['dd_mm_aa'], dayfirst=True)
df = df.apply(check_paid, axis=1)

Output:

    dd_mm_aa    name    value
0   2021-12-01  John    100.0
1   2021-12-01  Mary    100.0
2   2021-12-10  John    NaN
3   2021-12-10  Peter   100.0
4   2022-01-03  John    100.0
5   2022-01-03  Mary    100.0
6   2022-01-03  Peter   NaN
7   2022-01-15  Mary    NaN
8   2022-01-15  John    NaN
9   2022-02-06  John    100.0
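
As a possible variation on the iterative answer above, the same rule can be applied per customer without a module-level dictionary. This is a minimal sketch, not the original author's method: the function name `mark_payments` and its parameters are hypothetical. It sorts by date first, so it does not rely on the rows already being in chronological order.

```python
import pandas as pd

def mark_payments(df, date_col="dd_mm_aa", name_col="name", fee=100, gap_days=30):
    """Return a copy of df with a 'value' column set to fee on chargeable visits."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], dayfirst=True)
    out["value"] = float("nan")
    gap = pd.Timedelta(days=gap_days)
    # Walk each customer's visits in chronological order.
    for _, visits in out.sort_values(date_col).groupby(name_col)[date_col]:
        last_paid = None
        for idx, visit_date in visits.items():
            # Charge on the first visit, or when more than gap_days have
            # passed since the last *paid* visit (not the previous visit).
            if last_paid is None or visit_date - last_paid > gap:
                out.loc[idx, "value"] = fee
                last_paid = visit_date
    return out

df = pd.DataFrame({
    "dd_mm_aa": ["01/12/21", "01/12/21", "10/12/21", "10/12/21", "03/01/22",
                 "03/01/22", "03/01/22", "15/01/22", "15/01/22", "06/02/22"],
    "name": ["John", "Mary", "John", "Peter", "John",
             "Mary", "Peter", "Mary", "John", "John"],
})
result = mark_payments(df)
print(result)
```

An explicit loop is still needed because each decision depends on the last paid visit, which is itself the result of earlier decisions, so a simple per-group `diff()` between consecutive visits would not give the same answer.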