Pandas: New value based on time interval
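Imagine a situation where customers should pay $100 each time they visit us, unless fewer than 30 days have passed since their last payment.
Given the dates of each customer's appointments, is it possible to determine which appointments should be paid for?
Take the following dataframe as an example: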
import pandas as pd

# Appointment dates (day/month/two-digit year) and customer names
df = pd.DataFrame({
    'dd_mm_aa': {
        0: '01/12/21',
        1: '01/12/21',
        2: '10/12/21',
        3: '10/12/21',
        4: '03/01/22',
        5: '03/01/22',
        6: '03/01/22',
        7: '15/01/22',
        8: '15/01/22',
        9: '06/02/22'},
    'name': {
        0: 'John',
        1: 'Mary',
        2: 'John',
        3: 'Peter',
        4: 'John',
        5: 'Mary',
        6: 'Peter',
        7: 'Mary',
        8: 'John',
        9: 'John'}
})
I was able to add the value to be paid at each customer's first appointment using the following code.
# Add 100 at the first appearance of a customer
df['dd_mm_aa'] = pd.to_datetime(df['dd_mm_aa'], dayfirst=True)
df.loc[df.groupby('name')["dd_mm_aa"].rank() == 1, 'value'] = 100
The resulting dataframe looks like this:
dd_mm_aa  name   value
01/12/21  John   100
01/12/21  Mary   100
10/12/21  John
10/12/21  Peter  100
03/01/22  John
03/01/22  Mary
03/01/22  Peter
15/01/22  Mary
15/01/22  John
06/02/22  John
However, taking the >30-day interval into account, the final output should be:
dd_mm_aa  name   value
01/12/21  John   100
01/12/21  Mary   100
10/12/21  John
10/12/21  Peter  100
03/01/22  John   100
03/01/22  Mary   100
03/01/22  Peter
15/01/22  Mary
15/01/22  John
06/02/22  John   100
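One reason a plain per-customer `diff` does not produce this output is that the 30 days are counted from the last payment, not from the last visit. The sketch below is only an illustration of that point, assuming the same sample data; the column name `days_since_prev_visit` is a hypothetical helper and not part of the question.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "dd_mm_aa": ["01/12/21", "01/12/21", "10/12/21", "10/12/21", "03/01/22",
                 "03/01/22", "03/01/22", "15/01/22", "15/01/22", "06/02/22"],
    "name": ["John", "Mary", "John", "Peter", "John",
             "Mary", "Peter", "Mary", "John", "John"],
})
df["dd_mm_aa"] = pd.to_datetime(df["dd_mm_aa"], format="%d/%m/%y")

# Days since the same customer's previous visit (NaN on the first visit).
# `days_since_prev_visit` is a hypothetical helper column.
df["days_since_prev_visit"] = df.groupby("name")["dd_mm_aa"].diff().dt.days

# Every gap between John's consecutive visits is under 30 days, yet he is
# expected to pay on 03/01/22 and 06/02/22, because those visits fall more
# than 30 days after his previous payment.
print(df[df["name"] == "John"])
```

Because each decision depends on when the previous payment happened, some form of running state per customer seems hard to avoid.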
Using an iterative approach:
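from datetime import timedelta

# Date of the most recent payment for each customer
last_paid = {}

def check_paid(r):
    # Rows must be processed in chronological order for this to work
    if r['name'] not in last_paid:
        # First appointment for this customer: always paid
        last_paid[r['name']] = r['dd_mm_aa']
        r['value'] = 100
    elif last_paid[r['name']] + timedelta(days=30) < r['dd_mm_aa']:
        # More than 30 days since the last payment: paid again
        last_paid[r['name']] = r['dd_mm_aa']
        r['value'] = 100
    return r

df['dd_mm_aa'] = pd.to_datetime(df['dd_mm_aa'], dayfirst=True)
df = df.apply(check_paid, axis=1)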
Output:
    dd_mm_aa   name  value
0 2021-12-01   John  100.0
1 2021-12-01   Mary  100.0
2 2021-12-10   John    NaN
3 2021-12-10  Peter  100.0
4 2022-01-03   John  100.0
5 2022-01-03   Mary  100.0
6 2022-01-03  Peter    NaN
7 2022-01-15   Mary    NaN
8 2022-01-15   John    NaN
9 2022-02-06   John  100.0
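The same rule can also be written without a module-level dictionary by looping over each customer's dates separately. This is a minimal sketch of that variation rather than a faster method; the function name `mark_payments` and its parameters `interval_days` and `amount` are hypothetical, and it assumes the strict "more than 30 days" comparison used above.

```python
import pandas as pd

def mark_payments(dates, interval_days=30, amount=100):
    """Return `amount` for visits more than `interval_days` after the
    customer's last payment, and NaN otherwise. `dates` must be sorted."""
    interval = pd.Timedelta(days=interval_days)
    last_paid = None
    values = []
    for date in dates:
        if last_paid is None or date - last_paid > interval:
            last_paid = date
            values.append(amount)
        else:
            values.append(float("nan"))
    return pd.Series(values, index=dates.index, dtype="float64")

# Same sample data as in the question.
df = pd.DataFrame({
    "dd_mm_aa": ["01/12/21", "01/12/21", "10/12/21", "10/12/21", "03/01/22",
                 "03/01/22", "03/01/22", "15/01/22", "15/01/22", "06/02/22"],
    "name": ["John", "Mary", "John", "Peter", "John",
             "Mary", "Peter", "Mary", "John", "John"],
})
df["dd_mm_aa"] = pd.to_datetime(df["dd_mm_aa"], format="%d/%m/%y")

# Sort chronologically so each customer's visits are seen in order.
df = df.sort_values("dd_mm_aa", kind="stable")

# Apply the rule per customer; assignment aligns on the original index.
parts = [mark_payments(dates) for _, dates in df.groupby("name")["dd_mm_aa"]]
df["value"] = pd.concat(parts)
print(df)
```

Keeping the state inside the function means the code can be rerun safely, whereas the global `last_paid` dictionary has to be reset before every run.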