Calculate difference to the last date

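I have a problem. I want to get the difference to the previous date. For example, from 2021-03-22 back to the nearest earlier date (2021-03-18) is 4 days. I want to calculate the difference in days between a row's date and the previous date for the same customerId, so the whole calculation should be done per customer. The earliest date should be None because there is no earlier date. The problem is that if the same date appears more than once, the second occurrence becomes 0. It should look back to the previous distinct date instead of the current date.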

DataFrame

    customerId    fromDate otherInformation
0            1  2021-02-22              Cat
1            1  2021-02-22              Dog
2            1  2021-03-18          Elefant
3            1  2021-03-18              Cat
4            1  2021-03-18              Cat
5            1  2021-03-22              Cat
6            1  2021-02-10              Cat
7            1  2021-09-07              Cat
8            1        None          Elefant
9            1  2022-01-18             Fish
10           2  2021-05-17             Fish

Code

import pandas as pd


# Sample data: customer 1 has repeated dates and one missing date.
d = {
    'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    'fromDate': ['2021-02-22', '2021-02-22', '2021-03-18', '2021-03-18', '2021-03-18', '2021-03-22',
                 '2021-02-10', '2021-09-07', None, '2022-01-18', '2021-05-17'],
    'otherInformation': ['Cat', 'Dog', 'Elefant', 'Cat', 'Cat', 'Cat', 'Cat', 'Cat', 'Elefant', 'Fish', 'Fish'],
}
df = pd.DataFrame(data=d)
print(df)
# Parse the strings as datetimes; None becomes NaT.
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
# Subtract the previous row's date per customer, in date order.
# A repeated date is compared with its own duplicate, which gives 0 days.
df['lastindays'] = df['fromDate'] - df.sort_values('fromDate').groupby('customerId')['fromDate'].shift()
print(df)

What I get

    customerId   fromDate otherInformation lastindays
0            1 2021-02-22              Cat    12 days
1            1 2021-02-22              Dog     0 days
2            1 2021-03-18          Elefant    24 days
3            1 2021-03-18              Cat     0 days
4            1 2021-03-18              Cat     0 days
5            1 2021-03-22              Cat     4 days
6            1 2021-02-10              Cat        NaT
7            1 2021-09-07              Cat   169 days
8            1        NaT          Elefant        NaT
9            1 2022-01-18             Fish   133 days
10           2 2021-05-17             Fish        NaT

What I want

    customerId   fromDate otherInformation lastindays
0            1 2021-02-22              Cat    12 days
1            1 2021-02-22              Dog    12 days # from 0 -> 12
2            1 2021-03-18          Elefant    24 days
3            1 2021-03-18              Cat    24 days # from 0 -> 24
4            1 2021-03-18              Cat    24 days # from 0 -> 24
5            1 2021-03-22              Cat     4 days
6            1 2021-02-10              Cat        NaT
7            1 2021-09-07              Cat   169 days
8            1        NaT          Elefant        NaT
9            1 2022-01-18             Fish   133 days
10           2 2021-05-17             Fish        NaT

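Add DataFrame.drop_duplicates together with GroupBy.ffill. Dropping the duplicates before the shift compares each date with the previous distinct date, and the forward fill then copies that result to the repeated rows: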

df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
df['lastindays'] = df['fromDate'] - df.sort_values('fromDate').drop_duplicates(['customerId','fromDate']).groupby('customerId')['fromDate'].shift()
df['lastindays'] = df.groupby(['customerId','fromDate'])['lastindays'].ffill()
print(df)
    customerId   fromDate otherInformation lastindays
0            1 2021-02-22              Cat    12 days
1            1 2021-02-22              Dog    12 days
2            1 2021-03-18          Elefant    24 days
3            1 2021-03-18              Cat    24 days
4            1 2021-03-18              Cat    24 days
5            1 2021-03-22              Cat     4 days
6            1 2021-02-10              Cat        NaT
7            1 2021-09-07              Cat   169 days
8            1        NaT          Elefant        NaT
9            1 2022-01-18             Fish   133 days
10           2 2021-05-17             Fish        NaT
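
One caveat worth sketching: ffill only fills downward, so the approach above relies on drop_duplicates keeping the first row of each repeated date in the original row order. The default sort algorithm of sort_values is not guaranteed to be stable, so on other data a later duplicate could be kept and the earlier rows would stay NaT. The variant below is a minimal sketch that passes kind='mergesort' (a stable sort) to avoid that. The kind argument and the trimmed two-column sample are additions here, not part of the original answer.

```python
import pandas as pd

# Same sample as in the question, without the otherInformation column.
df = pd.DataFrame({
    'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    'fromDate': ['2021-02-22', '2021-02-22', '2021-03-18', '2021-03-18',
                 '2021-03-18', '2021-03-22', '2021-02-10', '2021-09-07',
                 None, '2022-01-18', '2021-05-17'],
})
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

# Stable sort: among equal dates the original row order is preserved,
# so drop_duplicates keeps the first occurrence of each repeated date.
firsts = (df.sort_values('fromDate', kind='mergesort')
            .drop_duplicates(['customerId', 'fromDate']))

# Previous distinct date per customer, aligned back to df by index.
prev = firsts.groupby('customerId')['fromDate'].shift()
df['lastindays'] = df['fromDate'] - prev

# Copy the value from the first occurrence down to its duplicates.
df['lastindays'] = df.groupby(['customerId', 'fromDate'])['lastindays'].ffill()
print(df)
```

The merge-based approach does not depend on row order at all, which may make it the safer choice when the input order is not under your control.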

Another idea is to create a helper DataFrame without duplicates, use GroupBy.diff on it, and left join the result back to add the new column:

df1 = df.sort_values('fromDate').drop_duplicates(['customerId','fromDate'])
df1['lastindays'] = df1.groupby('customerId')['fromDate'].diff()
df = df.merge(df1[['lastindays','customerId','fromDate']], on=['customerId','fromDate'], how='left')
print(df)
    customerId   fromDate otherInformation lastindays
0            1 2021-02-22              Cat    12 days
1            1 2021-02-22              Dog    12 days
2            1 2021-03-18          Elefant    24 days
3            1 2021-03-18              Cat    24 days
4            1 2021-03-18              Cat    24 days
5            1 2021-03-22              Cat     4 days
6            1 2021-02-10              Cat        NaT
7            1 2021-09-07              Cat   169 days
8            1        NaT          Elefant        NaT
9            1 2022-01-18             Fish   133 days
10           2 2021-05-17             Fish        NaT
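
The merge-based idea can also be wrapped in a small function that leaves the input untouched, and the resulting Timedelta column can be turned into plain day counts with .dt.days. This is only a sketch: the function name add_days_since_previous and its parameters are hypothetical, chosen for illustration.

```python
import pandas as pd

def add_days_since_previous(df, key='customerId', date='fromDate', out='lastindays'):
    """Hypothetical helper: gap to the previous distinct date per key."""
    # One row per (key, date) pair, ordered by date within each key.
    uniq = df[[key, date]].drop_duplicates().sort_values([key, date])
    # Difference to the previous distinct date; the earliest date gets NaT.
    uniq[out] = uniq.groupby(key)[date].diff()
    # Left join so every original row, duplicates included, gets its gap.
    return df.merge(uniq, on=[key, date], how='left')

df = pd.DataFrame({
    'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    'fromDate': ['2021-02-22', '2021-02-22', '2021-03-18', '2021-03-18',
                 '2021-03-18', '2021-03-22', '2021-02-10', '2021-09-07',
                 None, '2022-01-18', '2021-05-17'],
})
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

result = add_days_since_previous(df)
# Day counts as numbers; rows without an earlier date stay NaN.
result['lastindays_days'] = result['lastindays'].dt.days
print(result)
```

Because the column contains missing values, .dt.days returns floats with NaN rather than integers.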