Calculate difference to the last date
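I have a problem. I want to get the difference to the last date, i.e. the previous (earlier) date. For example, from 2021-03-22 to the next earlier date (2021-03-18) is 4 days. I want to calculate the difference in days between each row's date and the previous date per customerId, so the whole calculation should be done separately for each customer. The earliest date should give None, because there is no earlier date. The problem is that if the same date occurs more than once, the second occurrence gets 0. It should look back at the previous date again instead of at the current date.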
Dataframe
    customerId  fromDate    otherInformation
0   1           2021-02-22  Cat
1   1           2021-02-22  Dog
2   1           2021-03-18  Elefant
3   1           2021-03-18  Cat
4   1           2021-03-18  Cat
5   1           2021-03-22  Cat
6   1           2021-02-10  Cat
7   1           2021-09-07  Cat
8   1           None        Elefant
9   1           2022-01-18  Fish
10  2           2021-05-17  Fish
Code
import pandas as pd

d = {'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
     'fromDate': ['2021-02-22', '2021-02-22', '2021-03-18', '2021-03-18', '2021-03-18', '2021-03-22',
                  '2021-02-10', '2021-09-07', None, '2022-01-18', '2021-05-17'],
     'otherInformation': ['Cat', 'Dog', 'Elefant', 'Cat', 'Cat', 'Cat', 'Cat', 'Cat', 'Elefant', 'Fish', 'Fish']
     }
df = pd.DataFrame(data=d)
print(df)

# Parse the strings as datetimes; None becomes NaT
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
# Per customer, subtract the previous date in sorted order (this gives 0 days for repeated dates)
df['lastindays'] = df['fromDate'] - df.sort_values('fromDate').groupby('customerId')['fromDate'].shift()
print(df)
What I have
    customerId  fromDate    otherInformation  lastindays
0   1           2021-02-22  Cat               12 days
1   1           2021-02-22  Dog               0 days
2   1           2021-03-18  Elefant           24 days
3   1           2021-03-18  Cat               0 days
4   1           2021-03-18  Cat               0 days
5   1           2021-03-22  Cat               4 days
6   1           2021-02-10  Cat               NaT
7   1           2021-09-07  Cat               169 days
8   1           NaT         Elefant           NaT
9   1           2022-01-18  Fish              133 days
10  2           2021-05-17  Fish              NaT
What I want
    customerId  fromDate    otherInformation  lastindays
0   1           2021-02-22  Cat               12 days
1   1           2021-02-22  Dog               12 days   # from 0 -> 12
2   1           2021-03-18  Elefant           24 days
3   1           2021-03-18  Cat               24 days   # from 0 -> 24
4   1           2021-03-18  Cat               24 days   # from 0 -> 24
5   1           2021-03-22  Cat               4 days
6   1           2021-02-10  Cat               NaT
7   1           2021-09-07  Cat               169 days
8   1           NaT         Elefant           NaT
9   1           2022-01-18  Fish              133 days
10  2           2021-05-17  Fish              NaT
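The 0 days values seem to come from shift() pairing each repeated date with its identical neighbour in the sorted order. A minimal sketch on a hypothetical four-row Series (not the data above; the names naive_days and distinct_days are only illustrative) shows the difference between shifting all rows and shifting only the distinct dates:

```python
import pandas as pd

# Hypothetical data for one customer: 2021-02-22 appears twice.
dates = pd.Series(pd.to_datetime(['2021-02-10', '2021-02-22', '2021-02-22', '2021-03-18']))

# shift() moves each value down one row, so the second 2021-02-22
# is paired with the first 2021-02-22 instead of 2021-02-10.
naive_days = (dates - dates.shift()).dt.days

# Shifting only the distinct dates pairs each date with the previous
# different date; the dropped duplicate row is left as NaN for now.
distinct_days = (dates - dates.drop_duplicates().shift()).dt.days

print(pd.DataFrame({'date': dates, 'naive_days': naive_days, 'distinct_days': distinct_days}))
```

In this sketch the repeated row gets 0 in naive_days and a missing value in distinct_days, so the duplicates still need to be filled from the first row with the same date.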
Add DataFrame.drop_duplicates before the shift, followed by GroupBy.ffill:
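df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
# Shift over the de-duplicated dates, so each date is compared with the previous distinct date
df['lastindays'] = df['fromDate'] - df.sort_values('fromDate').drop_duplicates(['customerId','fromDate']).groupby('customerId')['fromDate'].shift()
# The dropped duplicate rows are NaT after index alignment; fill them from the first row with the same date
df['lastindays'] = df.groupby(['customerId','fromDate'])['lastindays'].ffill()
print(df)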
    customerId  fromDate    otherInformation  lastindays
0   1           2021-02-22  Cat               12 days
1   1           2021-02-22  Dog               12 days
2   1           2021-03-18  Elefant           24 days
3   1           2021-03-18  Cat               24 days
4   1           2021-03-18  Cat               24 days
5   1           2021-03-22  Cat               4 days
6   1           2021-02-10  Cat               NaT
7   1           2021-09-07  Cat               169 days
8   1           NaT         Elefant           NaT
9   1           2022-01-18  Fish              133 days
10  2           2021-05-17  Fish              NaT
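The ffill step relies on drop_duplicates keeping the first row of each repeated date in the original row order, because only later rows can be forward filled. A self-contained sketch of the same approach follows. It assumes that requesting a stable sort is a reasonable safeguard; kind='mergesort' and the intermediate names distinct and previous are additions for illustration and are not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    'fromDate': ['2021-02-22', '2021-02-22', '2021-03-18', '2021-03-18', '2021-03-18', '2021-03-22',
                 '2021-02-10', '2021-09-07', None, '2022-01-18', '2021-05-17'],
})
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

# Assumption, not in the code above: kind='mergesort' is a stable sort, so tied
# dates keep their original row order and drop_duplicates retains the first
# row of each tie, which is the row ffill later copies from.
distinct = (df.sort_values('fromDate', kind='mergesort')
              .drop_duplicates(['customerId', 'fromDate']))
previous = distinct.groupby('customerId')['fromDate'].shift()

# Index alignment leaves NaT on the dropped duplicate rows.
df['lastindays'] = df['fromDate'] - previous
# Copy the value of the first row onto the other rows with the same customer and date.
df['lastindays'] = df.groupby(['customerId', 'fromDate'])['lastindays'].ffill()
print(df)
```

On small inputs like this one the default sort most likely gives the same result; the explicit stable sort only removes the dependence on how ties happen to be ordered.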
Another idea is to create a helper DataFrame without duplicates, use DataFrameGroupBy.diff, and left join the new column:
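# Helper DataFrame with one row per customer and date
df1 = df.sort_values('fromDate').drop_duplicates(['customerId','fromDate'])
df1['lastindays'] = df1.groupby('customerId')['fromDate'].diff()
# The left join copies the difference to every row with the same customer and date
# (df must not already have a lastindays column, or merge will add _x/_y suffixes)
df = df.merge(df1[['lastindays','customerId','fromDate']], on=['customerId','fromDate'], how='left')
print(df)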
    customerId  fromDate    otherInformation  lastindays
0   1           2021-02-22  Cat               12 days
1   1           2021-02-22  Dog               12 days
2   1           2021-03-18  Elefant           24 days
3   1           2021-03-18  Cat               24 days
4   1           2021-03-18  Cat               24 days
5   1           2021-03-22  Cat               4 days
6   1           2021-02-10  Cat               NaT
7   1           2021-09-07  Cat               169 days
8   1           NaT         Elefant           NaT
9   1           2022-01-18  Fish              133 days
10  2           2021-05-17  Fish              NaT
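If plain numbers of days are more convenient than Timedelta values, the merge variant can be sketched as below. This is only an illustration under a few assumptions: the names helper, out and days are hypothetical, assign is used instead of column assignment to avoid modifying a possible copy, and validate='many_to_one' is an optional check that the helper really has one row per customer and date.

```python
import pandas as pd

df = pd.DataFrame({
    'customerId': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    'fromDate': ['2021-02-22', '2021-02-22', '2021-03-18', '2021-03-18', '2021-03-18', '2021-03-22',
                 '2021-02-10', '2021-09-07', None, '2022-01-18', '2021-05-17'],
    'otherInformation': ['Cat', 'Dog', 'Elefant', 'Cat', 'Cat', 'Cat', 'Cat', 'Cat', 'Elefant', 'Fish', 'Fish'],
})
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

# One row per customer and date, with the difference to the previous distinct date.
helper = (df.sort_values('fromDate')
            .drop_duplicates(['customerId', 'fromDate'])
            .assign(lastindays=lambda d: d.groupby('customerId')['fromDate'].diff()))

# validate='many_to_one' raises MergeError if the helper has duplicate keys.
out = df.merge(helper[['customerId', 'fromDate', 'lastindays']],
               on=['customerId', 'fromDate'], how='left', validate='many_to_one')

# Whole days as numbers; missing differences become NaN, so the column is float.
out['days'] = out['lastindays'].dt.days
print(out)
```

The left join keeps the row order of df, so out lines up with the original rows.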