Calculating the difference in days between the previous visit and the current visit for the same customer (same group) in Pandas
I am trying to calculate the time difference (in days) between a customer's previous visit and that customer's most recent visit.
time difference = latest in time - previous out time
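As a worked instance of this formula, take customer 1's two visits from the sample data: the first visit's out time is 05-12-1999 15:20:07 and the next in time is 21-12-1999 09:41:09 (dates are day-first). A minimal sketch, assuming pandas is the library in use:

```python
import pandas as pd

# Customer 1: out time of the previous visit and in time of the latest visit,
# taken from the sample data and written here in ISO (year-month-day) order.
previous_out = pd.Timestamp("1999-12-05 15:20:07")
latest_in = pd.Timestamp("1999-12-21 09:41:09")

# Subtracting two Timestamps yields a Timedelta.
gap = latest_in - previous_out
print(gap)       # 15 days 18:21:02
# .days keeps only the whole-day component; the 18:21:02 remainder is dropped.
print(gap.days)  # 15
```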
Here is a sample of the input data:
Sample output table:
What I have tried so far is a groupby on customer ID with a rank:
temp['RANK'] = temp.groupby('customer ID')['in time'].rank(ascending=True)
But I'm not sure how to calculate the difference from there.
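One way the rank attempt could be finished is a self-merge: bump each visit's rank by one so its out time lines up with the customer's next visit, then subtract. This is a hypothetical completion, not from the question; the `previous out time` column name and the merge step are assumptions, sketched on the first two customers of the sample data. It assumes no customer has two visits with the same in time: with ties, the default `rank` returns averages such as 1.5 and the merge would miss them (`rank(method="first")` avoids that).

```python
import pandas as pd

fmt = "%d-%m-%Y %H:%M:%S"  # day-first layout of the sample data
temp = pd.DataFrame({
    "customer ID": [1, 1, 2, 2],
    "out time": pd.to_datetime(["05-12-1999 15:20:07", "21-12-1999 09:59:34",
                                "05-12-1999 11:53:34", "08-12-1999 19:55:00"], format=fmt),
    "in time": pd.to_datetime(["05-12-1999 14:23:31", "21-12-1999 09:41:09",
                               "05-12-1999 11:05:37", "08-12-1999 19:40:10"], format=fmt),
})

# The question's rank: 1.0 for each customer's earliest visit, 2.0 for the next, ...
temp["RANK"] = temp.groupby("customer ID")["in time"].rank(ascending=True)

# Lookup table: each visit's out time, re-keyed to the rank of the visit that follows it.
prev = temp[["customer ID", "RANK", "out time"]].copy()
prev["RANK"] += 1
prev = prev.rename(columns={"out time": "previous out time"})

# Left merge keeps every visit; first visits find no match and get NaT.
temp = temp.merge(prev, on=["customer ID", "RANK"], how="left")
temp["Visit diff (in days)"] = (temp["in time"] - temp["previous out time"]).dt.days
print(temp[["customer ID", "RANK", "Visit diff (in days)"]])
```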
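You can try the following (note that it returns a single value per customer, the latest in time minus the earliest out time, rather than one gap per visit):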
temp.groupby('customer ID').apply(lambda x: (x['in time'].max() - x['out time'].min()).days )
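The same per-customer figure can also be computed without `apply`, using built-in groupby aggregations and subtracting the resulting Series. A minimal sketch on the sample data (the `g` and `span_days` names are illustrative):

```python
import pandas as pd

fmt = "%d-%m-%Y %H:%M:%S"  # day-first layout of the sample data
temp = pd.DataFrame({
    "customer ID": [1, 1, 2, 2, 3, 3],
    "out time": pd.to_datetime(["05-12-1999 15:20:07", "21-12-1999 09:59:34",
                                "05-12-1999 11:53:34", "08-12-1999 19:55:00",
                                "01-12-1999 15:15:26", "16-12-1999 17:10:09"], format=fmt),
    "in time": pd.to_datetime(["05-12-1999 14:23:31", "21-12-1999 09:41:09",
                               "05-12-1999 11:05:37", "08-12-1999 19:40:10",
                               "01-12-1999 13:08:11", "16-12-1999 16:34:10"], format=fmt),
})

g = temp.groupby("customer ID")
# Latest in time minus earliest out time for each customer: the same quantity as
# the apply/lambda version, computed with vectorized aggregations instead.
span_days = (g["in time"].max() - g["out time"].min()).dt.days
print(span_days)  # one value per customer ID (15, 3 and 15 for this data)
```

With exactly two visits per customer this span coincides with the per-visit gap; with three or more visits it collapses them into one number.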
You can use GroupBy.shift() to get the previous out time within the same group (customer) and subtract it from the current in time. Then use dt.days to get the number of whole days in the resulting timedelta between in time and out time, as follows:
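import pandas as pd

# convert date strings to datetime format
df['out time'] = pd.to_datetime(df['out time'], dayfirst=True)
df['in time'] = pd.to_datetime(df['in time'], dayfirst=True)
# shift() works on row order, so sort each customer's visits by in time first
df = df.sort_values(['customer ID', 'in time'])
df['Visit diff (in days)'] = (df['in time'] - df['out time'].groupby(df['customer ID']).shift()).dt.days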
Input data:
print(df)
   customer ID             out time              in time
0            1  05-12-1999 15:20:07  05-12-1999 14:23:31
1            1  21-12-1999 09:59:34  21-12-1999 09:41:09
2            2  05-12-1999 11:53:34  05-12-1999 11:05:37
3            2  08-12-1999 19:55:00  08-12-1999 19:40:10
4            3  01-12-1999 15:15:26  01-12-1999 13:08:11
5            3  16-12-1999 17:10:09  16-12-1999 16:34:10
Result:
print(df)
   customer ID             out time              in time  Visit diff (in days)
0            1  1999-12-05 15:20:07  1999-12-05 14:23:31                   NaN
1            1  1999-12-21 09:59:34  1999-12-21 09:41:09                  15.0
2            2  1999-12-05 11:53:34  1999-12-05 11:05:37                   NaN
3            2  1999-12-08 19:55:00  1999-12-08 19:40:10                   3.0
4            3  1999-12-01 15:15:26  1999-12-01 13:08:11                   NaN
5            3  1999-12-16 17:10:09  1999-12-16 16:34:10                  15.0
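For anyone who wants to reproduce this end to end, here is a self-contained sketch that builds the sample frame from the input shown above and applies the same shift-based approach. The extra `Visit diff (fractional days)` column is an optional addition, not part of the answer: `dt.days` drops the time-of-day remainder, while dividing the timedelta by `pd.Timedelta(days=1)` keeps it.

```python
import pandas as pd

# Sample data from the question, as raw day-first strings.
df = pd.DataFrame({
    "customer ID": [1, 1, 2, 2, 3, 3],
    "out time": ["05-12-1999 15:20:07", "21-12-1999 09:59:34", "05-12-1999 11:53:34",
                 "08-12-1999 19:55:00", "01-12-1999 15:15:26", "16-12-1999 17:10:09"],
    "in time": ["05-12-1999 14:23:31", "21-12-1999 09:41:09", "05-12-1999 11:05:37",
                "08-12-1999 19:40:10", "01-12-1999 13:08:11", "16-12-1999 16:34:10"],
})

df["out time"] = pd.to_datetime(df["out time"], dayfirst=True)
df["in time"] = pd.to_datetime(df["in time"], dayfirst=True)

# shift() is positional, so order each customer's visits chronologically first.
df = df.sort_values(["customer ID", "in time"])

# Previous visit's out time within the same customer (NaT for each first visit).
prev_out = df.groupby("customer ID")["out time"].shift()
gap = df["in time"] - prev_out

# Whole days, as in the answer.
df["Visit diff (in days)"] = gap.dt.days
# Optional: fractional days (NaT becomes NaN in both columns).
df["Visit diff (fractional days)"] = gap / pd.Timedelta(days=1)
print(df)
```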