How to calculate difference between max and min date for each user

How can I calculate the difference between the max and min date for each user?

I tried:

df['diff'] = (df['purchase_date'].max() - df['purchase_date'].min()).dt.days

But this computes the difference over all rows rather than for each specific user. How can I calculate it per user_id?

This is what the dataframe looks like:

Try this:

import pandas as pd
d = {'user_id': [2432730,2432730, 2432731,2432731],
     'purchase_date': ["2020-09-09", "2020-08-09","2020-09-09","2020-09-19"]}

df = pd.DataFrame(data=d)
df['purchase_date']=pd.to_datetime(df['purchase_date'])

Input:

max_date=df[['user_id','purchase_date']].groupby(by='user_id').max().reset_index().rename(columns={'purchase_date':'max_date'})
min_date=df[['user_id','purchase_date']].groupby(by='user_id').min().reset_index().rename(columns={'purchase_date':'min_date'})
min_date=min_date.join(max_date['max_date'])
min_date['diff']=(min_date['max_date']-min_date['min_date']).dt.days
min_date

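Output:

   user_id   min_date   max_date  diff
0  2432730 2020-08-09 2020-09-09    31
1  2432731 2020-09-09 2020-09-19    10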

You just need a groupby on user_id to calculate the difference:

df.groupby('user_id')['purchase_date'].max() - df.groupby('user_id')['purchase_date'].min()

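This creates a Series indexed by user_id with one row per user (two rows here, since there are only two users), so it cannot be assigned directly as a column of df, which has one row per purchase. You need to map or merge the result back onto the dataframe.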
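One way to get the per-user difference onto every original row without a separate merge is `GroupBy.transform`, which returns a result aligned to the original index. The following is a minimal sketch that reuses the sample data from the earlier answer; it is one possible approach, not necessarily what the answerer had in mind.

```python
import pandas as pd

# Sample data taken from the earlier answer in this thread
df = pd.DataFrame({
    "user_id": [2432730, 2432730, 2432731, 2432731],
    "purchase_date": pd.to_datetime(
        ["2020-09-09", "2020-08-09", "2020-09-09", "2020-09-19"]
    ),
})

# transform broadcasts each group's max/min back to every row of that group,
# so the subtraction has the same length and index as df
grouped = df.groupby("user_id")["purchase_date"]
df["diff"] = (grouped.transform("max") - grouped.transform("min")).dt.days

print(df)
```

Because the transformed Series shares df's index, the assignment lines up row by row, and each purchase row carries its user's date span in days.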

Create a new dataframe holding the min and max dates grouped by user ID, name the columns, and then merge it with the original dataframe.

Input data:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({
        "user_id": (np.random.randint(10000,10004,15, dtype="int32")),
        "purchase_date": (pd.date_range(start='2022-01-01', periods=15, freq='8H')),
        "C": pd.Series(1, index=list(range(15)), dtype="float32"),
        "D": np.array([5] * 15, dtype="int32"),
        "E": "foo",
    })
    df['purchase_date'] = pd.to_datetime(df['purchase_date']).dt.normalize()

    

# Solution


df_grouped = df.groupby(['user_id']).agg(
    date_min=('purchase_date', 'min'),
    date_max=('purchase_date', 'max'))\
    .reset_index()
df_grouped['diff']=(df_grouped['date_max']-df_grouped['date_min']).dt.days
df1 = pd.merge(df, df_grouped)
df1

Output:

   user_id purchase_date    C  D    E   date_min   date_max  diff
0     10001    2022-01-01  1.0  5  foo 2022-01-01 2022-01-04     3
1     10001    2022-01-02  1.0  5  foo 2022-01-01 2022-01-04     3
2     10001    2022-01-03  1.0  5  foo 2022-01-01 2022-01-04     3
3     10001    2022-01-04  1.0  5  foo 2022-01-01 2022-01-04     3
4     10000    2022-01-01  1.0  5  foo 2022-01-01 2022-01-04     3
5     10000    2022-01-02  1.0  5  foo 2022-01-01 2022-01-04     3
6     10000    2022-01-03  1.0  5  foo 2022-01-01 2022-01-04     3
7     10000    2022-01-04  1.0  5  foo 2022-01-01 2022-01-04     3
8     10002    2022-01-01  1.0  5  foo 2022-01-01 2022-01-05     4
9     10002    2022-01-02  1.0  5  foo 2022-01-01 2022-01-05     4
10    10002    2022-01-03  1.0  5  foo 2022-01-01 2022-01-05     4
11    10002    2022-01-05  1.0  5  foo 2022-01-01 2022-01-05     4
12    10002    2022-01-05  1.0  5  foo 2022-01-01 2022-01-05     4
13    10003    2022-01-04  1.0  5  foo 2022-01-04 2022-01-05     1
14    10003    2022-01-05  1.0  5  foo 2022-01-04 2022-01-05     1
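Since the data above is random, the exact rows will differ between runs. For a reproducible check of the same aggregate-then-merge idea, here is a small sketch with fixed data. The user IDs and dates are made up for illustration, and the explicit `on`, `how`, and `validate` arguments are an optional addition rather than part of the original answer.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "user_id": [10000, 10000, 10001, 10001, 10001],
    "purchase_date": pd.to_datetime(
        ["2022-01-01", "2022-01-04", "2022-01-02", "2022-01-02", "2022-01-07"]
    ),
})

# One row per user with the earliest and latest purchase date
df_grouped = (
    df.groupby("user_id")
      .agg(date_min=("purchase_date", "min"),
           date_max=("purchase_date", "max"))
      .reset_index()
)
df_grouped["diff"] = (df_grouped["date_max"] - df_grouped["date_min"]).dt.days

# Naming the key explicitly avoids merging on any other column that happens
# to share a name; validate raises if df_grouped has duplicate user_id values
df1 = pd.merge(df, df_grouped, on="user_id", how="left", validate="many_to_one")
print(df1)
```

With `how="left"` the merged frame keeps one row per original purchase, so its length matches df.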