Iterate over one dataframe to calculate new features - Python

I am working with a dataframe of credit card transactions that has the following columns:

timestamp, transaction_id, buyer_id, status

I want to generate a new column, q_app_1d, which for each transaction_id counts the previous transactions that meet all of these conditions: same buyer_id, status = 1, and a timestamp difference of <= 1 day.

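I tried to do this with a self-join (joining the dataframe to itself), but without success. I know how to do this easily in SQL, but I can't get it to work in Pandas.

Any help or hints are much appreciated!

Edit:

Sample input: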

timestamp, transaction_id, buyer_id, status
01/01/2020 00:00:00, 1, abc123, 1
01/01/2020 00:25:00, 2, abc123, 1
01/01/2020 00:30:00, 3, abc123, 1
01/01/2020 00:45:00, 4, def456, 1
02/01/2020 08:55:00, 5, abc123, 1
02/01/2020 10:55:00, 6, def456, 1
03/01/2020 08:55:00, 7, def456, 1

Sample output:

timestamp, transaction_id, buyer_id, status, q_app_1d
01/01/2020 00:00:00, 1, abc123, 1, 0
01/01/2020 00:25:00, 2, abc123, 1, 1 #(considers transaction_id 1)
01/01/2020 00:30:00, 3, abc123, 1, 2 #(considers transaction_id 1,2)
01/01/2020 00:45:00, 4, def456, 1, 0
02/01/2020 08:55:00, 5, abc123, 1, 0 #(more than one day since transaction_id 3)
02/01/2020 10:55:00, 6, def456, 1, 0 #(more than one day since transaction_id 4)
03/01/2020 08:55:00, 7, def456, 1, 1 #(considers transaction_id 6)

This should work:

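import pandas as pd

# Parse the day-first timestamps and use them as a sorted index,
# which time-based rolling windows require
df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
df = df.sort_values('timestamp').set_index('timestamp')

# Per buyer, count the rows in the trailing 24-hour window, minus 1 for the current row
_df = (df.groupby('buyer_id')['status'].rolling('24h').count() - 1).reset_index()
_df.columns = ['buyer_id', 'timestamp', 'q_app_1d']

# Attach the counts to the original rows (joins on buyer_id and timestamp)
df = df.reset_index()
df = df.merge(_df)
df.head(7)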
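
Note that count() counts every row in the window regardless of status, so the snippet above is only right when all rows have status = 1, as in the sample. Here is a minimal self-contained sketch of one possible variation that sums a status == 1 flag instead. The helper name add_q_app_1d and the temporary approved column are hypothetical, made up for illustration. It assumes buyer_id has no missing values, and it uses 08:55 for transaction 7, as in the sample output.

```python
import pandas as pd

def add_q_app_1d(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count earlier status == 1 rows of the same buyer within 24 hours."""
    out = df.copy()
    out['timestamp'] = pd.to_datetime(out['timestamp'], dayfirst=True)
    # Sort so that row order matches the (buyer_id, timestamp) order of groupby().rolling()
    out = out.sort_values(['buyer_id', 'timestamp']).reset_index(drop=True)
    # Temporary 0/1 flag for approved transactions (assumption: status == 1 means approved)
    out['approved'] = (out['status'] == 1).astype(int)
    rolled = (
        out.set_index('timestamp')
           .groupby('buyer_id')['approved']
           .rolling('24h')
           .sum()
    )
    # The window includes the current row, so subtract its own flag
    out['q_app_1d'] = (rolled.to_numpy() - out['approved'].to_numpy()).astype(int)
    return (out.drop(columns='approved')
               .sort_values('transaction_id')
               .reset_index(drop=True))

sample = pd.DataFrame({
    'timestamp': ['01/01/2020 00:00:00', '01/01/2020 00:25:00', '01/01/2020 00:30:00',
                  '01/01/2020 00:45:00', '02/01/2020 08:55:00', '02/01/2020 10:55:00',
                  '03/01/2020 08:55:00'],
    'transaction_id': [1, 2, 3, 4, 5, 6, 7],
    'buyer_id': ['abc123', 'abc123', 'abc123', 'def456', 'abc123', 'def456', 'def456'],
    'status': [1, 1, 1, 1, 1, 1, 1],
})

result = add_q_app_1d(sample)
print(result['q_app_1d'].tolist())  # [0, 1, 2, 0, 0, 0, 1]
```

By default the time window is open on the left, so a transaction exactly 24 hours earlier is not counted. If it should be (the question says <= 1 day), passing closed='both' to rolling() is one way to include it.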