Iterate over one dataframe to calculate new features - Python

Question

I am working with a dataframe of credit card transactions that has the following columns:

timestamp, transaction_id, buyer_id, status

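I want to generate a new column, q_app_1d, that holds, for each transaction_id, the number of previous transaction_ids that meet these conditions: same buyer_id, status = 1, and a difference between timestamps of <= 1 day.

I tried to do this with a self-join (that is, joining the dataframe with itself), but I did not succeed. I know how to do this easily in SQL, but I can't get it to work in Pandas.

Any help or hints are greatly appreciated!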

Edit:

Sample input:

timestamp, transaction_id, buyer_id, status
01/01/2020 00:00:00, 1, abc123, 1
01/01/2020 00:25:00, 2, abc123, 1
01/01/2020 00:30:00, 3, abc123, 1
01/01/2020 00:45:00, 4, def456, 1
02/01/2020 08:55:00, 5, abc123, 1
02/01/2020 10:55:00, 6, def456, 1
03/01/2020 08:55:00, 7, def456, 1

Sample output:

timestamp, transaction_id, buyer_id, status, q_app_1d
01/01/2020 00:00:00, 1, abc123, 1, 0
01/01/2020 00:25:00, 2, abc123, 1, 1 #(considers transaction_id 1)
01/01/2020 00:30:00, 3, abc123, 1, 2 #(considers transaction_id 1,2)
01/01/2020 00:45:00, 4, def456, 1, 0
02/01/2020 08:55:00, 5, abc123, 1, 0 #(more than one day since transaction_id 3)
02/01/2020 10:55:00, 6, def456, 1, 0 #(more than one day since transaction_id 4)
03/01/2020 08:55:00, 7, def456, 1, 1 #(considers transaction_id 6)

Answer 1

This should work:

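import pandas as pd

# Parse the day-first timestamps; time-based rolling needs a sorted DatetimeIndex
df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
df = df.set_index('timestamp').sort_index()

# Per buyer, count the non-null status values in the trailing 24-hour window.
# The window includes the current row, hence the -1. This matches the question
# only when every row has status = 1, as in the sample data.
_df = (df.groupby('buyer_id')['status'].rolling('24h').count() - 1).reset_index()
_df.columns = ['buyer_id', 'timestamp', 'q_app_1d']

# Attach the counts back to the transactions
df = df.reset_index()
df = df.merge(_df, on=['buyer_id', 'timestamp'])
df.head(7)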
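
If some rows can have a status other than 1, the self-join mentioned in the question may be easier to reason about, since the conditions are spelled out one by one. Below is a minimal sketch of that idea, not a tuned solution. The helper name count_prior_approved and its window parameter are made up for illustration, "previous" is assumed to mean a strictly earlier timestamp, and the sample rows use the timestamps from the expected output. The join builds every pair of transactions per buyer, so memory grows quadratically with the number of transactions a buyer has.

```python
import pandas as pd

# Sample rows taken from the question's expected output (day-first dates).
rows = [
    ("01/01/2020 00:00:00", 1, "abc123", 1),
    ("01/01/2020 00:25:00", 2, "abc123", 1),
    ("01/01/2020 00:30:00", 3, "abc123", 1),
    ("01/01/2020 00:45:00", 4, "def456", 1),
    ("02/01/2020 08:55:00", 5, "abc123", 1),
    ("02/01/2020 10:55:00", 6, "def456", 1),
    ("03/01/2020 08:55:00", 7, "def456", 1),
]
df = pd.DataFrame(rows, columns=["timestamp", "transaction_id", "buyer_id", "status"])
df["timestamp"] = pd.to_datetime(df["timestamp"], dayfirst=True)


def count_prior_approved(frame, window=pd.Timedelta(days=1)):
    """Hypothetical helper: for each row, count strictly earlier rows of the
    same buyer that have status == 1 and are at most `window` older."""
    cols = ["buyer_id", "timestamp", "transaction_id", "status"]
    # Self-join: pair every transaction with every transaction of the same buyer.
    pairs = frame[cols].merge(frame[cols], on="buyer_id", suffixes=("", "_prev"))
    age = pairs["timestamp"] - pairs["timestamp_prev"]
    keep = (
        (pairs["status_prev"] == 1)      # only approved earlier transactions
        & (age > pd.Timedelta(0))        # strictly earlier than the current row
        & (age <= window)                # no more than one day older
    )
    counts = pairs.loc[keep].groupby("transaction_id").size()
    # Transactions with no qualifying predecessor are absent from counts, so fill with 0.
    return frame["transaction_id"].map(counts).fillna(0).astype(int)


df["q_app_1d"] = count_prior_approved(df)
print(df)
```

On the sample rows this gives 0, 1, 2, 0, 0, 0, 1 for transaction_id 1 through 7. Unlike the rolling count, it counts a predecessor that is exactly one day older, which follows the "<= 1 day" wording in the question.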

