Identify customer segments based on transactions they have made in a specific period using Python

Question

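For customer segmentation, I want to use the given table of dated transactions to find out how many transactions each customer made in the 10 days and the 20 days before each transaction. In this table, the last 2 columns were added with the following code.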

But I'm not satisfied with this code. Please suggest improvements.

```
import pandas as pd
from dateutil.relativedelta import relativedelta

df4 = pd.read_excel(path)  # path: location of the Excel file

# There are two customers, A and B, so a separate DataFrame is created for each
df4A = df4[df4['Customer_ID'] == 'A']
df4B = df4[df4['Customer_ID'] == 'B']

txn_prior_10days = []

for i in range(len(df4)):
    current_date = df4.iloc[i, 2]  # Transaction_Date column
    prior_10days_date = current_date - relativedelta(days=10)

    if df4.iloc[i, 1] == 'A':  # Customer_ID column
        no_of_txn = ((df4A['Transaction_Date'] >= prior_10days_date) & (df4A['Transaction_Date'] < current_date)).sum()
        txn_prior_10days.append(no_of_txn)

    elif df4.iloc[i, 1] == 'B':
        no_of_txn = ((df4B['Transaction_Date'] >= prior_10days_date) & (df4B['Transaction_Date'] < current_date)).sum()
        txn_prior_10days.append(no_of_txn)

txn_prior_20days = []

for i in range(len(df4)):
    current_date = df4.iloc[i, 2]
    prior_20days_date = current_date - relativedelta(days=20)

    if df4.iloc[i, 1] == 'A':
        no_of_txn = ((df4A['Transaction_Date'] >= prior_20days_date) & (df4A['Transaction_Date'] < current_date)).sum()
        txn_prior_20days.append(no_of_txn)

    elif df4.iloc[i, 1] == 'B':
        no_of_txn = ((df4B['Transaction_Date'] >= prior_20days_date) & (df4B['Transaction_Date'] < current_date)).sum()
        txn_prior_20days.append(no_of_txn)

df4['txn_prior_10days'] = txn_prior_10days
df4['txn_prior_20days'] = txn_prior_20days
df4
```

Answer 1

Your code would be hard to write if you had, e.g., 10 different Customer_ID values. Fortunately, there is a shorter solution:

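When reading the file, convert Transaction_Date to datetime, e.g. by passing parse_dates=['Transaction_Date'] to read_excel.
Define a function that counts how many dates in the group (gr) fall in the range from tDtl (a Timedelta) before the current date (dd) up to 1 day before it: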
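If the file has already been read without parse_dates (as df4 in the question) and Transaction_Date came in as text, the column can be converted afterwards with pd.to_datetime. A minimal sketch, with a small hypothetical frame standing in for the Excel data:

```python
import pandas as pd

# Hypothetical stand-in for the Excel data, with dates stored as strings
df4 = pd.DataFrame({
    'Customer_ID': ['A', 'A', 'B'],
    'Transaction_Date': ['2019-01-01', '2019-01-03', '2019-01-06'],
})

# Convert the column to datetime so that Timedelta arithmetic works on it
df4['Transaction_Date'] = pd.to_datetime(df4['Transaction_Date'])

# Example: the start of each row's 10-day look-back window
window_start = df4['Transaction_Date'] - pd.Timedelta(10, 'D')
print(window_start.dt.strftime('%Y-%m-%d').tolist())
# ['2018-12-22', '2018-12-24', '2018-12-27']
```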
```
def cntPrevTr(dd, gr, tDtl):
    return gr.between(dd - tDtl, dd - pd.Timedelta(1, 'D')).sum()
```
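For whole dates, the window [dd - tDtl, dd - 1 day] selects the same rows as the question's condition (>= dd - tDtl and < dd). If Transaction_Date also carried a time of day, the two would differ for transactions made less than 24 hours before dd. A small check with hypothetical timestamps; cnt_question_style is an illustrative helper, not part of the answer:

```python
import pandas as pd

def cntPrevTr(dd, gr, tDtl):
    # Answer's condition: dd - tDtl <= date <= dd - 1 day (both ends inclusive)
    return gr.between(dd - tDtl, dd - pd.Timedelta(1, 'D')).sum()

def cnt_question_style(dd, gr, tDtl):
    # Question's condition: dd - tDtl <= date < dd
    return ((gr >= dd - tDtl) & (gr < dd)).sum()

dd = pd.Timestamp('2019-01-11')
tDtl = pd.Timedelta(10, 'D')

# Whole dates: both conditions agree
dates = pd.Series(pd.to_datetime(['2019-01-01', '2019-01-05', '2019-01-10']))
print(cntPrevTr(dd, dates, tDtl), cnt_question_style(dd, dates, tDtl))  # 3 3

# Hypothetical timestamp with a time of day, less than 24 hours before dd
dates_t = pd.Series(pd.to_datetime(['2019-01-10 18:30']))
print(cntPrevTr(dd, dates_t, tDtl), cnt_question_style(dd, dates_t, tDtl))  # 0 1
```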
cntPrevTr is applied twice to each member of the current Customer_ID group (actually only to its Transaction_Date column): once with tDtl == 10 days and once with tDtl == 20 days.

Define a function that computes both columns of transaction counts for the current group of transaction dates:

```
def priorTx(td):
    # One column per look-back window; rows keep td's index
    return pd.DataFrame({
        'tx10': td.apply(cntPrevTr, args=(td, pd.Timedelta(10, 'D'))),
        'tx20': td.apply(cntPrevTr, args=(td, pd.Timedelta(20, 'D')))})
```
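priorTx hard-codes the two window lengths, so adding, say, a 30-day count means editing the function. A hypothetical variant (prior_tx and its days parameter are illustrative names, not from the answer) that builds one column per window length:

```python
import pandas as pd

def cntPrevTr(dd, gr, tDtl):
    # Number of dates in gr between dd - tDtl and dd - 1 day (inclusive)
    return gr.between(dd - tDtl, dd - pd.Timedelta(1, 'D')).sum()

def prior_tx(td, days=(10, 20)):
    # Hypothetical generalization of priorTx: one 'tx<N>' column per window
    return pd.DataFrame({
        f'tx{d}': td.apply(cntPrevTr, args=(td, pd.Timedelta(d, 'D')))
        for d in days})

# Customer A's dates from the sample data
td = pd.Series(pd.to_datetime(
    ['2019-01-01', '2019-01-03', '2019-01-09', '2019-01-12', '2019-01-19']))
out = prior_tx(td, days=(10, 20, 30))
print(out)
```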

Generate the result:
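```
# group_keys=False keeps df's own row labels on the result; in pandas >= 2.0
# apply would otherwise prepend Customer_ID to the index, and the assignment
# would no longer line up with df
df[['txn_prior_10days', 'txn_prior_20days']] = df.groupby('Customer_ID', group_keys=False)\
    .Transaction_Date.apply(priorTx)
```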
The code above:
- groups df by Customer_ID,
- takes only the Transaction_Date column from the current group,
- applies the priorTx function to it,
- saves the result in the 2 target columns.

The result, with the Transaction_ID values slightly shortened, is:

   Transaction_ID Customer_ID Transaction_Date  txn_prior_10days  txn_prior_20days
0          912410           A       2019-01-01                 0                 0   
1          912341           A       2019-01-03                 1                 1   
2          312415           A       2019-01-09                 2                 2   
3          432513           A       2019-01-12                 2                 3   
4          357912           A       2019-01-19                 2                 4   
5          912411           B       2019-01-06                 0                 0   
6          912342           B       2019-01-11                 1                 1   
7          312416           B       2019-01-13                 2                 2   
8          432514           B       2019-01-20                 2                 3   
9          357913           B       2019-01-21                 3                 4
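For reference, the pieces above collected into one self-contained sketch that builds the sample data in code instead of reading it from Excel. It passes group_keys=False so the apply result keeps df's own row labels, then assigns the two columns by name:

```python
import pandas as pd

def cntPrevTr(dd, gr, tDtl):
    # Number of dates in gr between dd - tDtl and dd - 1 day (inclusive)
    return gr.between(dd - tDtl, dd - pd.Timedelta(1, 'D')).sum()

def priorTx(td):
    # Both counts for every transaction date of one customer
    return pd.DataFrame({
        'tx10': td.apply(cntPrevTr, args=(td, pd.Timedelta(10, 'D'))),
        'tx20': td.apply(cntPrevTr, args=(td, pd.Timedelta(20, 'D')))})

# Sample data (same values as the test DataFrame used further below)
df = pd.DataFrame({
    'Transaction_ID': [912410, 912341, 312415, 432513, 357912,
                       912411, 912342, 312416, 432514, 357913],
    'Customer_ID': ['A'] * 5 + ['B'] * 5,
    'Transaction_Date': pd.to_datetime([
        '2019-01-01', '2019-01-03', '2019-01-09', '2019-01-12', '2019-01-19',
        '2019-01-06', '2019-01-11', '2019-01-13', '2019-01-20', '2019-01-21']),
})

# Result has one row per transaction, indexed like df
res = df.groupby('Customer_ID', group_keys=False)['Transaction_Date'].apply(priorTx)
df['txn_prior_10days'] = res['tx10']
df['txn_prior_20days'] = res['tx20']
print(df)
```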

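You can't simply use a default rolling calculation, because:

- a rolling window with an integer size ends at the current row and reaches back over a fixed number of rows, while you want to count the previous transactions within a number of calendar days,
- the rolling calculation includes the current row, while you want to exclude it.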

That is why I came up with the solution above (only 8 lines of code).
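As a point of comparison that is not part of the original answer: with a time-based window such as '10D', pandas rolling also accepts closed='left', which makes each window [date - 10 days, date) and so leaves the current row out. A sketch under that assumption; it needs the dates sorted within each customer, and it relies on the grouped result coming back in the same (Customer_ID, date) order as the sorted df:

```python
import pandas as pd

df = pd.DataFrame({
    'Transaction_ID': [912410, 912341, 312415, 432513, 357912,
                       912411, 912342, 312416, 432514, 357913],
    'Customer_ID': ['A'] * 5 + ['B'] * 5,
    'Transaction_Date': pd.to_datetime([
        '2019-01-01', '2019-01-03', '2019-01-09', '2019-01-12', '2019-01-19',
        '2019-01-06', '2019-01-11', '2019-01-13', '2019-01-20', '2019-01-21']),
})

# Time-based windows need ascending dates within each group
df = df.sort_values(['Customer_ID', 'Transaction_Date']).reset_index(drop=True)

for days in (10, 20):
    counts = (
        df.set_index('Transaction_Date')
          .groupby('Customer_ID')['Transaction_ID']
          # closed='left': window is [date - N days, date), current row excluded
          .rolling(f'{days}D', closed='left')
          .count()
          .fillna(0)  # an empty window (first transaction) may yield NaN
    )
    # counts is ordered by Customer_ID, then by row order, like the sorted df
    df[f'txn_prior_{days}days'] = counts.to_numpy().astype(int)

print(df)
```

Its window matches the question's >= / < condition exactly, so it would also behave like the question's code if the dates carried a time of day.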

Detailed explanation of how my solution works

To see all the details, create a test DataFrame as follows:

```
import io
import pandas as pd

txt = '''
Transaction_ID Customer_ID Transaction_Date
912410         A           2019-01-01
912341         A           2019-01-03
312415         A           2019-01-09
432513         A           2019-01-12
357912         A           2019-01-19
912411         B           2019-01-06
912342         B           2019-01-11
312416         B           2019-01-13
432514         B           2019-01-20
357913         B           2019-01-21'''

df = pd.read_fwf(io.StringIO(txt), skiprows=1,
    widths=[15, 12, 16], parse_dates=[2])
```

Perform the groupby, but this time retrieve only the group with key 'A':

```
gr = df.groupby('Customer_ID')
grp = gr.get_group('A')
```

It contains:

   Transaction_ID Customer_ID Transaction_Date
0          912410           A       2019-01-01
1          912341           A       2019-01-03
2          312415           A       2019-01-09
3          432513           A       2019-01-12
4          357912           A       2019-01-19

Let's start with the most detailed question: how cntPrevTr works. Retrieve one of the dates from grp:

```
dd = grp.iloc[2, 2]
```

It contains Timestamp('2019-01-09 00:00:00'). To test an example call of cntPrevTr for this date, run:

```
cntPrevTr(dd, grp.Transaction_Date, pd.Timedelta(10, 'D'))
```

i.e. you want to check how many transactions this customer made before this date, but not earlier than 10 days before it. The result is 2.
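To see which dates are behind that 2, the same condition can be evaluated as a boolean mask. A sketch that rebuilds customer A's dates directly instead of taking them from grp:

```python
import pandas as pd

# Customer A's dates from the test DataFrame
td = pd.Series(pd.to_datetime(
    ['2019-01-01', '2019-01-03', '2019-01-09', '2019-01-12', '2019-01-19']))
dd = pd.Timestamp('2019-01-09')
tDtl = pd.Timedelta(10, 'D')

# The condition cntPrevTr evaluates: dd - 10 days <= date <= dd - 1 day
mask = td.between(dd - tDtl, dd - pd.Timedelta(1, 'D'))
print((dd - tDtl).date(), (dd - pd.Timedelta(1, 'D')).date())  # 2018-12-30 2019-01-08
print(td[mask].dt.strftime('%Y-%m-%d').tolist())  # ['2019-01-01', '2019-01-03']
print(mask.sum())  # 2
```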

To see how the whole first column is computed, run:

```
td = grp.Transaction_Date
td.apply(cntPrevTr, args=(td, pd.Timedelta(10, 'D')))
```

The result is:

0    0
1    1
2    2
3    2
4    2
Name: Transaction_Date, dtype: int64

The left column is the index, and the right column is the value returned by cntPrevTr for each date.
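This apply compares every date with every other date of the same customer, so its cost grows quadratically with the number of transactions per customer. If that becomes a problem, a different technique (binary search with NumPy's searchsorted, not part of the original answer) counts the window [d - tDtl, d) for a sorted column in one pass; for whole dates this matches cntPrevTr. A sketch, assuming the dates are sorted ascending within the group; count_prior_sorted is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def count_prior_sorted(td, tDtl):
    """Count earlier dates in [d - tDtl, d) for each d in td (td sorted ascending)."""
    values = td.to_numpy()
    # Position of the first date >= d - tDtl ...
    lo = np.searchsorted(values, values - tDtl.to_timedelta64(), side='left')
    # ... and of the first date >= d, so d itself and same-day duplicates are excluded
    hi = np.searchsorted(values, values, side='left')
    return pd.Series(hi - lo, index=td.index)

td = pd.Series(pd.to_datetime(
    ['2019-01-01', '2019-01-03', '2019-01-09', '2019-01-12', '2019-01-19']))
print(count_prior_sorted(td, pd.Timedelta(10, 'D')).tolist())  # [0, 1, 2, 2, 2]
```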

The last step is to show how the result for the whole group is generated. Run:

```
priorTx(grp.Transaction_Date)
```

The result (a DataFrame) is:

   tx10  tx20
0     0     0
1     1     1
2     2     2
3     2     3
4     2     4

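The same process is performed for all other groups. Then all partial results are concatenated (vertically), and the final step is to save the two columns of this combined DataFrame into the corresponding columns of df.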
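Written out by hand, that concatenate-and-assign step looks roughly like the sketch below. It illustrates the idea only (it is not how pandas implements groupby.apply internally) and rebuilds the sample data in code:

```python
import pandas as pd

def cntPrevTr(dd, gr, tDtl):
    # Number of dates in gr between dd - tDtl and dd - 1 day (inclusive)
    return gr.between(dd - tDtl, dd - pd.Timedelta(1, 'D')).sum()

def priorTx(td):
    return pd.DataFrame({
        'tx10': td.apply(cntPrevTr, args=(td, pd.Timedelta(10, 'D'))),
        'tx20': td.apply(cntPrevTr, args=(td, pd.Timedelta(20, 'D')))})

df = pd.DataFrame({
    'Customer_ID': ['A'] * 5 + ['B'] * 5,
    'Transaction_Date': pd.to_datetime([
        '2019-01-01', '2019-01-03', '2019-01-09', '2019-01-12', '2019-01-19',
        '2019-01-06', '2019-01-11', '2019-01-13', '2019-01-20', '2019-01-21']),
})

# One partial result per customer; each keeps its group's row labels
parts = [priorTx(grp['Transaction_Date'])
         for _, grp in df.groupby('Customer_ID')]

# Stack the parts vertically and put the rows back in df's order
res = pd.concat(parts).reindex(df.index)

# Save both columns into df (aligned on the row labels)
df['txn_prior_10days'] = res['tx10']
df['txn_prior_20days'] = res['tx20']
print(df)
```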
