How to find churned customers on a monthly basis? Python Pandas

Question

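I have a large customer dataset containing customer ID, service ID, product, and so on. We measure churn in two ways: at the customer ID level, when the whole customer leaves, and at the service ID level, when a customer cancels only some of their services (for example, two out of five).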

The data looks like the sample below. As we can see:

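Alligators stopped being a customer at the end of January, because they have no rows in February (CustomerChurn).
Aunties stopped being a customer at the end of January, because they have no rows in February (CustomerChurn).
Bricks kept using Apples and Bananas in both January and February (ServiceContinue).
Bricks remained a customer but cancelled two services, Oranges and Watermelon, at the end of January (ServiceChurn).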

I'm trying to write some code that creates the 'Churn' column. What I have tried so far:

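Manually building sets of CustomerID and ServiceID values for October 2019 and comparing them with the sets for November 2019 to find the ones that churned. It isn't too slow, but it doesn't feel very Pythonic.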
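For reference, the set-based comparison described above might look roughly like the sketch below. This is an assumption about the asker's approach, not their actual code: `churned_between` is a hypothetical helper, and to keep it short it identifies months by the `Month` label only, ignoring `Year`.

```python
import pandas as pd

# Two-month excerpt of the sample data (Year omitted for brevity)
df = pd.DataFrame({
    'CustomerName': ['Alligators', 'Aunties', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
    'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002],
    'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb'],
})


def churned_between(df, this_month, next_month):
    """Hypothetical helper: return (churned customers, churned (customer, service) pairs)."""
    cur = df[df['Month'] == this_month]
    nxt = df[df['Month'] == next_month]
    # Customers present this month but absent next month
    churned_customers = set(cur['CustomerName']) - set(nxt['CustomerName'])
    # (customer, service) pairs present this month but absent next month
    cur_pairs = set(zip(cur['CustomerName'], cur['ServiceID']))
    nxt_pairs = set(zip(nxt['CustomerName'], nxt['ServiceID']))
    # Services of customers who left entirely count as CustomerChurn, not ServiceChurn
    churned_services = {p for p in cur_pairs - nxt_pairs if p[0] not in churned_customers}
    return churned_customers, churned_services


customers, services = churned_between(df, 'Jan', 'Feb')
print(sorted(customers))
print(sorted(services))
```

This works, but it needs one call per pair of months, which is why a groupby- or merge-based approach usually scales better.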

Thanks!

import pandas as pd

data = {'CustomerName': ['Alligators','Aunties', 'Bricks', 'Bricks','Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
        'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002], 
        'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'], 
        'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'], 
        'Year': [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
        'Churn': ['CustomerChurn', 'CustomerChurn', 'ServiceContinue', 'ServiceContinue', 'ServiceChurn', 'ServiceChurn','ServiceContinue', 'ServiceContinue', 'NA', 'NA']}
df = pd.DataFrame(data)
df
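One vectorized way to produce this 'Churn' column month over month is sketched below, under the question's definitions: a row is CustomerChurn if the customer has no rows in the following month, ServiceChurn if the customer stays but that ServiceID does not, and ServiceContinue otherwise. Rows in the latest month get 'NA', since there is no following month to compare with. The helper `label_monthly_churn` and the `month_idx` column are hypothetical names; the technique is a left `merge` against the same keys shifted back by one month.

```python
import numpy as np
import pandas as pd

data = {'CustomerName': ['Alligators', 'Aunties', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
        'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002],
        'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'],
        'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
        'Year': [2021] * 10}


def label_monthly_churn(df):
    """Hypothetical helper: label each row by what happens to it in the following month."""
    dates = pd.to_datetime(df['Month'] + '-' + df['Year'].astype(str), format='%b-%Y')
    # One consecutive integer per calendar month, so "next month" is simply +1
    df = df.assign(month_idx=dates.dt.year * 12 + dates.dt.month)

    # Keys present in month m+1, relabelled as month m so they line up with month-m rows
    next_cust = (df[['CustomerName', 'month_idx']].drop_duplicates()
                 .assign(month_idx=lambda d: d['month_idx'] - 1, in_next_cust=True))
    next_svc = (df[['CustomerName', 'ServiceID', 'month_idx']].drop_duplicates()
                .assign(month_idx=lambda d: d['month_idx'] - 1, in_next_svc=True))

    # Left merges keep every original row in order; keys missing next month become NaN
    out = (df.merge(next_cust, on=['CustomerName', 'month_idx'], how='left')
             .merge(next_svc, on=['CustomerName', 'ServiceID', 'month_idx'], how='left'))

    # np.select uses the first condition that matches
    out['Churn'] = np.select(
        [out['month_idx'] == out['month_idx'].max(),  # no following month to compare with
         out['in_next_cust'].isna(),                  # whole customer gone next month
         out['in_next_svc'].isna()],                  # customer stays, this service does not
        ['NA', 'CustomerChurn', 'ServiceChurn'],
        default='ServiceContinue')
    return out.drop(columns=['month_idx', 'in_next_cust', 'in_next_svc'])


result = label_monthly_churn(pd.DataFrame(data))
print(result[['CustomerName', 'ServiceID', 'Month', 'Churn']])
```

Because each row is compared only with the month right after it, a customer who skips a month and later returns is counted as churned in the month before the gap.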

Answer 1

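I think this is close to what you want, except for the NA in the last two rows. If you really need those NAs, you can filter by date and overwrite the values.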

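Because you are really testing two different groupings, I pass the first grouping (by CustomerName) through one function, and then, depending on the result, pass a finer-grained grouping (CustomerName and Product) through a second function. For this dataset it seems to work.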

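I created an actual date column and made sure everything was sorted before grouping. The logic inside each function tests whether the group's latest date is earlier than a given cut-off date. It looks like you are treating March as the current month, so that is the cut-off used here.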

You should be able to adapt it to your own needs.

import datetime

import pandas as pd

# Build a real date column from the Month/Year text, e.g. 'Jan-2021' -> 2021-01-01
df['testdate'] = pd.to_datetime(df['Month'] + '-' + df['Year'].astype(str), format='%b-%Y')
df = df.sort_values('testdate')
df1 = df.drop('Churn', axis=1)

def get_customerchurn(x, tdate):
    # The whole customer churned if their latest row is before the cut-off month
    if x.testdate.max() < tdate:
        return x.assign(Churn='CustomerChurn')
    # Otherwise check each product separately, using the same cut-off date
    return (x.groupby(['CustomerName', 'Product'], group_keys=False)
             .apply(lambda g: get_servicechurn(g, tdate)))

def get_servicechurn(x, tdate):
    # A service churned if its latest row is before the cut-off month
    if x.testdate.max() < tdate:
        return x.assign(Churn='ServiceChurn')
    return x.assign(Churn='ServiceContinue')

# group_keys=False keeps the original flat row index in the result
current_month = datetime.datetime(2021, 3, 1)
df2 = (df1.groupby(['CustomerName'], group_keys=False)
          .apply(lambda x: get_customerchurn(x, current_month)))
df2

Output:

  CustomerName  ServiceID     Product Month  Year   testdate            Churn
0   Alligators       1009      Apples   Jan  2021 2021-01-01    CustomerChurn
1      Aunties       1008      Apples   Jan  2021 2021-01-01    CustomerChurn
2       Bricks       1001      Apples   Jan  2021 2021-01-01  ServiceContinue
3       Bricks       1002     Bananas   Jan  2021 2021-01-01  ServiceContinue
4       Bricks       1003     Oranges   Jan  2021 2021-01-01     ServiceChurn
5       Bricks       1004  Watermelon   Jan  2021 2021-01-01     ServiceChurn
6       Bricks       1001      Apples   Feb  2021 2021-02-01  ServiceContinue
7       Bricks       1002     Bananas   Feb  2021 2021-02-01  ServiceContinue
8       Bricks       1001      Apples   Mar  2021 2021-03-01  ServiceContinue
9       Bricks       1002     Bananas   Mar  2021 2021-03-01  ServiceContinue
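The same latest-date test can also be done without nested `groupby().apply()` calls: `groupby(...).transform('max')` broadcasts each customer's and each service's last-seen date back onto every row, and `numpy.select` chooses the label. This sketch is an alternative, not the answer's code: it groups services by `ServiceID` (the question's service level) instead of `Product`, and it marks rows in the current month as 'NA', as the answer suggests doing by filtering on date. `label_churn_vs_current` is a hypothetical name, and the current month (March 2021) is passed in, as in the answer.

```python
import numpy as np
import pandas as pd

data = {'CustomerName': ['Alligators', 'Aunties', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
        'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002],
        'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'],
        'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
        'Year': [2021] * 10}


def label_churn_vs_current(df, current_month):
    """Hypothetical helper: label churn by comparing last-seen dates with current_month."""
    dates = pd.to_datetime(df['Month'] + '-' + df['Year'].astype(str), format='%b-%Y')
    # Latest month in which each customer, and each (customer, service) pair, appears
    cust_last = dates.groupby(df['CustomerName']).transform('max')
    svc_last = dates.groupby([df['CustomerName'], df['ServiceID']]).transform('max')
    # np.select uses the first condition that matches
    churn = np.select(
        [dates == current_month,      # current month: outcome not known yet
         cust_last < current_month,   # customer last seen before the current month
         svc_last < current_month],   # customer stays, but this service stopped
        ['NA', 'CustomerChurn', 'ServiceChurn'],
        default='ServiceContinue')
    return df.assign(Churn=churn)


out = label_churn_vs_current(pd.DataFrame(data), pd.Timestamp('2021-03-01'))
print(out)
```

Unlike the answer's version, the result keeps the original row order and index without any sorting, because `transform` returns values aligned to the input rows.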


python

pandas

churn