每月显示的唯一 ID 滚动 12 个月计数
Rolling 12 months count of unique id displayed monthly
我正在处理交易数据,希望过去 12 个月滚动活跃客户群,但在每月水平上。
以下是我拥有的交易数据示例 - Transaction Data
Cust-ID Date-of-Purchase Quantity Customer-Tag
N01847 01-04-18 10 B2C
N01847 15-04-18 20 B2C
N01847 25-06-19 20 B2C
N01847 12-02-20 100 B2C
N01847 25-03-20 150 B2C
N02341 01-12-19 20 B2B
N02341 02-02-20 150 B2B
N02011 25-01-19 10 B2B
N01033 02-06-18 30 B2C
N01033 02-04-19 40 B2C
N01033 15-04-19 50 B2C
N01033 15-02-20 20 B2C
N01010 16-01-18 100 B2C
N01010 25-02-18 250 B2C
N01010 03-09-18 10 B2C
N01010 04-04-19 250 B2C
N01010 15-06-19 100 B2C
N01010 26-07-19 40 B2C
N01010 17-09-19 10 B2C
N01010 17-09-19 10 B2C
N01010 20-03-20 20 B2C
N09100 20-01-18 20 B2B
N09100 20-01-18 20 B2B
N09100 25-01-20 20 B2B
N09100 25-01-20 20 B2B
N09100 25-01-20 20 B2B ֿ
N09100 25-01-20 20 B2B
下面是我希望通过 python - desired output
得到的结果
Month-Year B2C-Rolling-past-12months-Active-Customers Monthly-Active-Customers Monthly-Active-Rate
201801 100230 25058 25.0%
201802 100524 25634 25.5%
201803 100810 25213 25.0%
201804 101253 25495 25.2%
201805 101351 25525 25.2%
201806 103210 25998 25.2%
201807 103678 26122 25.2%
201808 103977 26202 25.2%
201809 104512 26342 25.2%
201810 104624 26376 25.2%
201811 105479 26597 25.2%
201812 111256 28059 26.2%
201901 112247 28314 25.2%
201902 112947 28497 25.2%
201903 113508 28644 25.2%
201904 113857 28737 25.2%
201905 114572 28924 25.2%
201906 115443 29149 25.3%
201907 116056 29310 25.3%
201908 116528 29435 25.3%
201909 116971 29553 25.3%
201910 117647 29729 25.3%
201911 118492 29949 25.3%
201912 124095 31371 26.3%
202001 124895 31580 25.3%
202002 125653 31778 25.3%
202003 126320 31953 25.3%
我非常感谢任何有关 python (spyder) 代码的帮助,这将有助于获得所需的结果。
您可能想使用 pandas,然后执行类似的操作:
df["Date-of-Purchase"] = pd.to_datetime(df["Date-of-Purchase"], dayfirst=True)
df["Month"] = df["Date-of-Purchase"].dt.month
df["Year"] = df["Date-of-Purchase"].dt.year
res = df.groupby(["Year", "Month"])["Cust-ID"].nunique()
这将为您提供每月唯一身份客户的数量。假设你有所有月份的数据,你现在可以使用 rolling
来获得 12 个月的滚动总和(我在代码中使用了 3 个月以便于调试):
monthly_customers = df.groupby(["Year", "Month"])["Cust-ID"].apply(lambda x: frozenset(x.values))
monthly_customers = monthly_customers.reset_index()
monthly_customers = pd.concat([monthly_customers] + [monthly_customers["Cust-ID"].shift(i) for i in range(1, 3)], axis ="columns")
monthly_customers.columns = ["Year", "Month"] + [ f"shift_{i}" for i in range(3) ]
def count_unique(row):
total_set = frozenset()
columns = [ f"shift_{i}" for i in range(3) ]
for col in columns:
if row.get(col) and type(row.get(col)) == frozenset:
total_set = total_set | row.get(col)
return len(total_set)
monthly_customers["N_month_count"] = monthly_customers.apply(count_unique, axis=1)
monthly_customers
如果您没有所有月份的数据,则需要填写缺失的月份。
我正在处理交易数据,希望过去 12 个月滚动活跃客户群,但在每月水平上。
以下是我拥有的交易数据示例 - Transaction Data
Cust-ID Date-of-Purchase Quantity Customer-Tag
N01847 01-04-18 10 B2C
N01847 15-04-18 20 B2C
N01847 25-06-19 20 B2C
N01847 12-02-20 100 B2C
N01847 25-03-20 150 B2C
N02341 01-12-19 20 B2B
N02341 02-02-20 150 B2B
N02011 25-01-19 10 B2B
N01033 02-06-18 30 B2C
N01033 02-04-19 40 B2C
N01033 15-04-19 50 B2C
N01033 15-02-20 20 B2C
N01010 16-01-18 100 B2C
N01010 25-02-18 250 B2C
N01010 03-09-18 10 B2C
N01010 04-04-19 250 B2C
N01010 15-06-19 100 B2C
N01010 26-07-19 40 B2C
N01010 17-09-19 10 B2C
N01010 17-09-19 10 B2C
N01010 20-03-20 20 B2C
N09100 20-01-18 20 B2B
N09100 20-01-18 20 B2B
N09100 25-01-20 20 B2B
N09100 25-01-20 20 B2B
N09100 25-01-20 20 B2B ֿ
N09100 25-01-20 20 B2B
下面是我希望通过 python - desired output
得到的结果Month-Year B2C-Rolling-past-12months-Active-Customers Monthly-Active-Customers Monthly-Active-Rate
201801 100230 25058 25.0%
201802 100524 25634 25.5%
201803 100810 25213 25.0%
201804 101253 25495 25.2%
201805 101351 25525 25.2%
201806 103210 25998 25.2%
201807 103678 26122 25.2%
201808 103977 26202 25.2%
201809 104512 26342 25.2%
201810 104624 26376 25.2%
201811 105479 26597 25.2%
201812 111256 28059 26.2%
201901 112247 28314 25.2%
201902 112947 28497 25.2%
201903 113508 28644 25.2%
201904 113857 28737 25.2%
201905 114572 28924 25.2%
201906 115443 29149 25.3%
201907 116056 29310 25.3%
201908 116528 29435 25.3%
201909 116971 29553 25.3%
201910 117647 29729 25.3%
201911 118492 29949 25.3%
201912 124095 31371 26.3%
202001 124895 31580 25.3%
202002 125653 31778 25.3%
202003 126320 31953 25.3%
我非常感谢任何有关 python (spyder) 代码的帮助,这将有助于获得所需的结果。
您可能想使用 pandas,然后执行类似的操作:
df["Date-of-Purchase"] = pd.to_datetime(df["Date-of-Purchase"], dayfirst=True)
df["Month"] = df["Date-of-Purchase"].dt.month
df["Year"] = df["Date-of-Purchase"].dt.year
res = df.groupby(["Year", "Month"])["Cust-ID"].nunique()
这将为您提供每月唯一身份客户的数量。假设你有所有月份的数据,你现在可以使用 rolling
来获得 12 个月的滚动总和(我在代码中使用了 3 个月以便于调试):
monthly_customers = df.groupby(["Year", "Month"])["Cust-ID"].apply(lambda x: frozenset(x.values))
monthly_customers = monthly_customers.reset_index()
monthly_customers = pd.concat([monthly_customers] + [monthly_customers["Cust-ID"].shift(i) for i in range(1, 3)], axis ="columns")
monthly_customers.columns = ["Year", "Month"] + [ f"shift_{i}" for i in range(3) ]
def count_unique(row):
total_set = frozenset()
columns = [ f"shift_{i}" for i in range(3) ]
for col in columns:
if row.get(col) and type(row.get(col)) == frozenset:
total_set = total_set | row.get(col)
return len(total_set)
monthly_customers["N_month_count"] = monthly_customers.apply(count_unique, axis=1)
monthly_customers
如果您没有所有月份的数据,则需要填写缺失的月份。