Association between products or cross-sell %

I have a table like this:

ID     ProductBought
1       A
1       B
1       C
2       A
1       B
2       C
3       B
3       C
2       D
3       D
4       A
4       B 
4       C

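I want to calculate, for example: customers who bought A and also bought B. That is 2 cases (IDs 1 and 4), so the result is 2/3 of IDs (3 IDs bought A, and 2 of them also bought B).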
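The number described above is what association-rule mining calls the confidence of the rule A ⇒ B. Writing $C_X$ (notation introduced here for illustration) for the set of distinct customer IDs that bought product $X$, the worked example from the data is:

```latex
\mathrm{cross\text{-}sell}(A \to B)
  = \frac{|C_A \cap C_B|}{|C_A|}
  = \frac{|\{1, 4\}|}{|\{1, 2, 4\}|}
  = \frac{2}{3} \approx 66.7\%
```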

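I know this relates to association rules/apriori, but I want overall summary numbers/calculations for all possible product combinations. Here is an illustration of the type of output table I'm after: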

Category  Total distinct customers (LHS)  % cross sell
A to B    3                               66.7%
A to C    3                               100%
B to C    3                               100%

There must be a better/cleaner way, but here is one using dplyr:

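library(dplyr)

# Example data from the question (note that ID 1 buys B twice)
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = factor(c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D", "A", "B", "C"))
)

df1 %>% 
  group_by(ProductBought) %>% 
  mutate(distinctCustomerN = n_distinct(ID)) %>%   # distinct customers per LHS product
  ungroup() %>% 
  left_join(df1, by = "ID") %>%                     # pair each row with every product the same ID bought
  filter(ProductBought.x != ProductBought.y) %>%    # drop self-pairs such as A -> A
  group_by(ProductBought.x, ProductBought.y, distinctCustomerN) %>% 
  summarise(n = n_distinct(ID)) %>%                 # distinct customers who bought both products
  mutate(n_pc = n/distinctCustomerN * 100)          # cross-sell %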

#    ProductBought.x ProductBought.y distinctCustomerN     n      n_pc
#             <fctr>          <fctr>             <int> <int>     <dbl>
# 1                A               B                 3     2  66.66667
# 2                A               C                 3     3 100.00000
# 3                A               D                 3     1  33.33333
# 4                B               A                 3     2  66.66667
# 5                B               C                 3     3 100.00000
# 6                B               D                 3     1  33.33333
# 7                C               A                 4     3  75.00000
# 8                C               B                 4     3  75.00000
# 9                C               D                 4     2  50.00000
# 10               D               A                 2     1  50.00000
# 11               D               B                 2     1  50.00000
# 12               D               C                 2     2 100.00000
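As a possibly cleaner alternative, here is a base-R sketch (not the answer's own method) that builds a customer-by-product 1/0 incidence matrix and uses crossprod() to count co-purchases for every product pair at once. It assumes the input has ID and ProductBought columns like df1; the helper name cross_sell_table and its output columns (category, lhs, rhs, n_lhs, pct) are hypothetical choices, with category mimicking the "A to B" layout from the question.

```r
# Hypothetical helper, a sketch rather than a definitive implementation:
# cross-sell % for every ordered product pair, using base R only.
# Assumes a data frame with columns ID and ProductBought.
cross_sell_table <- function(df) {
  # 1/0 incidence matrix: one row per customer ID, one column per product.
  # Repeat purchases (ID 1 buys B twice) collapse to a single 1.
  m <- unclass(table(df$ID, df$ProductBought))
  m <- (m > 0) * 1L
  products <- colnames(m)

  # co[i, j] = distinct customers who bought both i and j;
  # the diagonal is the number of distinct customers per product.
  co <- crossprod(m)
  n_lhs <- diag(co)
  names(n_lhs) <- products

  # Divide row i by n_lhs[i], so the row product acts as the LHS.
  pct <- co / n_lhs * 100
  dimnames(pct) <- list(lhs = products, rhs = products)

  # Long format: one row per (lhs, rhs) pair
  out <- as.data.frame(as.table(pct), responseName = "pct")
  out <- out[out$lhs != out$rhs, ]                 # drop A -> A, B -> B, ...
  out$n_lhs <- unname(n_lhs[as.character(out$lhs)])
  out$category <- paste(out$lhs, "to", out$rhs)
  out <- out[order(out$lhs, out$rhs), c("category", "lhs", "rhs", "n_lhs", "pct")]
  rownames(out) <- NULL
  out
}

# Example data from the question
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D", "A", "B", "C")
)
res <- cross_sell_table(df1)
res
```

On the question's data this should give the same 12 pairs and percentages as the dplyr output, for example 66.7% for A to B and 75% for C to A. The matrix approach avoids the self-join, at the cost of building a dense customers × products matrix, which may be large for big catalogues.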