Count how many orders were shared between customers

I have a table with two columns:

Order | CustomerID
A | C1
B | C1
C | C1
D | C2
B | C3
C | C3
D | C4

The table is very long. I want output that shows:
C1 | C3 | 2   # Customer C1 and Customer C3 share 2 orders (i.e. orders B and C)
C1 | C2 | 0   # Customer C1 and Customer C2 share 0 orders
C2 | C4 | 1   # Customer C2 and Customer C4 share 1 order (i.e. order D)
C2 | C3 | 0   # Customer C2 and Customer C3 share 0 orders

A SQL Server demo (but the code is generic):

; with data as (select 'A' as [Order], 'C1' as CustomerID 
                union all 
                select 'B', 'C1'
                union all 
                select 'C', 'C1'
                union all 
                select 'D', 'C2'
                union all 
                select 'B', 'C3'
                union all 
                select 'C', 'C3'
                union all 
                select 'D', 'C4'
        )
select c1, c2, count(*) from (
select x.[Order], x.CustomerID c1, y.CustomerID c2
from data x join data y on x.[Order] = y.[Order] and x.CustomerID < y.CustomerID
) temp
group by c1, c2

This only considers pairs that share at least one order. I think returning pairs that share no orders would be a waste of resources.
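For anyone who wants to try this without a SQL Server instance, here is a minimal sketch that runs the same kind of self-join against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here (it happens to accept the `[Order]` bracket quoting), and the table name `t`, the function name `shared_order_pairs`, and the `ROWS` constant are hypothetical names chosen for illustration.

```python
import sqlite3

# Sample rows from the question as (Order, CustomerID) tuples.
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]


def shared_order_pairs(rows):
    """Return (c1, c2, shared) for customer pairs sharing at least one order."""
    con = sqlite3.connect(":memory:")
    # "Order" is a reserved word, so it is bracket-quoted as in the demo above.
    con.execute("CREATE TABLE t ([Order] TEXT, CustomerID TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT x.CustomerID AS c1, y.CustomerID AS c2, COUNT(*) AS shared
        FROM t AS x
        JOIN t AS y
          ON x.[Order] = y.[Order]
         AND x.CustomerID < y.CustomerID  -- keep each pair once, skip self-pairs
        GROUP BY x.CustomerID, y.CustomerID
        ORDER BY c1, c2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in shared_order_pairs(ROWS):
        print(row)
```

On the sample data only the pairs (C1, C3) and (C2, C4) come back. Pairs with no shared orders never appear, because the join condition requires a matching order.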

select 
    a.CustomerId
  , b.CustomerId
  , sum(case when a.[Order] = b.[Order] then 1 else 0 end) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
group by a.CustomerId, b.CustomerId

Test setup: http://rextester.com/ISSCL35174

returns:

+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1         | C2         |            0 |
| C1         | C3         |            2 |
| C2         | C3         |            0 |
| C1         | C4         |            0 |
| C2         | C4         |            1 |
| C3         | C4         |            0 |
+------------+------------+--------------+

To return only the pairs that share orders:

select a.CustomerId
     , b.CustomerId
     , count(*) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
   and a.[Order] = b.[Order]
group by a.CustomerId, b.CustomerId

returns:

+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1         | C3         |            2 |
| C2         | C4         |            1 |
+------------+------------+--------------+

Here is a base R method using table, crossprod, combn, and matrix subsetting.

# get counts of customer IDs
myMat <- crossprod(with(df, table(Order, CustomerID)))
myMat
          CustomerID
CustomerID C1 C2 C3 C4
        C1  3  0  2  0
        C2  0  1  0  1
        C3  2  0  2  0
        C4  0  1  0  1

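Note that the diagonal here is the total number of orders for each customer, and the (symmetric) off-diagonal entries are the number of orders shared by each pair of customers.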

# get all customer pairs
customers <- t(combn(rownames(myMat), 2))

# use matrix subsetting to pull out order counts and cbind.data.frame to put it together
cbind.data.frame(customers, myMat[customers])
   1  2 myMat[customers]
1 C1 C2                0
2 C1 C3                2
3 C1 C4                0
4 C2 C3                0
5 C2 C4                1
6 C3 C4                0

If desired, you can wrap this in setNames to give the columns specific variable names:

setNames(cbind.data.frame(customers, myMat[customers]), c("c1", "c2", "counts"))
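For readers who do not use R, the table / crossprod / combn steps above can be sketched in plain Python. This is only an illustration of the same idea (an order-by-customer count table multiplied by its own transpose), not a translation of the R internals. The function names `shared_order_matrix` and `pair_counts` and the `ROWS` constant are hypothetical.

```python
from itertools import combinations

# Sample rows from the question as (Order, CustomerID) tuples.
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]


def shared_order_matrix(rows):
    """Build the customer-by-customer matrix that crossprod(table(...)) yields."""
    orders = sorted({order for order, _ in rows})
    customers = sorted({customer for _, customer in rows})
    # incidence[o][c] counts rows for (order o, customer c), like table(Order, CustomerID).
    incidence = {o: {c: 0 for c in customers} for o in orders}
    for order, customer in rows:
        incidence[order][customer] += 1
    # Entry (a, b) is the sum over orders of incidence[o][a] * incidence[o][b].
    matrix = {
        a: {b: sum(incidence[o][a] * incidence[o][b] for o in orders)
            for b in customers}
        for a in customers
    }
    return customers, matrix


def pair_counts(rows):
    """List (c1, c2, shared) for every unordered customer pair, zeros included."""
    customers, matrix = shared_order_matrix(rows)
    return [(a, b, matrix[a][b]) for a, b in combinations(customers, 2)]


if __name__ == "__main__":
    for row in pair_counts(ROWS):
        print(row)
```

As in the R matrix, the diagonal holds each customer's own order count (3 for C1), and every pair is listed even when it shares nothing.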

Data

df <- 
structure(list(Order = c("A", "B", "C", "D", "B", "C", "D"), 
    CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4")), .Names = c("Order", 
"CustomerID"), class = "data.frame", row.names = c(NA, -7L))

I would use a cross join to get all pairs of customers, and then left joins to bring in the orders. The final step is aggregation:

-- Order is a reserved word, so it is quoted with brackets (SQL Server style)
select c1.CustomerId, c2.CustomerId, count(t2.[Order]) as inCommon
from (select distinct CustomerID from t) c1 cross join
     (select distinct CustomerID from t) c2 left join
     t t1
     on t1.CustomerId = c1.CustomerId left join
     t t2
     on t2.CustomerId = c2.CustomerId and
        t2.[Order] = t1.[Order]
where c1.CustomerId < c2.CustomerId
group by c1.CustomerId, c2.CustomerId;

The process is a bit tricky because you need the pairs that have no orders in common.
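The zero rows come out right because count over a column skips NULLs: a pair whose second left join finds no matching order still forms a group, but contributes nothing to the count. Here is a minimal sketch that checks this against an in-memory SQLite database from Python. SQLite is only a stand-in for whatever engine you use, and the function name `all_pair_counts` and the `ROWS` constant are hypothetical. `Order` is bracket-quoted because it is a reserved word.

```python
import sqlite3

# Sample rows from the question as (Order, CustomerID) tuples.
ROWS = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]


def all_pair_counts(rows):
    """Return (c1, c2, inCommon) for every customer pair, including zeros."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t ([Order] TEXT, CustomerID TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT c1.CustomerID, c2.CustomerID, COUNT(t2.[Order]) AS inCommon
        FROM (SELECT DISTINCT CustomerID FROM t) AS c1
        CROSS JOIN (SELECT DISTINCT CustomerID FROM t) AS c2
        LEFT JOIN t AS t1
               ON t1.CustomerID = c1.CustomerID
        LEFT JOIN t AS t2
               ON t2.CustomerID = c2.CustomerID
              AND t2.[Order] = t1.[Order]  -- NULL when c2 lacks this order
        WHERE c1.CustomerID < c2.CustomerID
        GROUP BY c1.CustomerID, c2.CustomerID
        ORDER BY c1.CustomerID, c2.CustomerID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in all_pair_counts(ROWS):
        print(row)
```

Using `count(*)` instead would count the NULL-extended rows too, so pairs with nothing in common would show a nonzero count.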