Count how many orders were shared between customers
I have a table with two columns:
Order | CustomerID
1. A | C1
2. B | C1
3. C | C1
4. D | C2
5. B | C3
6. C | C3
7. D | C4
The table is very long. I would like output that shows:
C1 | C3 | 2 #Customer C1 and Customer C3 share 2 orders (i.e. orders, B & C)
C1 | C2 | 0 #Customer C1 and Customer C2 share 0 orders
C2 | C4 | 1 #Customer C2 and Customer C4 share 1 order (i.e. order D)
C2 | C3 | 0 #Customer C2 and Customer C3 share 0 orders
A SQL Server demo (but the code is generic):
; with data as (select 'A' as [Order], 'C1' as CustomerID
union all
select 'B', 'C1'
union all
select 'C', 'C1'
union all
select 'D', 'C2'
union all
select 'B', 'C3'
union all
select 'C', 'C3'
union all
select 'D', 'C4'
)
select c1, c2, count(*) from (
select x.[Order], x.CustomerID c1, y.CustomerID c2
from data x join data y on x.[Order] = y.[Order] and x.CustomerID < y.CustomerID
) temp
group by c1, c2
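This only considers pairs that share at least one order. I think returning pairs that do not share any order would be a waste of resources.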
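For comparison, the same idea (only generate pairs that actually share an order) can be sketched outside SQL. This is a minimal Python sketch and not part of the answer above: the names `rows` and `shared_order_counts` are made up for illustration, and it assumes each (Order, CustomerID) row appears only once, since the sets here collapse duplicates while the self-join would count them.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample data from the question as (Order, CustomerID) rows.
rows = [
    ("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
    ("B", "C3"), ("C", "C3"), ("D", "C4"),
]


def shared_order_counts(rows):
    """Count shared orders for customer pairs that share at least one order."""
    # Group customers by order, like joining the table to itself on Order.
    customers_by_order = defaultdict(set)
    for order, customer in rows:
        customers_by_order[order].add(customer)

    counts = Counter()
    for customers in customers_by_order.values():
        # sorted() gives c1 < c2, mirroring x.CustomerID < y.CustomerID.
        for pair in combinations(sorted(customers), 2):
            counts[pair] += 1
    return counts


print(dict(shared_order_counts(rows)))
```

With the sample data this prints {('C1', 'C3'): 2, ('C2', 'C4'): 1}. Pairs with no shared order are never generated, which is the saving described above.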
select
a.CustomerId
, b.CustomerId
, sum(case when a.[Order] = b.[Order] then 1 else 0 end) as SharedOrders
from t as a
inner join t as b
on a.CustomerId < b.CustomerId
group by a.CustomerId, b.CustomerId
Test setup: http://rextester.com/ISSCL35174
returns:
+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1 | C2 | 0 |
| C1 | C3 | 2 |
| C2 | C3 | 0 |
| C1 | C4 | 0 |
| C2 | C4 | 1 |
| C3 | C4 | 0 |
+------------+------------+--------------+
To return only pairs that share orders:
select a.CustomerId
, b.CustomerId
, count(*) as SharedOrders
from t as a
inner join t as b
on a.CustomerId < b.CustomerId
and a.[Order] = b.[Order]
group by a.CustomerId, b.CustomerId
returns:
+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1 | C3 | 2 |
| C2 | C4 | 1 |
+------------+------------+--------------+
Here is a base R method using table, crossprod, combn, and matrix subsetting.
# get counts of customer IDs
myMat <- crossprod(with(df, table(Order, CustomerID)))
myMat
CustomerID
CustomerID C1 C2 C3 C4
C1 3 0 2 0
C2 0 1 0 1
C3 2 0 2 0
C4 0 1 0 1
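Note that the diagonal here is the total number of orders for each customer, and the (symmetric) off-diagonal entries are the number of orders shared by each pair of customers.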
# get all customer pairs
customers <- t(combn(rownames(myMat), 2))
# use matrix subsetting to pull out order counts and cbind.data.frame to put it together
cbind.data.frame(customers, myMat[customers])
1 2 myMat[customers]
1 C1 C2 0
2 C1 C3 2
3 C1 C4 0
4 C2 C3 0
5 C2 C4 1
6 C3 C4 0
If desired, you can wrap this in setNames to give the variables specific names:
setNames(cbind.data.frame(customers, myMat[customers]), c("c1", "c2", "counts"))
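To see why crossprod gives these counts: table(Order, CustomerID) is a 0/1 matrix M with one row per order and one column per customer, and crossprod(M) is t(M) %*% M, so entry (i, j) sums M[k, i] * M[k, j] over orders k, which counts the orders both customers appear on. A rough Python sketch of that arithmetic, assuming plain lists instead of R matrices (the `crossprod` helper below is a hand-written stand-in for illustration, not a library function):

```python
# Sample data from the question as (Order, CustomerID) rows.
rows = [
    ("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
    ("B", "C3"), ("C", "C3"), ("D", "C4"),
]

orders = sorted({order for order, _ in rows})
customers = sorted({customer for _, customer in rows})

# incidence[k][j] = number of rows with order k and customer j,
# which is what table(Order, CustomerID) tabulates.
incidence = [[0] * len(customers) for _ in orders]
for order, customer in rows:
    incidence[orders.index(order)][customers.index(customer)] += 1


def crossprod(m):
    """Return t(m) %*% m for a list-of-lists matrix."""
    n_cols = len(m[0])
    return [
        [sum(row[i] * row[j] for row in m) for j in range(n_cols)]
        for i in range(n_cols)
    ]


co_occurrence = crossprod(incidence)
for name, counts in zip(customers, co_occurrence):
    print(name, counts)
```

The printed rows match myMat: C1 [3, 0, 2, 0], C2 [0, 1, 0, 1], C3 [2, 0, 2, 0], C4 [0, 1, 0, 1].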
Data used in the R code:
df <-
structure(list(Order = c("A", "B", "C", "D", "B", "C", "D"),
CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4")), .Names = c("Order",
"CustomerID"), class = "data.frame", row.names = c(NA, -7L))
I would use a cross join to get all pairs of customers, and then left joins to bring in the orders. The final step is aggregation:
-- [Order] is bracket-quoted (SQL Server syntax) because ORDER is a reserved word
select c1.CustomerId, c2.CustomerId, count(t2.[Order]) as inCommon
from (select distinct CustomerID from t) c1 cross join
     (select distinct CustomerID from t) c2 left join
     t t1
     on t1.CustomerId = c1.CustomerId left join
     t t2
     on t2.CustomerId = c2.CustomerId and
        t2.[Order] = t1.[Order]
where c1.CustomerId < c2.CustomerId
group by c1.CustomerId, c2.CustomerId;
This process is a bit tricky because you want the pairs that have no orders in common.
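One way to check that the zero-count pairs survive is to run the query against the sample data. The sketch below uses Python's built-in sqlite3 module purely as a test harness, which is an assumption of this sketch and not part of the answer. The query is copied with Order written as [Order], since ORDER is a reserved word (SQLite accepts the bracket quoting), and an order by is added only to make the output deterministic. The names `QUERY` and `pair_counts` are illustrative.

```python
import sqlite3

# Sample data from the question as (Order, CustomerID) rows.
rows = [
    ("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
    ("B", "C3"), ("C", "C3"), ("D", "C4"),
]

QUERY = """
select c1.CustomerID, c2.CustomerID, count(t2.[Order]) as inCommon
from (select distinct CustomerID from t) c1 cross join
     (select distinct CustomerID from t) c2 left join
     t t1
     on t1.CustomerID = c1.CustomerID left join
     t t2
     on t2.CustomerID = c2.CustomerID and
        t2.[Order] = t1.[Order]
where c1.CustomerID < c2.CustomerID
group by c1.CustomerID, c2.CustomerID
order by c1.CustomerID, c2.CustomerID
"""


def pair_counts(rows):
    """Load rows into an in-memory table t and run the pair-count query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t ([Order] text, CustomerID text)")
        conn.executemany("insert into t values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for c1, c2, in_common in pair_counts(rows):
    print(c1, c2, in_common)
```

count(t2.[Order]) ignores the NULLs produced by the second left join, so pairs such as C1 and C2 come back with 0 instead of disappearing.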