Finding average ratings of a product that customers have not bought

I am trying to find, for each customer, the products they have not bought, and which of those products has the highest rating.

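For example, in the tables below, John bought products 1 and 2 but not products 3, 4, or 5. Products 3 and 5 have no ratings, so they should not be included, but product 4 should be, because it is the highest-rated product that John has not bought.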

Here is my table structure and some sample data:

Customers

id  | customer
----|---------
1   | john
2   | jenkins
3   | jane
4   | janet

Products

id  | description
----|---------
1   | deodorant
2   | soap
3   | shampoo
4   | razor
5   | sponge

Orders

customer_id  | product_id
-------------|---------
1            | 1
1            | 2
2            | 3
2            | 4
3            | 5

Ratings

customer_id  | product_id | rate
-------------|------------|-------
1            | 1          | 3
2            | 2          | 2
2            | 4          | 3
4            | 2          | 4

If you want to do this for a single customer, just use order by and limit:

-- Table names are as assumed in this answer; column names follow the sample data
-- (customers.id, rating.rate).
select c.*, r.*
from customers c cross join
     (select r.product_id, avg(r.rate) as avgr
      from rating r
      group by r.product_id
     ) r left join
     orders o
     on o.customer_id = c.id and
        o.product_id = r.product_id
where c.id = @customerid and o.product_id is null  -- no matching order = not bought
order by r.avgr desc
limit 1;

If you want to do this for all customers at once, it is a bit more complicated. One approach is the substring_index()/group_concat() trick:

-- No filter on a single customer here: the query covers every customer.
select c.*,
       substring_index(group_concat(r.product_id order by r.avgr desc), ',', 1) as product_id
from customers c cross join
     (select r.product_id, avg(r.rate) as avgr
      from rating r
      group by r.product_id
     ) r left join
     orders o
     on o.customer_id = c.id and
        o.product_id = r.product_id
where o.product_id is null  -- no matching order = not bought
group by c.id;
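
On MySQL 8.0 or later, a window function is a possible alternative to the substring_index()/group_concat() trick. The following is only a sketch: it reuses the table names from the queries above, assumes the id and rate columns shown in the question's sample data, and the names ranked and rn are made up for illustration.

```sql
-- Sketch: best-rated unpurchased product per customer (assumes MySQL 8.0+).
select customer_id, customer, product_id, avgr
from (
    select c.id as customer_id,
           c.customer,
           r.product_id,
           r.avgr,
           -- number each customer's unpurchased products, best average first;
           -- product_id breaks ties so the pick is deterministic
           row_number() over (partition by c.id
                              order by r.avgr desc, r.product_id) as rn
    from customers c
    cross join (select product_id, avg(rate) as avgr
                from rating
                group by product_id) r
    left join orders o
           on o.customer_id = c.id
          and o.product_id = r.product_id
    where o.product_id is null   -- keep only pairs with no matching order
) ranked
where rn = 1;
```

Replacing row_number() with rank() and dropping the product_id tie-breaker would return every tied product instead of a single one per customer.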

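I wrote a few subqueries first and then pieced them together. My personal advice is to always break a problem into smaller pieces before going after the whole solution.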

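For example, one thing I needed to know was which products each customer has not yet bought. I got that by cross joining the customer and product tables (to get every pairing) and removing the pairs that already exist in the orders table, like this: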

-- Get all customer/product pairings where customer_product
-- does not exist in orders table
SELECT c.id, p.id
FROM customer c
CROSS JOIN product p
WHERE (c.id, p.id) NOT IN (SELECT * FROM orders)
ORDER BY c.id;
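
A caveat worth keeping in mind: NOT IN can filter out rows unexpectedly if the subquery yields NULLs, and SELECT * relies on orders having exactly the columns (customer_id, product_id) in that order. A NOT EXISTS form sidesteps both. This is a sketch of an equivalent query under the same table and column names, not a replacement for the tested query above:

```sql
-- Sketch: the same customer/product pairs, written with NOT EXISTS.
SELECT c.id AS customerID, p.id AS productID
FROM customer c
CROSS JOIN product p
WHERE NOT EXISTS (
        -- a matching order means the customer already bought the product
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.id
          AND o.product_id = p.id)
ORDER BY c.id, p.id;
```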

I also wrote a subquery to get the average rating of each product. If a product has no ratings, this query returns NULL for it:

SELECT p.id, AVG(r.rate) AS averageRating
FROM product p
LEFT JOIN rate r ON r.product_id = p.id
GROUP BY p.id;

Now I can use these two as subqueries and select the customer ID, the product ID, and the average rating for each product the customer has not yet bought:

SELECT t1.customerID, t1.productID, t2.averageRating
FROM(
  SELECT c.id AS customerID, p.id AS productID
  FROM customer c
  CROSS JOIN product p
  WHERE (c.id, p.id) NOT IN (SELECT * FROM orders)
  ORDER BY c.id) t1
JOIN(
  SELECT p.id AS productID, AVG(r.rate) AS averageRating
  FROM product p
  LEFT JOIN rate r ON r.product_id = p.id
  GROUP BY p.id) t2 ON t2.productID = t1.productID;

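That was the hardest part. The only thing left is some aggregation to get the maximum rating among each customer's unpurchased products, and then a join between that aggregate query and the query above on the condition that the maximum rating equals the average rating. Unrated products drop out at this point, because a NULL average never equals the maximum. So, here is the scary query I put together: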

SELECT t1.customerID, t1.productID, t1.averageRating
FROM(
  SELECT t1.customerID, t1.productID, t2.averageRating
  FROM(
    SELECT c.id AS customerID, p.id AS productID
    FROM customer c
    CROSS JOIN product p
    WHERE (c.id, p.id) NOT IN (SELECT * FROM orders)
    ORDER BY c.id) t1
  JOIN(
    SELECT p.id AS productID, AVG(r.rate) AS averageRating
    FROM product p
    LEFT JOIN rate r ON r.product_id = p.id
    GROUP BY p.id) t2 ON t2.productID = t1.productID) t1
JOIN(
  SELECT t1.customerID, MAX(t2.averageRating) AS maxRating
  FROM(
    SELECT c.id AS customerID, p.id AS productID
    FROM customer c
    CROSS JOIN product p
    WHERE (c.id, p.id) NOT IN (SELECT * FROM orders)
    ORDER BY c.id) t1
  JOIN(
    SELECT p.id AS productID, AVG(r.rate) AS averageRating
    FROM product p
    LEFT JOIN rate r ON r.product_id = p.id
    GROUP BY p.id) t2 ON t2.productID = t1.productID
  GROUP BY t1.customerID) t2 ON t2.customerID = t1.customerID AND t2.maxRating = t1.averageRating
ORDER BY t1.customerID;
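
If MySQL 8.0 or later is available, common table expressions can express the same logic without repeating the subqueries. This is a sketch under that assumption; the CTE names unpurchased, avgRatings, and candidates are made up for illustration.

```sql
-- Sketch: same logic as the query above, using CTEs (assumes MySQL 8.0+).
WITH unpurchased AS (
  -- every customer/product pair that has no row in orders
  SELECT c.id AS customerID, p.id AS productID
  FROM customer c
  CROSS JOIN product p
  WHERE (c.id, p.id) NOT IN (SELECT customer_id, product_id FROM orders)
),
avgRatings AS (
  -- average rating per product; NULL for products with no ratings
  SELECT p.id AS productID, AVG(r.rate) AS averageRating
  FROM product p
  LEFT JOIN rate r ON r.product_id = p.id
  GROUP BY p.id
),
candidates AS (
  SELECT u.customerID, u.productID, a.averageRating
  FROM unpurchased u
  JOIN avgRatings a ON a.productID = u.productID
)
SELECT c.customerID, c.productID, c.averageRating
FROM candidates c
JOIN (
  -- best average among each customer's unpurchased products
  SELECT customerID, MAX(averageRating) AS maxRating
  FROM candidates
  GROUP BY customerID
) m ON m.customerID = c.customerID
   AND m.maxRating = c.averageRating
ORDER BY c.customerID, c.productID;
```

As in the original query, ties are kept, and products with a NULL average never match maxRating, so unrated products do not appear.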

Here is a snapshot of the results from MySQL Workbench:

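One important thing to note is that I did not eliminate ties. For example, customer 2 has not bought product 1 or product 2, and both have the same average rating, so two rows are returned.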

I tested this in MySQL because SQL Fiddle was not working, but I got it working afterwards, so here is a Fiddle example if you want one.
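
For anyone who wants to reproduce the queries locally, a setup script along these lines should recreate the sample data. It is only a sketch: the table names follow the queries above, while the column types and keys are assumptions.

```sql
-- Assumed schema: names follow the queries above; types and keys are guesses.
CREATE TABLE customer (
  id       INT PRIMARY KEY,
  customer VARCHAR(50) NOT NULL
);

CREATE TABLE product (
  id          INT PRIMARY KEY,
  description VARCHAR(50) NOT NULL
);

-- Exactly two columns, in this order, so that
-- (c.id, p.id) NOT IN (SELECT * FROM orders) compares like with like.
CREATE TABLE orders (
  customer_id INT NOT NULL,
  product_id  INT NOT NULL,
  PRIMARY KEY (customer_id, product_id)
);

CREATE TABLE rate (
  customer_id INT NOT NULL,
  product_id  INT NOT NULL,
  rate        INT NOT NULL,
  PRIMARY KEY (customer_id, product_id)
);

INSERT INTO customer (id, customer) VALUES
  (1, 'john'), (2, 'jenkins'), (3, 'jane'), (4, 'janet');

INSERT INTO product (id, description) VALUES
  (1, 'deodorant'), (2, 'soap'), (3, 'shampoo'), (4, 'razor'), (5, 'sponge');

INSERT INTO orders (customer_id, product_id) VALUES
  (1, 1), (1, 2), (2, 3), (2, 4), (3, 5);

INSERT INTO rate (customer_id, product_id, rate) VALUES
  (1, 1, 3), (2, 2, 2), (2, 4, 3), (4, 2, 4);
```

With this data, every rated product (1, 2, and 4) averages exactly 3, which is why ties show up so readily in the results.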