Statistical Mode between 3 columns

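I have a table of orders with ~70K entries, as follows:

For each customer, I want to determine the most common order and how certain that result is (sample size and probability).

This is what I have so far: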

CREATE VIEW CustomerOrderProbabaility as 
SELECT Distinct(customerID)
        customerID,
        order,
        COUNT(*) as sampleSize
FROM (Select customerID, order1 AS order FROM orderTable UNION
      Select customerID, order2 AS order FROM orderTable UNION
      Select customerID, order3 AS order FROM orderTable
     )
GROUP BY customerID, order
ORDER BY customerID, COUNT(*) DESC;

I get a table of customerID and order, but sampleSize is always 1. Where am I going wrong?

I think you want UNION ALL, along with a few other changes:

CREATE VIEW CustomerOrderProbabaility as 
    SELECT DISTINCT ON (customerID)
            customerID,
            theorder,
            COUNT(*) as sampleSize,
            SUM(COUNT(*)) OVER (PARTITION BY customerId) as totOrders
    FROM (Select customerID, order1 AS theorder FROM orderTable UNION ALL
          Select customerID, order2 AS theorder FROM orderTable UNION ALL
          Select customerID, order3 AS theorder FROM orderTable
         ) co
    GROUP BY customerID, theorder
    ORDER BY customerID, COUNT(*) DESC;

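UNION removes duplicates. After the union, each (customerID, order) pair therefore appears only once, which is why COUNT(*) is always 1.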
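To see the difference between the two operators in isolation, here is a minimal sketch using made-up literal rows (customer 1 and order 'A' are illustrative values, not data from the question):

```sql
-- UNION: the three identical rows collapse into one.
SELECT COUNT(*) AS n_rows
FROM (SELECT 1 AS customerID, 'A' AS theorder
      UNION
      SELECT 1, 'A'
      UNION
      SELECT 1, 'A'
     ) u;
-- n_rows = 1

-- UNION ALL: every row is kept, so repeats can be counted.
SELECT COUNT(*) AS n_rows
FROM (SELECT 1 AS customerID, 'A' AS theorder
      UNION ALL
      SELECT 1, 'A'
      UNION ALL
      SELECT 1, 'A'
     ) u;
-- n_rows = 3
```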

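Changes:

  • Renamed order to theorder. order is a keyword; even if it is accepted as a column name, I don't think it is a good idea.
  • UNION ALL instead of UNION, so duplicates are not removed.
  • DISTINCT ON instead of DISTINCT, because that is your intent.
  • Added totOrders to count all orders per customer.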
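The question also asks for a probability. One possible way to get it is to divide the per-order count by the per-customer total. This is a sketch, assuming PostgreSQL (DISTINCT ON is a PostgreSQL extension) and the orderTable columns from the question; the probability column name is my own addition:

```sql
-- Most common order per customer, with its count and its share of that
-- customer's orders. Assumes PostgreSQL and the question's orderTable.
SELECT DISTINCT ON (customerID)
       customerID,
       theorder,
       COUNT(*) AS sampleSize,
       SUM(COUNT(*)) OVER (PARTITION BY customerID) AS totOrders,
       -- Cast to numeric so the division is not truncated to an integer.
       COUNT(*)::numeric
           / SUM(COUNT(*)) OVER (PARTITION BY customerID) AS probability
FROM (SELECT customerID, order1 AS theorder FROM orderTable UNION ALL
      SELECT customerID, order2 AS theorder FROM orderTable UNION ALL
      SELECT customerID, order3 AS theorder FROM orderTable
     ) co
GROUP BY customerID, theorder
ORDER BY customerID, COUNT(*) DESC;
```

If two orders tie for the highest count, DISTINCT ON keeps whichever row sorts first, so adding theorder as a final ORDER BY key would make the choice deterministic. NULL values in order1, order2, or order3 are counted like any other value here; filter them out in the subqueries if that is not wanted.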