Aggregate the same column in multiple different ways

Question

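I am trying to get an array of the categories associated with each product, and also get each product's top-level parent category in a separate column. By my logic, that means finding the same values as for the category array, but selecting only where parent_id is NULL, which should pull back just one value and one record per ID.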

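I really don't know the best way to structure this query. What I have partly works, but it also shows NULL in the parent category column for categories that have a parent ID, and it produces a second record for each product because I am forced to put that column in the GROUP BY. Basically, I don't think I am doing this in the correct or most efficient way.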

Desired result:

+----+----------------+------------------+------------------------------------------------+------------------+
| id | name           | category_ids     | category_names                                 | parent_category  |
+----+----------------+------------------+------------------------------------------------+------------------+
| 1  | Product Name 1 | {111,222,333}    | {Electronics, computers, computer accessories} | Electronics      |
+----+----------------+------------------+------------------------------------------------+------------------+

My current query (not ideal):

select p.id, 
p.name, 
array_agg(category_id) as category_ids,
regexp_replace(array_agg(c.name)::text,'"|''','','gi') as category_names,
c1.name as parent_category
from products p
join product_categorizations pc  on pc.product_id = p.id
join categories c  on pc.category_id = c.id
full outer join (
   select name, id from categories
   where parent_id is null and name is not null
   ) c1 on c.id = c1.id
group by 1,2,5;

+----+----------------+------------------+-----------------------------------+------------------+
| id | name           | category_ids     | category_names                    | parent_category  |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {111}            | {Electronics}                     | Electronics      |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {222,333}        | {computers, computer accessories} | NULL             |
+----+----------------+------------------+-----------------------------------+------------------+

Answer 1

Replace the FULL JOIN with an aggregate FILTER clause:

SELECT p.id
     , p.name
     , array_agg(pc.category_id) AS category_ids
     , string_agg(c.name, ', ')  AS category_names  -- regexp_replace .. ?
     , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM   products                p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories              c  ON pc.category_id = c.id
GROUP  BY p.id;

See:

Aggregate columns with additional (distinct) filters
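To try the FILTER approach in isolation, here is a minimal sketch with sample data. The column types, constraints, and the use of temporary tables are assumptions; only the table and column names come from the question. The ORDER BY inside the aggregates is an addition to make the output order deterministic.

```sql
-- Hypothetical minimal schema (types and constraints are assumptions).
CREATE TEMP TABLE categories (
   id        int PRIMARY KEY
 , name      text
 , parent_id int REFERENCES categories
);
CREATE TEMP TABLE products (
   id   int PRIMARY KEY
 , name text
);
CREATE TEMP TABLE product_categorizations (
   product_id  int REFERENCES products
 , category_id int REFERENCES categories
 , PRIMARY KEY (product_id, category_id)
);

-- Sample rows matching the desired result in the question.
INSERT INTO categories VALUES
   (111, 'Electronics', NULL)
 , (222, 'computers', 111)
 , (333, 'computer accessories', 222);
INSERT INTO products VALUES (1, 'Product Name 1');
INSERT INTO product_categorizations VALUES (1, 111), (1, 222), (1, 333);

-- One row per product; only the category without a parent feeds parent_category.
SELECT p.id
     , p.name
     , array_agg(pc.category_id ORDER BY pc.category_id) AS category_ids
     , string_agg(c.name, ', ' ORDER BY c.id)            AS category_names
     , min(c.name) FILTER (WHERE c.parent_id IS NULL)    AS parent_category
FROM   products                p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories              c  ON pc.category_id = c.id
GROUP  BY p.id;
-- 1 | Product Name 1 | {111,222,333} | Electronics, computers, computer accessories | Electronics
```

GROUP BY p.id is enough here because id is declared as the primary key, so p.name is functionally dependent on it. The FILTER clause requires PostgreSQL 9.4 or later.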

(Why would you add AND name IS NOT NULL? min() ignores NULL values anyway.)
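A quick way to convince yourself of that, using an inline VALUES list (the alias t and column x are made up for the illustration):

```sql
-- min() and count(x) skip NULL input; count(*) counts every row.
SELECT min(x)   AS min_x
     , count(*) AS all_rows
     , count(x) AS non_null_rows
FROM  (VALUES ('b'), (NULL), ('a')) AS t(x);
-- min_x = 'a', all_rows = 3, non_null_rows = 2
```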

When aggregating all products, and with referential integrity enforced, this should be a bit faster:

SELECT p.name, pc.*
FROM   products p
JOIN  (
   SELECT pc.product_id AS id
        , array_agg(pc.category_id) AS category_ids
        , string_agg(c.name, ', ')  AS category_names
        , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
   FROM   product_categorizations pc
   JOIN   categories              c  ON pc.category_id = c.id
   GROUP  BY 1
   ) pc  USING (id);

The point is that products is joined only after the rows have been aggregated.
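Whether the second form is actually faster depends on your data, so it is worth measuring. One way to do that, as a sketch to run against your own tables, is to compare the plans and timings of both variants with EXPLAIN. Note that the ANALYZE option really executes the query.

```sql
-- Variant 1: join everything first, then aggregate.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id
     , p.name
     , array_agg(pc.category_id) AS category_ids
     , string_agg(c.name, ', ')  AS category_names
     , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM   products                p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories              c  ON pc.category_id = c.id
GROUP  BY p.id;

-- Variant 2: aggregate in a subquery, then join products.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.name, pc.*
FROM   products p
JOIN  (
   SELECT pc.product_id AS id
        , array_agg(pc.category_id) AS category_ids
        , string_agg(c.name, ', ')  AS category_names
        , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
   FROM   product_categorizations pc
   JOIN   categories              c  ON pc.category_id = c.id
   GROUP  BY 1
   ) pc  USING (id);
```

Run each statement a few times and compare the reported execution times; the first run is often slower because of cold caches.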

Aside: "name" is not a very useful column name. Related:

How to implement a many-to-many relationship in PostgreSQL?

sql

postgresql

aggregate-functions

aggregate-filter