Aggregate the same column in multiple different ways

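I'm trying to get an array of the categories associated with each product, and then also get each product's top-level parent category in another column. By my logic, that looks up the same values as the category array, but selecting only where parent_id is NULL should pull back just one value and one record per ID.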

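I really don't know the best way to structure this query. I have something that works, but it also shows NULL in the parent category column for categories that do have a parent ID, and it produces a second record for each product because I'm forced to put that column in the GROUP BY. Basically, I don't think I'm doing this the right or most efficient way.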

Desired result:

+----+----------------+------------------+------------------------------------------------+------------------+
| id | name           | category_ids     | category_names                                 | parent_category  |
+----+----------------+------------------+------------------------------------------------+------------------+
| 1  | Product Name 1 | {111,222,333}    | {Electronics, computers, computer accessories} | Electronics      |
+----+----------------+------------------+------------------------------------------------+------------------+

My current query (not ideal):

select p.id, 
p.name, 
array_agg(category_id) as category_ids,
regexp_replace(array_agg(c.name)::text,'"|''','','gi') as category_names,
c1.name as parent_category
from products p
join product_categorizations pc  on pc.product_id = p.id
join categories c  on pc.category_id = c.id
full outer join (
   select name, id from categories
   where parent_id is null and name is not null
   ) c1 on c.id = c1.id
group by 1,2,5;

Current result:

+----+----------------+------------------+-----------------------------------+------------------+
| id | name           | category_ids     | category_names                    | parent_category  |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {111}            | {Electronics}                     | Electronics      |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {222,333}        | {computers, computer accessories} | NULL             |
+----+----------------+------------------+-----------------------------------+------------------+

Replace the FULL JOIN with an aggregate FILTER clause:

SELECT p.id
     , p.name
     , array_agg(pc.category_id) AS category_ids
     , string_agg(c.name, ', ')  AS category_names  -- regexp_replace .. ?
     , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM   products                p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories              c  ON pc.category_id = c.id
GROUP  BY p.id;

See:

  • Aggregate columns with additional (distinct) filters

(Why the added AND name IS NOT NULL? min() ignores NULL values anyway.)
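
To try the FILTER approach without touching real tables, here is a minimal self-contained sketch. The sample rows live in CTEs, and the parent_id values (111 and 222) are assumptions made up for illustration, since the question does not show them. Because a CTE has no primary key, p.name has to be listed in GROUP BY here; with a real primary key on products.id, GROUP BY p.id is enough.

```sql
WITH products (id, name) AS (
   VALUES (1, 'Product Name 1')
   )
, categories (id, name, parent_id) AS (          -- parent_id values are hypothetical
   VALUES (111, 'Electronics'         , NULL::int)
        , (222, 'computers'           , 111)
        , (333, 'computer accessories', 222)
   )
, product_categorizations (product_id, category_id) AS (
   VALUES (1, 111), (1, 222), (1, 333)
   )
SELECT p.id
     , p.name
     , array_agg(pc.category_id) AS category_ids
     , string_agg(c.name, ', ')  AS category_names
       -- only rows without a parent feed this aggregate
     , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM   products                p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories              c  ON pc.category_id = c.id
GROUP  BY p.id, p.name;                          -- no PK on a CTE, so list both columns
```

This returns a single row for product 1 with parent_category = Electronics. The order of elements inside the two aggregated lists is not guaranteed without an ORDER BY inside the aggregate. The aggregate FILTER clause requires PostgreSQL 9.4 or later; on older versions, min(CASE WHEN c.parent_id IS NULL THEN c.name END) should give the same result.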

While aggregating all products, and with referential integrity enforced, this should be a bit faster:

SELECT p.name, pc.*
FROM   products p
JOIN  (
   SELECT pc.product_id AS id
        , array_agg(pc.category_id) AS category_ids
        , string_agg(c.name, ', ')  AS category_names
        , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
   FROM   product_categorizations pc
   JOIN   categories              c  ON pc.category_id = c.id
   GROUP  BY 1
   ) pc  USING (id);

The point being that products is only joined after aggregating rows.
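
"Referential integrity enforced" presumably means foreign key constraints along these lines. This is only a sketch of the assumed schema: the column types and the composite primary key are guesses, not taken from the question.

```sql
-- Hypothetical schema; types and key choices are assumptions.
CREATE TABLE categories (
   id        int PRIMARY KEY
 , name      text
 , parent_id int REFERENCES categories   -- NULL for top-level categories
);

CREATE TABLE products (
   id   int PRIMARY KEY
 , name text
);

CREATE TABLE product_categorizations (
   product_id  int REFERENCES products
 , category_id int REFERENCES categories
 , PRIMARY KEY (product_id, category_id)
);
```

With constraints like these, every product_id in product_categorizations is guaranteed to exist in products, so joining to products after the aggregation cannot drop any aggregated rows. Note that both queries use inner joins, so products without any categories are not returned.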

Aside: "name" is not a very useful column name. Related:

  • How to implement a many-to-many relationship in PostgreSQL?