Combine two queries to count distinct strings with different filters

I'm trying to write a Postgres query that produces a table like this:

source |yes_gap|no_gap|
-------|-------|------|
allivet|     29|    25|
amazon |    692|   255|

I've been able to write two separate queries, but I haven't been able to figure out how to combine them into one.

Here's my query for product_gap='yes':

select 
source,
count(distinct(sku)) as yes_gap
from product_gaps where 
product_gap='yes' and
ingestion_date <= '2021-05-25' 
/* aggregate by source */
group by source

Result:

source |yes_gap|
-------|-------|
allivet|     29|
amazon |    692|

Here's my query for product_gap='no':

select 
source,
count(distinct(sku)) as no_gap
from product_gaps where 
product_gap='no' and
ingestion_date <= '2021-05-25' 
/* aggregate by source */
group by source

Result:

source |no_gap|
-------|------|
allivet|    25|
amazon |   255|

Can I get both counts in a single query?

You've already done 95% of the work; all that's left is to join the two result sets:

SELECT source, yes_gap, no_gap FROM
    ( select
        source,
        count(distinct(sku)) as yes_gap
      from product_gaps where
        product_gap='yes' and
        ingestion_date <= '2021-05-25'
      /* aggregate by source */
      group by source ) r1
    FULL OUTER JOIN
    ( select
        source,
        count(distinct(sku)) as no_gap
      from product_gaps where
        product_gap='no' and
        ingestion_date <= '2021-05-25'
      /* aggregate by source */
      group by source ) r2
    USING ( source )
ORDER BY source;
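
One thing to watch with the join approach: a source that has rows on only one side of the FULL OUTER JOIN comes back with NULL for the missing count, not 0. If you want zeros, a minimal sketch is to wrap the counts in COALESCE. The temp table and its rows below are made up for illustration (including the source 'chewy'); they are not from the question's data.

```sql
-- Hypothetical sample data, only to make the sketch runnable on its own.
-- A temp table shadows a real table of the same name for this session only.
CREATE TEMP TABLE product_gaps (
   source         text
 , sku            text
 , product_gap    text
 , ingestion_date date
);

INSERT INTO product_gaps VALUES
   ('allivet', 'a1', 'yes', '2021-05-01')
 , ('allivet', 'a2', 'no',  '2021-05-02')
 , ('chewy',   'c1', 'yes', '2021-05-03');  -- this source has no 'no' rows

SELECT source
     , COALESCE(r1.yes_gap, 0) AS yes_gap   -- NULL from the outer join becomes 0
     , COALESCE(r2.no_gap, 0)  AS no_gap
FROM  (
   SELECT source, count(DISTINCT sku) AS yes_gap
   FROM   product_gaps
   WHERE  product_gap = 'yes'
   AND    ingestion_date <= '2021-05-25'
   GROUP  BY source
   ) r1
FULL   OUTER JOIN (
   SELECT source, count(DISTINCT sku) AS no_gap
   FROM   product_gaps
   WHERE  product_gap = 'no'
   AND    ingestion_date <= '2021-05-25'
   GROUP  BY source
   ) r2 USING (source)
ORDER  BY source;
```

With this sample data, 'chewy' gets no_gap = 0; without COALESCE it would be NULL.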

Conditional aggregation with the aggregate FILTER clause is faster and simpler:

SELECT source
     , count(DISTINCT sku) FILTER (WHERE product_gap = 'yes') AS yes_gap
     , count(DISTINCT sku) FILTER (WHERE product_gap = 'no')  AS no_gap
FROM   product_gaps
WHERE  ingestion_date <= '2021-05-25'
GROUP  BY source;

See:

  • Aggregate columns with additional (distinct) filters
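
The FILTER clause was added in Postgres 9.4. On older versions, or for SQL that has to port to databases without FILTER, the same conditional aggregation can be sketched with a CASE expression instead. This is a substitute technique, not part of the answer above; it relies on count() ignoring NULL values:

```sql
-- CASE yields NULL for non-matching rows, and count() skips NULLs,
-- so each count only sees the skus of its own product_gap value.
SELECT source
     , count(DISTINCT CASE WHEN product_gap = 'yes' THEN sku END) AS yes_gap
     , count(DISTINCT CASE WHEN product_gap = 'no'  THEN sku END) AS no_gap
FROM   product_gaps
WHERE  ingestion_date <= '2021-05-25'
GROUP  BY source;
```

Either way the table is scanned once, and a source with no matching rows for one value gets 0 for that count.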

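Aside 1: DISTINCT is a keyword, not a function. Don't add parentheses around a single column. distinct(sku) is short for DISTINCT ROW(sku). It happens to work because Postgres strips the ROW wrapper for a single column, but it's just noise.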

Aside 2: product_gap should be boolean.
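
If you do convert the column, a rough sketch could look like the following. It assumes product_gap is currently text or varchar, has no column default, and holds only 'yes' / 'no' (or NULL). Note that changing the column type rewrites the table under an exclusive lock, so try it on a copy first.

```sql
-- Postgres accepts 'yes' and 'no' as boolean input, so a plain cast is enough.
-- A string that is not a valid boolean literal makes the cast fail with an error
-- instead of being converted silently.
ALTER TABLE product_gaps
   ALTER COLUMN product_gap TYPE boolean USING product_gap::boolean;

-- With a boolean column, the filters no longer compare against strings.
SELECT source
     , count(DISTINCT sku) FILTER (WHERE product_gap)     AS yes_gap
     , count(DISTINCT sku) FILTER (WHERE NOT product_gap) AS no_gap
FROM   product_gaps
WHERE  ingestion_date <= '2021-05-25'
GROUP  BY source;
```

Rows where product_gap is NULL are counted in neither column, which matches the behavior of the string comparisons.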