Combine two queries to count distinct strings with different filters
I'm trying to write a Postgres query that produces a table like this:
source |yes_gap|no_gap|
-------|-------|------|
allivet| 29| 25|
amazon | 692| 255|
I've been able to write two separate queries, but I can't figure out how to combine them into one.
This is my query for product_gap='yes':
select
source,
count(distinct(sku)) as yes_gap
from product_gaps where
product_gap='yes' and
ingestion_date <= '2021-05-25'
/* aggregate by source */
group by source
Result:
source |yes_gap|
-------|-------|
allivet| 29|
amazon | 692|
This is my query for product_gap='no':
select
source,
count(distinct(sku)) as no_gap
from product_gaps where
product_gap='no' and
ingestion_date <= '2021-05-25'
/* aggregate by source */
group by source
Result:
source |no_gap|
-------|------|
allivet| 25|
amazon | 255|
Can I get both counts in one query?
You've already done 95% of the work; all that's left is to join the two result sets:
SELECT source, yes_gap,no_gap FROM
( select
source,
count(distinct(sku)) as yes_gap
from product_gaps where
product_gap='yes' and
ingestion_date <= '2021-05-25'
/* aggregate by source */
group by source ) r1
FULL OUTER JOIN
( select
source,
count(distinct(sku)) as no_gap
from product_gaps where
product_gap='no' and
ingestion_date <= '2021-05-25'
/* aggregate by source */
group by source ) r2
USING ( source )
ORDER BY source;
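One property of the join approach worth noting: a source that has rows for only one of the two values is kept by the FULL OUTER JOIN, but its missing count comes back as NULL rather than 0. Below is a minimal sketch of that behavior. The demo_gaps CTE and its rows are made up for illustration (they are not the asker's data), and wrapping the counts in COALESCE is just one possible way to show 0 instead of NULL.

```sql
-- Hypothetical sample rows, for illustration only.
WITH demo_gaps (source, sku, product_gap, ingestion_date) AS (
   VALUES ('allivet', 'a1', 'yes', DATE '2021-05-01')
        , ('allivet', 'a1', 'yes', DATE '2021-05-02')  -- same sku again: counted once
        , ('allivet', 'a2', 'no',  DATE '2021-05-03')
        , ('amazon',  'b1', 'yes', DATE '2021-05-04')  -- amazon has no 'no' rows
)
SELECT source
     , COALESCE(yes_gap, 0) AS yes_gap  -- NULL when the source is missing from r1
     , COALESCE(no_gap, 0)  AS no_gap   -- NULL when the source is missing from r2
FROM  (
   SELECT source, count(DISTINCT sku) AS yes_gap
   FROM   demo_gaps
   WHERE  product_gap = 'yes'
   AND    ingestion_date <= '2021-05-25'
   GROUP  BY source
   ) r1
FULL OUTER JOIN (
   SELECT source, count(DISTINCT sku) AS no_gap
   FROM   demo_gaps
   WHERE  product_gap = 'no'
   AND    ingestion_date <= '2021-05-25'
   GROUP  BY source
   ) r2 USING (source)
ORDER  BY source;
```

With these sample rows, amazon appears with no_gap = 0; without the COALESCE it would show NULL, and with a plain inner join the amazon row would disappear entirely.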
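Conditional aggregation with the aggregate FILTER clause is simpler and faster, since the table is read once instead of twice: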
SELECT source
, count(DISTINCT sku) FILTER (WHERE product_gap = 'yes') AS yes_gap
, count(DISTINCT sku) FILTER (WHERE product_gap = 'no') AS no_gap
FROM product_gaps
WHERE ingestion_date <= '2021-05-25'
GROUP BY source;
See:
- Aggregate columns with additional (distinct) filters
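To try the FILTER variant without touching a real table, here is a self-contained sketch. The demo_gaps CTE and its rows are hypothetical stand-ins for product_gaps; only the SELECT below the CTE mirrors the query from the answer.

```sql
-- Hypothetical sample rows, for illustration only.
WITH demo_gaps (source, sku, product_gap, ingestion_date) AS (
   VALUES ('allivet', 'a1', 'yes', DATE '2021-05-01')
        , ('allivet', 'a1', 'yes', DATE '2021-05-02')  -- duplicate sku: counted once
        , ('allivet', 'a2', 'no',  DATE '2021-05-03')
        , ('amazon',  'b1', 'yes', DATE '2021-05-04')
        , ('amazon',  'b2', 'no',  DATE '2021-06-01')  -- after the cutoff: dropped by WHERE
)
SELECT source
     , count(DISTINCT sku) FILTER (WHERE product_gap = 'yes') AS yes_gap
     , count(DISTINCT sku) FILTER (WHERE product_gap = 'no')  AS no_gap
FROM   demo_gaps
WHERE  ingestion_date <= '2021-05-25'  -- shared condition, applied before both aggregates
GROUP  BY source
ORDER  BY source;
```

Because count() over zero qualifying rows is 0, a source with no matching rows for one value still shows 0 in that column (amazon's no_gap here), so no outer join or COALESCE is needed.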
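Aside 1: DISTINCT is a keyword, not a function. Don't add parentheses around a single column. distinct(sku) is short for DISTINCT ROW(sku). It happens to work because Postgres strips the ROW wrapper for a single column, but it's just noise.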
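A quick way to confirm that both spellings give the same count, using a throwaway CTE (the name demo and its values are made up for this sketch):

```sql
-- Hypothetical values: 'a1' appears twice, 'a2' once.
WITH demo (sku) AS (
   VALUES ('a1'), ('a1'), ('a2')
)
SELECT count(DISTINCT sku)  AS plain        -- keyword form
     , count(distinct(sku)) AS with_parens  -- same result, extra parentheses
FROM   demo;
```

Both columns return 2 for these rows.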
Aside 2: product_gap should be boolean.
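If product_gap were stored as boolean, the FILTER conditions could test the column directly. A sketch under that assumption follows; the demo_gaps CTE is hypothetical, and the commented-out ALTER TABLE is only one possible conversion, which assumes the column currently holds nothing but 'yes' and 'no' and has no default or dependent objects.

```sql
-- One possible conversion (assumption: only 'yes'/'no' values exist;
-- any other value, other than NULL, would silently become false):
-- ALTER TABLE product_gaps
--    ALTER COLUMN product_gap TYPE boolean USING (product_gap = 'yes');

-- Hypothetical sample rows with a boolean product_gap.
WITH demo_gaps (source, sku, product_gap, ingestion_date) AS (
   VALUES ('allivet', 'a1', true,  DATE '2021-05-01')
        , ('allivet', 'a2', false, DATE '2021-05-03')
        , ('amazon',  'b1', true,  DATE '2021-05-04')
)
SELECT source
     , count(DISTINCT sku) FILTER (WHERE product_gap)     AS yes_gap
     , count(DISTINCT sku) FILTER (WHERE NOT product_gap) AS no_gap
FROM   demo_gaps
WHERE  ingestion_date <= '2021-05-25'
GROUP  BY source
ORDER  BY source;
```

Rows where product_gap is NULL would count toward neither column, since both product_gap and NOT product_gap evaluate to NULL for them.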