What's the difference between WHERE and FILTER (WHERE) in PostgreSQL?

Summary and question

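I recently asked a question on SO and got a few different answers that helped me reach the result I wanted. They all worked, returned the correct result, and used very similar code. One line differed, though: one answer used a WHERE clause to pick out what I asked for, while another used FILTER (WHERE ...). What is the difference between the two?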

Background details

In case you're curious, this is the table I asked about:

+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1        |0      |
|a |1        |1      |
|b |0        |1      |
|c |1        |0      |
|c |0        |1      |
|c |1        |1      |
+--+---------+-------+

I wanted something like this:

+-----------------------+-----+
|ever treated           |count|
+-----------------------+-----+
|0                      |1    |
|1                      |3    |
+-----------------------+-----+

I got two answers that work. This is the first, from @ErwinBrandstetter:

SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id, 
          max(treatment) AS ever_treated, 
          count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM t
   GROUP  BY 1
   ) sub
GROUP  BY 1;

The second is from @Heidiki:

    select subq.ever_treated, sum(subq.count) as count
    from (select id, 
          max(treatment) as ever_treated, 
          count(*) as count from t where outcome = 1 
          group by id) as subq 
    group by subq.ever_treated;

To my (admittedly novice) eye, the main difference between the two is that in the former you see:

count(*) FILTER (WHERE outcome = 1) AS outcome_ct

while in the latter you have this:

count(*) as count from t where outcome = 1

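I have been reading some documentation, and I gather that FILTER works at the aggregate level whereas WHERE perhaps does not, but I am still missing the intuition, especially as it applies to my table.

So what is the difference here?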

The subqueries are not equivalent:

The first one

SELECT
  id, 
  max(treatment) AS ever_treated, 
  count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
  1. reads all rows from t.
  2. calculates the maximum of treatment for each id over the whole table t.
  3. counts, for each id, the rows with outcome = 1.
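
As a rough sketch of what "aggregate level" means here: FILTER restricts only the aggregate it is attached to, so other aggregates in the same SELECT still see every row of the group. The sample rows below are copied from the question's table; the aliases total_rows and outcome_ct_case are made up for illustration. The CASE expression is the traditional way to write the same condition.

```sql
-- Sample data from the question, inlined as a CTE named t.
WITH t(id, treatment, outcome) AS (
   VALUES ('a', 1, 0), ('a', 1, 1)
        , ('b', 0, 1)
        , ('c', 1, 0), ('c', 0, 1), ('c', 1, 1)
)
SELECT id,
       count(*)                                AS total_rows,      -- all rows of the group
       count(*) FILTER (WHERE outcome = 1)     AS outcome_ct,      -- only rows with outcome = 1
       count(CASE WHEN outcome = 1 THEN 1 END) AS outcome_ct_case  -- same count without FILTER
FROM   t
GROUP  BY id
ORDER  BY id;
-- id | total_rows | outcome_ct | outcome_ct_case
-- a  |          2 |          1 |               1
-- b  |          1 |          1 |               1
-- c  |          3 |          2 |               2
```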

The second one

select
  id, 
  max(treatment) as ever_treated, 
  count(*) as count 
from t
where outcome = 1 
group by id
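  1. reads only the subset of rows from t where outcome = 1.
  2. calculates the maximum of treatment for each id within that subset of table t.
  3. counts the rows for each id in the subset (already filtered).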
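
One consequence of filtering before grouping, sketched with a hypothetical id 'e' that is not in the question's data: an id with no outcome = 1 row disappears from the WHERE version entirely, whereas the FILTER version keeps it with a count of 0.

```sql
-- Hypothetical row: 'e' was treated but never has outcome = 1.
WITH t(id, treatment, outcome) AS (
   VALUES ('e', 1, 0)
)
SELECT id, max(treatment) AS ever_treated, count(*) AS count
FROM   t
WHERE  outcome = 1
GROUP  BY id;
-- (0 rows)

WITH t(id, treatment, outcome) AS (
   VALUES ('e', 1, 0)
)
SELECT id, max(treatment) AS ever_treated,
       count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM   t
GROUP  BY id;
-- id | ever_treated | outcome_ct
-- e  |            1 |          0
```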

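As far as the results are concerned, only #2 and #3 matter. As you can see, #3 is equivalent in both subqueries for every id that has at least one row with outcome = 1, but #2 is not.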
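
The question's sample data happens to hide the difference in #2, because every id there has the same maximum treatment with or without the outcome = 1 restriction. A minimal sketch with a hypothetical id 'd' (not in the original table), treated only on a row where outcome = 0, shows the two subqueries disagreeing:

```sql
-- Hypothetical rows: 'd' is treated on the outcome = 0 row only.
WITH t(id, treatment, outcome) AS (
   VALUES ('d', 1, 0)
        , ('d', 0, 1)
)
SELECT id, max(treatment) AS ever_treated,
       count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM   t
GROUP  BY id;
-- id | ever_treated | outcome_ct
-- d  |            1 |          1     max(treatment) sees both rows

WITH t(id, treatment, outcome) AS (
   VALUES ('d', 1, 0)
        , ('d', 0, 1)
)
SELECT id, max(treatment) AS ever_treated, count(*) AS count
FROM   t
WHERE  outcome = 1
GROUP  BY id;
-- id | ever_treated | count
-- d  |            0 |     1          the treated row was removed before grouping
```

With such data the outer query would put d's outcome under ever_treated = 1 in the first answer and under ever_treated = 0 in the second.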