PostgreSQL: How to select on non-aggregating column?

Question

This seems like a simple question, but I can't get it done. What I want to do is return all names that have a duplicate id. The view looks like this:

id |  name  | other_col
---+--------+----------
 1 | James  |    x
 2 | John   |    x
 2 | David  |    x
 3 | Emily  |    x
 4 | Cameron|    x
 4 | Thomas |    x

So in this case, I just want the result:

name
-------
John
David
Cameron
Thomas

The following query works, but having two separate selects seems like overkill:

select name 
from view where id = ANY(select id from view 
                         WHERE other_col='x' 
                         group by id 
                         having count(id) > 1) 
      and other_col='x';

I believe it should be possible to do something along these lines:

select name from view WHERE other_col='x' group by id, name having count(id) > 1;

But this returns nothing at all! What is the 'proper' query?

Do I have to do it like my first working suggestion, or is there a better way?

Answer 1

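-- "view" is the relation from the question (TABLE is a reserved word in Postgres).
-- The IN subquery must return exactly one column, so only id is selected.
SELECT name FROM view
WHERE id IN (SELECT id FROM view GROUP BY id HAVING COUNT(*) > 1);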

Answer 2

Use the EXISTS operator:

SELECT * FROM view t1  -- "view" as in the question; TABLE is a reserved word
WHERE EXISTS(
  SELECT null FROM view t2
  WHERE t1.id = t2.id 
    AND t1.name <> t2.name
)

Answer 3

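Use a join:

select distinct v1.name  -- qualified, since both v1 and v2 have a name column
from view v1
join view v2 on v1.id = v2.id
  and v1.name != v2.name

distinct is there in case more than 2 rows share the same id. If that can't happen, you can omit distinct.

Note: naming a non-unique column id can be confusing, since that name is the industry convention for a unique identifier column. And if there is no unique column at all, that makes coding harder.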
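To illustrate the distinct remark in this answer, here is a small sketch with three rows sharing one id. The temporary table demo_names and its rows are made up for illustration and are not part of the question.

```sql
-- Hypothetical sample data: three rows share id 2.
CREATE TEMP TABLE demo_names (id int, name text);
INSERT INTO demo_names VALUES
    (1, 'James'),
    (2, 'John'),
    (2, 'David'),
    (2, 'Anna');

-- Without DISTINCT, every row with id 2 pairs with the two other rows,
-- so each of those names comes back twice.
SELECT v1.name
FROM   demo_names v1
JOIN   demo_names v2 ON v1.id = v2.id
                    AND v1.name <> v2.name;

-- With DISTINCT, each name comes back once.
SELECT DISTINCT v1.name
FROM   demo_names v1
JOIN   demo_names v2 ON v1.id = v2.id
                    AND v1.name <> v2.name;
```

Keep in mind that DISTINCT would also merge identical names that belong to different ids, which may or may not be what you want.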

Answer 4

You say you want to avoid two "queries", which isn't really possible. There are plenty of solutions available, but I would use a CTE like this:

WITH cte AS
(
SELECT
    id,
    name,
    other_col,
    COUNT(name) OVER(PARTITION BY id) AS id_count
FROM
    view  -- the relation from the question; TABLE is a reserved word
)

SELECT name FROM cte WHERE id_count > 1;

You can reuse the CTE, so you don't have to repeat the logic, and I personally find it easier to read and to understand what it is doing.

Answer 5

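Don't use a CTE. That is typically more expensive, because Postgres has to materialize the intermediate result.

An EXISTS semi-join is typically fastest. Just make sure to repeat the predicate (or match the values):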

SELECT name 
FROM   view v
WHERE  other_col = 'x'
AND    EXISTS (
   SELECT 1 FROM view 
   WHERE  other_col = 'x' -- or: other_col = v.other_col
   AND    id = v.id       -- same id
   AND    name <> v.name  -- exclude join to self (assumes names differ within an id)
   );

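This is a single query, even though you see the keyword SELECT twice here. An EXISTS expression does not produce a derived table; it is resolved to simple index lookups.

Speaking of which: a multicolumn index on (other_col, id) should help. Depending on data distribution and access patterns, appending the payload column name to enable index-only scans may help: (other_col, id, name). Or even a partial index, if other_col = 'x' is a constant predicate: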

CREATE INDEX ON view (id) WHERE other_col = 'x';

Related: "PostgreSQL does not use a partial index"

The upcoming Postgres 9.6 even allows index-only scans on partial indexes:

CREATE INDEX ON view (id, name) WHERE other_col = 'x';

You'll love this improvement (quoting the /devel manual):

Allow using an index-only scan with a partial index when the index's predicate involves column(s) not stored in the index (Tomas Vondra, Kyotaro Horiguchi)

An index-only scan is now allowed if the query mentions such columns only in WHERE clauses that match the index predicate
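The two multicolumn indexes suggested earlier in this answer could be sketched as follows. Since a plain view cannot be indexed, the indexes would have to go on the underlying table; base_table is a hypothetical name for it, not something given in the question.

```sql
-- base_table is a hypothetical name for the table behind the view.

-- Multicolumn index matching the filter on other_col and the lookup by id:
CREATE INDEX ON base_table (other_col, id);

-- Variant with the payload column name appended, to allow index-only scans:
CREATE INDEX ON base_table (other_col, id, name);
```

You would normally pick one of the variants rather than create both, depending on how selective other_col is and how often name is read.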

Verify performance with EXPLAIN (ANALYZE, TIMING OFF) SELECT ...
Run it a few times to rule out caching effects.
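As a sketch, the check could look like this for the EXISTS variant. The self-exclusion on name is an assumption that names differ within an id, and the relation name view is taken from the question.

```sql
-- Run this several times and compare the reported plans and row counts.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT name
FROM   view v
WHERE  other_col = 'x'
AND    EXISTS (
   SELECT 1 FROM view
   WHERE  other_col = 'x'
   AND    id = v.id
   AND    name <> v.name  -- assumes names differ within an id
   );
```

Note that ANALYZE actually executes the query; TIMING OFF only skips the per-node timing overhead.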

Tags: sql, postgresql, aggregate, having