Unexpected effect of filtering on result from crosstab() query

Question

I have a crosstab() query like the following:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 = val1
  ORDER  BY row_name ASC',
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)

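This is a simplified example; the actual query is really complex and contains important information. The above query returns N result rows after filtering with table.extra1 = val1.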

When I change the query as follows:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 IN (val1, ...) --> more values
  ORDER  BY row_name ASC',
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
WHERE extra1 = val1; --> condition on the result

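I added more possible values with table.extra1 IN (val1, ...) and a final condition WHERE extra1 = val1. Now I get fewer rows than with the original query. Even worse, if I add more values to IN (val1, ...), I get fewer rows still. Why is that?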

Answer 1

extra1, extra2, ... are "extra columns" in crosstab terms.
The manual for the tablefunc module explains the rules:

It may also have one or more “extra” columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as “extra”. The “extra” columns are expected to be the same for all rows with the same row_name value.

And further down:

The output row_name column, plus any “extra” columns, are copied from the first row of the group.

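The key parts: "extra" columns are expected to be the same for all rows with the same row_name value, and their output values are copied from the first row of each group.

You only sort by row_name: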

ORDER  BY row_name ASC

That doesn't matter in your first example, where you filter with:

WHERE ... t.extra1 = 'val1'  -- single quotes by me

All input rows have extra1 = 'val1' anyway. But it does matter in your second example, where you filter with:

WHERE ... t.extra1 IN('val1', ...) --> More values

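Now the extra column extra1 violates the first requirement quoted above: it is no longer the same for all rows with the same row_name. As long as the sort order of the first input query is not deterministic, the resulting value for the "extra" column extra1 is picked arbitrarily. The more possible values there are for extra1, the fewer rows end up with 'val1', and the outer WHERE extra1 = 'val1' then discards the rest. That is what you observed.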
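The effect can be reproduced on a small scale. The following is only a sketch: the table demo and its contents are made up for illustration, and it assumes the tablefunc extension can be installed in the current database.

```sql
-- Hypothetical demo data; names and values are not from the original query.
CREATE EXTENSION IF NOT EXISTS tablefunc;

CREATE TEMP TABLE demo (row_name text, extra1 text, category text, value int);
INSERT INTO demo VALUES
  ('r1', 'val2', 'a', 1),
  ('r1', 'val1', 'b', 2),   -- same row_name as above, different extra1
  ('r2', 'val1', 'a', 3);

SELECT *
FROM crosstab(
  'SELECT row_name, extra1, category, value
   FROM   demo
   ORDER  BY row_name',     -- extra1 is not part of the sort order
  $$VALUES ('a'), ('b')$$
) AS ct(row_name text, extra1 text, a int, b int);
-- For r1, extra1 is copied from whichever input row happens to come first,
-- so it may be 'val2' or 'val1'.
```

If r1 comes out with extra1 = 'val2', an outer WHERE extra1 = 'val1' drops that row even though a matching input row exists.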

You can still make it work. To report extra1 = 'val1' for every row_name that has at least one such row, change the ORDER BY to:

ORDER  BY row_name, (extra1 <> 'val1')

This sorts 'val1' on top: the boolean expression (extra1 <> 'val1') evaluates to false for 'val1', and false sorts before true.
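To see how the boolean sort key behaves in isolation, a minimal sketch with made-up values (no tables required):

```sql
-- Illustrative values only.
SELECT extra1, (extra1 <> 'val1') AS sort_key
FROM  (VALUES ('val2'), ('val1'), ('val3')) AS t(extra1)
ORDER  BY (extra1 <> 'val1');
-- The 'val1' row has sort_key = false and is returned first;
-- the relative order of the remaining rows is not determined.
```

A NULL in extra1 makes the expression NULL, which sorts last in default ascending order, so such rows do not displace 'val1' either.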

Other "extra" columns are still picked arbitrarily as long as the sort order is not deterministic.
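One way to pin down the remaining "extra" columns, sketched under the assumption that sorting by them is acceptable for your data, is to append them to the ORDER BY as tie-breakers:

```sql
-- Fragment for the first crosstab() input query; append any further extra columns as well.
ORDER  BY row_name, (extra1 <> 'val1'), extra1, extra2
```

With every extra column in the sort order, the first row of each group, and therefore the copied values, no longer depends on the physical row order.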

Basics for crosstab:

PostgreSQL Crosstab Query

