CUBE + outer join = extra NULL row

When I use PostgreSQL's CUBE on a query with an OUTER JOIN, I get an extra all-NULL row that can't be distinguished from the cube's own "everything combined" all-NULL result.

CREATE TABLE species
  ( id    SERIAL PRIMARY KEY,
    name  TEXT);

CREATE TABLE pet
  ( species_id INTEGER REFERENCES species(id),
    is_adult   BOOLEAN, 
    number     INTEGER)
;

INSERT INTO species VALUES
  (1, 'cat'), (2, 'dog');

INSERT INTO pet VALUES
  (1, true, 3), (1, false, 1), (2, true, 1), (null, true, 2);

OK, there are 7 pets in total:

SELECT SUM(number) FROM pet;
 sum
-----
   7
(1 row)

Now look at the cube's grand-total row:

SELECT * FROM (
        SELECT name, is_adult, SUM(number)
        FROM   pet p
        JOIN   species s ON (p.species_id = s.id)
        GROUP BY CUBE (name, is_adult)) subq
WHERE name IS NULL
AND   is_adult IS NULL;

 name | is_adult | sum
------+----------+-----
      |          |   5
(1 row)

5 pets? Oh, right, because the pets with no species aren't included. I need an outer join.

SELECT * FROM (
        SELECT name, is_adult, SUM(number)
        FROM   pet p
        LEFT OUTER JOIN   species s ON (p.species_id = s.id)
        GROUP BY CUBE (name, is_adult)) subq
WHERE name IS NULL
AND   is_adult IS NULL;

 name | is_adult | sum 
------+----------+-----
      |          |   2
      |          |   7
(2 rows)

My cube has 2 all-NULL rows; the second one is the answer I wanted.

I partly understand what's going on here: NULL values are used to signal two different things ("the cube has rolled up all this column's values" or "this row has no children in the right-side table"). I just don't know how to fix it.

From the question: NULL values are used to signal two different things ("the cube has rolled up all this column's values" or "this row has no children in the right-side table").

To distinguish one NULL from the other, you can use the GROUPING(...) function; see Table 9-55 here: https://www.postgresql.org/docs/9.6/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE

GROUPING(args...): Integer bit mask indicating which arguments are not being included in the current grouping set

Grouping operations are used in conjunction with grouping sets (see Section 7.2.4) to distinguish result rows. The arguments to the GROUPING operation are not actually evaluated, but they must match exactly expressions given in the GROUP BY clause of the associated query level. Bits are assigned with the rightmost argument being the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the result row, and 1 if it is not.
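
The bit layout described above can be made visible by calling GROUPING on each column separately as well as on both together. This is only a sketch, assuming the pet and species tables from the question; the aliases total, name_bit, adult_bit and mask are illustrative names, not anything PostgreSQL defines.

```sql
-- Assumes the pet and species tables defined in the question.
-- total, name_bit, adult_bit and mask are illustrative aliases.
SELECT name,
       is_adult,
       SUM(number)              AS total,
       GROUPING(name)           AS name_bit,   -- 1 when name is rolled up, else 0
       GROUPING(is_adult)       AS adult_bit,  -- 1 when is_adult is rolled up, else 0
       GROUPING(name, is_adult) AS mask        -- equals 2 * name_bit + adult_bit
FROM   pet p
LEFT OUTER JOIN species s ON (p.species_id = s.id)
GROUP BY CUBE (name, is_adult)
ORDER BY mask, name, is_adult;
```

A row whose name is NULL but whose name_bit is 0 has a NULL that came from the outer join, not from the cube.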


From the question, about this output:

 name | is_adult | sum 
------+----------+-----
      |          |   2
      |          |   7

"the second one is the answer I wanted."

Try this:

SELECT name, is_adult, SUM(number)
FROM   pet p
LEFT OUTER JOIN   species s ON (p.species_id = s.id)
GROUP BY CUBE (name, is_adult)
HAVING grouping(name,is_adult) = 3;

name |is_adult |sum  |
-----|---------|-----|
     |         |7    |
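
If the cube has to stay inside a subquery, as in the question, the mask can instead be exposed as an extra column and filtered from the outside. A minimal sketch under that assumption; the aliases sum and grp are made up for illustration.

```sql
-- Same shape as the question's query: the cube stays in a subquery.
-- grp is an illustrative alias for the GROUPING bit mask.
SELECT name, is_adult, sum
FROM (
        SELECT name,
               is_adult,
               SUM(number)              AS sum,
               GROUPING(name, is_adult) AS grp
        FROM   pet p
        LEFT OUTER JOIN species s ON (p.species_id = s.id)
        GROUP BY CUBE (name, is_adult)) subq
WHERE grp = 3;   -- both columns rolled up: the grand-total row only
```

Filtering on grp rather than on name IS NULL AND is_adult IS NULL keeps out the row whose NULL name comes from the outer join.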

Also check this query to see how the grouping function works:

SELECT name, is_adult, SUM(number), grouping(name,is_adult)
FROM   pet p
LEFT OUTER JOIN   species s ON (p.species_id = s.id)
GROUP BY CUBE (name, is_adult);

name |is_adult |sum |grouping |
-----|---------|----|---------|
cat  |false    |1   |0        |
cat  |true     |3   |0        |
cat  |         |4   |1        |
dog  |true     |1   |0        |
dog  |         |1   |1        |
     |true     |2   |0        |
     |         |2   |1        |
     |         |7   |3        |
     |false    |1   |2        |
     |true     |6   |2        |
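
Building on the output above, one possible way to make the two kinds of NULL readable is to replace them with labels driven by GROUPING. This is a sketch only; the label strings and the aliases species, age_group and total are hypothetical choices, not part of the original answer.

```sql
-- Label rolled-up columns and join-produced NULLs differently.
-- The label strings and aliases are illustrative assumptions.
SELECT CASE WHEN GROUPING(name) = 1 THEN '(all species)'   -- rolled up by the cube
            ELSE COALESCE(name, '(no species)')            -- NULL from the outer join
       END                      AS species,
       CASE WHEN GROUPING(is_adult) = 1 THEN '(all ages)'  -- rolled up by the cube
            ELSE is_adult::text                            -- 'true' / 'false' (NULL if is_adult is NULL)
       END                      AS age_group,
       SUM(number)              AS total
FROM   pet p
LEFT OUTER JOIN species s ON (p.species_id = s.id)
GROUP BY CUBE (name, is_adult)
ORDER BY GROUPING(name, is_adult), species, age_group;
```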