Why does this crosstab() query return duplicate keys?
I have the following table called sample_events:
Column | Type
--------+-----
title | text
date | date
with the values:
title | date
-------+------------
ev1 | 2017-01-01
ev2 | 2017-01-03
ev3 | 2017-01-02
ev4 | 2017-12-10
ev5 | 2017-12-11
ev6 | 2017-07-28
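To reproduce this, the setup could look roughly like the following. It is a sketch: the column definitions are taken from the description above, and it assumes the tablefunc extension (which provides crosstab()) can be installed in the current database.

```sql
-- crosstab() is provided by the additional module tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Table as described above
CREATE TABLE sample_events (
    title text,
    date  date
);

-- The six sample rows, in the order listed above
INSERT INTO sample_events (title, date) VALUES
    ('ev1', '2017-01-01'),
    ('ev2', '2017-01-03'),
    ('ev3', '2017-01-02'),
    ('ev4', '2017-12-10'),
    ('ev5', '2017-12-11'),
    ('ev6', '2017-07-28');
```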
To create a pivot table containing the number of events per month for each unique year, I used the crosstab function in the form crosstab(text source_sql, text category_sql):
SELECT * FROM crosstab (
'SELECT extract(year from date) AS year,
extract(month from date) AS month, count(*)
FROM sample_events
GROUP BY year, month'
,
'SELECT * FROM generate_series(1, 12)'
) AS (
year int, jan int, feb int, mar int,
apr int, may int, jun int, jul int,
aug int, sep int, oct int, nov int, dec int
) ORDER BY year;
The result is as follows, as expected:
 year | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
 2017 |   3 |     |     |     |     |     |   1 |     |     |     |     |   2
Now I want to create a pivot table containing the number of events per day of the week for each unique week of the year. I tried the following query:
SELECT * FROM crosstab (
'SELECT extract(week from date) AS week,
extract(dow from date) AS day_of_week, count(*)
FROM sample_events
GROUP BY week, day_of_week'
,
'SELECT * FROM generate_series(0, 6)'
) AS (
week int, sun int, mon int, tue int,
wed int, thu int, fri int, sat int
) ORDER BY week;
The result is not as expected:
 week | sun | mon | tue | wed | thu | fri | sat
------+-----+-----+-----+-----+-----+-----+-----
    1 |     |     |   1 |     |     |     |
    1 |     |   1 |     |     |     |     |
   30 |     |     |     |     |     |   1 |
   49 |   1 |     |     |     |     |     |
   50 |     |   1 |     |     |     |     |
   52 |   1 |     |     |     |     |     |
All six events are there, but for some reason the week values are duplicated. I expected the result to look like this:
 week | sun | mon | tue | wed | thu | fri | sat
------+-----+-----+-----+-----+-----+-----+-----
    1 |     |   1 |   1 |     |     |     |
   30 |     |     |     |     |     |   1 |
   49 |   1 |     |     |     |     |     |
   50 |     |   1 |     |     |     |     |
   52 |   1 |     |     |     |     |     |
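Questions
1) Why does the result of the latter query contain duplicate key values, while the result of the former does not?
2) How can I create a pivot table with unique week values?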
crosstab() expects ordered input. You need to add ORDER BY to the source query:
SELECT * FROM crosstab (
'SELECT extract(week from date)::int AS week
, extract(dow from date)::int AS day_of_week
, count(*)::int
FROM sample_events
GROUP BY week, day_of_week
ORDER BY week, day_of_week'
, 'SELECT generate_series(0, 6)'
) AS (
week int, sun int, mon int, tue int,
wed int, thu int, fri int, sat int
);
Or just ORDER BY week.
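Strictly speaking, rows with the same key (week in the example) need to be grouped, that is, arrive in sequence. The keys themselves do not have to be sorted. But the simplest and cheapest way to achieve this is ORDER BY (which also sorts the keys).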
Or shorter:
SELECT * FROM crosstab (
'SELECT extract(week from date)::int
, extract(dow from date)::int
, count(*)::int
FROM sample_events
GROUP BY 1, 2
ORDER BY 1, 2' -- or just ORDER BY 1
, 'SELECT generate_series(0, 6)'
) AS ...
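The grouping requirement described above can be seen in isolation, without the table. A minimal sketch, assuming the tablefunc extension is installed; the VALUES list is made-up input that deliberately places another week between the two rows for week 1:

```sql
-- Unordered input: the two rows for week 1 are not adjacent
SELECT * FROM crosstab (
   $$VALUES (1, 2, 1), (52, 0, 1), (1, 1, 1)$$   -- (week, day_of_week, count)
 , $$SELECT generate_series(0, 6)$$
) AS ct (
   week int, sun int, mon int, tue int,
   wed int, thu int, fri int, sat int
);

-- Same rows, sorted by the key: the rows for week 1 now arrive together
SELECT * FROM crosstab (
   $$SELECT *
     FROM  (VALUES (1, 2, 1), (52, 0, 1), (1, 1, 1)) v(week, day_of_week, n)
     ORDER  BY 1$$
 , $$SELECT generate_series(0, 6)$$
) AS ct (
   week int, sun int, mon int, tue int,
   wed int, thu int, fri int, sat int
);
```

The first query should return week 1 twice, because a new output row is started whenever the key differs from the previous input row. The second should return it once.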
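Your first example with months happened to work because the input data had the months in sequence. But that might break at any time if the physical order of rows in the table changes (VACUUM, UPDATE, ...). You can never rely on the physical order of rows in a relational table.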
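If the dependency on input order is a concern, conditional aggregation in plain SQL is one alternative that avoids crosstab() altogether. This is a different technique from the one shown above, sketched for the same sample_events table; it assumes PostgreSQL 9.4 or later for the aggregate FILTER clause. Note that empty cells come out as 0 rather than NULL, and the counts are bigint.

```sql
-- One row per week; each day-of-week column counts only its matching rows
SELECT extract(week from date)::int AS week
     , count(*) FILTER (WHERE extract(dow from date) = 0) AS sun
     , count(*) FILTER (WHERE extract(dow from date) = 1) AS mon
     , count(*) FILTER (WHERE extract(dow from date) = 2) AS tue
     , count(*) FILTER (WHERE extract(dow from date) = 3) AS wed
     , count(*) FILTER (WHERE extract(dow from date) = 4) AS thu
     , count(*) FILTER (WHERE extract(dow from date) = 5) AS fri
     , count(*) FILTER (WHERE extract(dow from date) = 6) AS sat
FROM   sample_events
GROUP  BY 1
ORDER  BY 1;
```

Because GROUP BY collapses each week into a single row, the result cannot contain duplicate week values regardless of how the rows are stored.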
More explanation:
- PostgreSQL Crosstab Query