How to get a dense table with one column per category using PostgreSQL (crosstab)?

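I have this toy example, which gives me a sparse table of values separated into their categories. I would like a dense matrix instead, where each column is ordered independently of the others.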

drop table if exists temp_table;
create temp table temp_table(
    rowid int
    , category text
    , score int
    );
insert into temp_table values (0, 'cat1', 10);
insert into temp_table values (1, 'cat2', 21);
insert into temp_table values (2, 'cat3', 32);
insert into temp_table values (3, 'cat2', 23);
insert into temp_table values (4, 'cat2', 24);
insert into temp_table values (5, 'cat3', 35);
insert into temp_table values (6, 'cat1', 16);
insert into temp_table values (7, 'cat1', 17);
insert into temp_table values (8, 'cat2', 28);
insert into temp_table values (9, 'cat2', 29);

This produces the following temp table:

rowid category score
0 cat1 10
1 cat2 21
2 cat3 32
3 cat2 23
4 cat2 24
5 cat3 35
6 cat1 16
7 cat1 17
8 cat2 28
9 cat2 29

The scores are then spread into separate columns according to category:

select "cat1", "cat2", "cat3"
from crosstab(
    $$ select rowid, category, score from temp_table $$ -- as source_sql
    , $$ select distinct category from temp_table order by category $$ -- as category_sql
) as (rowid int, "cat1" int, "cat2" int, "cat3" int);

Output:

cat1 cat2 cat3
10
     21
          32
     23
     24
          35
16
17
     28
     29

But I would like the result of the query to be dense, like this:

cat1 cat2 cat3
10   21   32
16   23   35
17   24
     28
     29

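Maybe PostgreSQL's crosstab is not even the right tool for this, but it was the first thing that came to mind, since the sparse table it produces is close to the result I need.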

This should work for the exact sample data and expected output given.

select max(cat1) as cat1, max(cat2) as cat2, max(cat3) as cat3
from crosstab(
    $$ select rank() over (partition by category order by rowid) as ranking
            , rowid
            , category
            , score
       from temp_table
       order by rowid, category asc $$ -- as source_sql
    , $$ select distinct category
         from temp_table
         order by category $$ -- as category_sql
) as (ranking int, rowid int, "cat1" int, "cat2" int, "cat3" int)
group by ranking
order by ranking asc;

You can test the solution here: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f198e40a18a282cc0d65fa6ecdf797cb
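As a possible simplification (a sketch, not part of the answer above): crosstab builds one output row from each run of consecutive source rows that share the same row name. If the source query is sorted by ranking, the outer max() and group by should therefore not be needed. The block assumes that temp_table from the question exists in the session and that you are allowed to enable the tablefunc extension, which is where crosstab comes from. The alias ct, the cast to int, and the use of row_number() in place of rank() are my own choices.

```sql
-- crosstab() is provided by the tablefunc extension; enable it once per database.
create extension if not exists tablefunc;

-- Assumes temp_table from the question already exists in this session.
select cat1, cat2, cat3
from crosstab(
    $$ select (row_number() over (partition by category order by rowid))::int as ranking
            , category
            , score
       from temp_table
       order by 1, 2 $$   -- rows with the same ranking must be adjacent
    , $$ select distinct category from temp_table order by category $$
) as ct(ranking int, cat1 int, cat2 int, cat3 int)
order by ranking;
```

On the sample data this should return the same five dense rows as the expected output in the question, with NULL wherever a category has run out of values.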

Edit: the improvements made to your query to arrive at the solution:

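  1. In the source SQL query, I ranked the values within each category by rowid, which "determines" the order in which the expected values appear, as you require.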

select rank() over(partition by category order by rowid) as ranking, rowid, category, score from temp_table order by rowid, category asc

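  2. In the outer query, for each ranking produced by the source SQL query, I select the max() value of each category, which collapses the sparse rows into one row per ranking.

with cte as (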
  select category, score, row_number() over (
    partition by category order by score
  ) as r
  from temp_table
)
  select
    sum(score) filter (where category = 'cat1') as cat1,
    sum(score) filter (where category = 'cat2') as cat2,
    sum(score) filter (where category = 'cat3') as cat3
  from cte
  group by r
  order by r
;

If the number of columns is known and reasonably small, FILTER may be a better choice than CROSSTAB, which requires an extension.
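Note that the query above numbers rows by score, while the question's sample data happens to be sorted identically by rowid and by score within each category. If the intended order is rowid, a variant along these lines might be closer to what is wanted. This is only a sketch: it assumes temp_table from the question exists, the CTE name ranked is hypothetical, and max() is used instead of sum() purely to make clear that each group holds at most one value per category.

```sql
-- Assumes temp_table from the question exists. No extension is required.
with ranked as (
    select category
         , score
         -- position of each row within its category, in rowid order
         , row_number() over (partition by category order by rowid) as r
    from temp_table
)
select max(score) filter (where category = 'cat1') as cat1
     , max(score) filter (where category = 'cat2') as cat2
     , max(score) filter (where category = 'cat3') as cat3
from ranked
group by r
order by r;
```

Each category still has to be listed by hand, so this fits best when the set of categories is fixed and known in advance.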