Query performance on join with json_agg in postgres

Question

I have two tables.

Table entries


 id    | val1 | val2
-------+------+------
 ent_1 | xxx  | xxx
 ent_2 | xyy  | yyy
 ent_3 | zxz  | zzz
 ent_4 | zxz  | zzz

Table entries_list


 id  | entry_id | val1 | val2
-----+----------+------+-------
   1 |   ent_1  | xxx  | xxx
   2 |   ent_1  | xyy  | yyy
   3 |   ent_2  | zxz  | zzz
   4 |   ent_2  | zxz  | zzz

entries_list.entry_id is a foreign key referencing entries.id.

So I need to find the entries that have corresponding entries_list rows. I don't want entries that have no references in entries_list. The result I expect is:

[{
    "id": "ent_1",
    "entries": [{
        "id": 1,
        "val1": "xxx",
        "val2": "xxx"
    }, {
        "id": 2,
        "val1": "xyy",
        "val2": "yyy"
    }]
}, {
    "id": "ent_2",
    "entries": [{
        "id": 3,
        "val1": "zxz",
        "val2": "zzz"
    }, {
        "id": 4,
        "val1": "zxz",
        "val2": "zzz"
    }]
}]
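To pin down that expected shape on the Node.js side, here is a minimal TypeScript sketch. The interface names and the `isEntryGroup` helper are hypothetical (not from the question); the helper is a runtime check for values decoded from a json column, and it also encodes the requirement that every returned entry has at least one entries_list item.

```typescript
// Hypothetical types mirroring the expected result above.
interface EntryListItem {
  id: number; // entries_list.id (assumed to be an integer column)
  val1: string;
  val2: string;
}

interface EntryGroup {
  id: string; // entries.id, e.g. "ent_1"
  entries: EntryListItem[]; // entries_list rows whose entry_id = id
}

// Returns true if `value` has the EntryGroup shape and at least one item,
// since entries without entries_list references must not appear.
function isEntryGroup(value: unknown): value is EntryGroup {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { id?: unknown; entries?: unknown };
  if (typeof v.id !== "string" || !Array.isArray(v.entries)) return false;
  if (v.entries.length === 0) return false;
  return v.entries.every((item: unknown) => {
    if (typeof item !== "object" || item === null) return false;
    const i = item as { id?: unknown; val1?: unknown; val2?: unknown };
    return (
      typeof i.id === "number" &&
      typeof i.val1 === "string" &&
      typeof i.val2 === "string"
    );
  });
}
```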

Because of the desired result and its structure, I decided to use json_agg and json_build_object. The query looks like this:

SELECT entries.id, 
       json_agg(json_build_object('id', list.id, 'val1', list.val1, 'val2',
       list.val2)) AS sub_list
FROM   entries
       -- entry_id must be selected here so the ON clause can reference it
       INNER JOIN (SELECT id, entry_id, val1, val2
                   FROM   entries_list) AS list
               ON entries.id = list.entry_id
GROUP  BY entries.id 
ORDER  BY entries.id

But its performance is very poor: with 1 million records it takes almost 10 seconds. What is a better way to do this?

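I'm also considering fetching the data as flat rows and grouping it in application code outside of SQL. How should the query be changed for each of these two approaches?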

My backend is Node.js, using the pg module as the connector.
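For the flat approach, a minimal sketch under these assumptions: the query is something like `SELECT entry_id, id, val1, val2 FROM entries_list ORDER BY entry_id, id` with no join (relying on the foreign key, so every non-null entry_id refers to an existing entry, and assuming entry_id is NOT NULL), and `id` is an integer column so pg returns it as a JS number. The `groupFlatRows` function and the row types are hypothetical. Because only rows that exist in entries_list are returned, entries without references never appear.

```typescript
// One flat row, assumed to come from:
//   SELECT entry_id, id, val1, val2 FROM entries_list ORDER BY entry_id, id;
interface FlatRow {
  entry_id: string;
  id: number;
  val1: string;
  val2: string;
}

interface EntryListItem {
  id: number;
  val1: string;
  val2: string;
}

interface EntryGroup {
  id: string;
  entries: EntryListItem[];
}

// Groups flat rows by entry_id. A Map iterates in insertion order, so the
// groups come out in the order each entry_id first appears; the SQL
// ORDER BY only makes that order (and the item order) deterministic.
function groupFlatRows(rows: FlatRow[]): EntryGroup[] {
  const groups = new Map<string, EntryGroup>();
  for (const row of rows) {
    let group = groups.get(row.entry_id);
    if (group === undefined) {
      group = { id: row.entry_id, entries: [] };
      groups.set(row.entry_id, group);
    }
    group.entries.push({ id: row.id, val1: row.val1, val2: row.val2 });
  }
  return Array.from(groups.values());
}
```

With the pg module you would pass `result.rows` from `await client.query(...)` into `groupFlatRows`. Whether this beats json_agg depends on the data: it returns one row per entries_list row instead of one per entry, and moves the grouping work into Node.js.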

Answer 1

How does this version perform?

SELECT e.id, 
       (SELECT json_agg(json_build_object('id', el.id, 'val1', el.val1, 'val2',
       el.val2))
        FROM entries_list el
        WHERE el.entry_id = e.id
       ) AS sub_list
FROM entries e
ORDER BY e.id;

For performance, you want an index on entries_list(entry_id, id, val1, val2). The leading key, entry_id, is the most important one.
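One thing to watch with this version: it returns a row for every entry, and for entries with no entries_list rows (ent_3 and ent_4 in the sample) sub_list is NULL, because json_agg over zero input rows returns NULL. Since the question wants those entries excluded, they can be filtered out either in SQL (for example by wrapping the query and adding `WHERE sub_list IS NOT NULL`) or in Node.js. A minimal sketch of the Node.js side, assuming pg's default parsing of json columns into JS values; the `dropUnreferenced` name and the types are hypothetical:

```typescript
interface EntryListItem {
  id: number;
  val1: string;
  val2: string;
}

// One row of the correlated-subquery version. sub_list is null when the
// entry has no rows in entries_list (json_agg over zero rows is NULL).
interface SubqueryRow {
  id: string;
  sub_list: EntryListItem[] | null;
}

interface EntryGroup {
  id: string;
  entries: EntryListItem[];
}

// Drops entries without references and renames sub_list to entries.
function dropUnreferenced(rows: SubqueryRow[]): EntryGroup[] {
  const result: EntryGroup[] = [];
  for (const row of rows) {
    if (row.sub_list !== null) {
      result.push({ id: row.id, entries: row.sub_list });
    }
  }
  return result;
}
```

Filtering in SQL avoids sending the unreferenced entries to Node.js at all, so it is usually the better choice when many entries have no references.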

Answer 2

You can use exists instead of a regular join:

select 
    entry_id, 
    json_agg(json_build_object('id', id, 'val1', val1, 'val2', val2)) as sub_list 
from entries_list
where exists (
    select 1 
    from entries e 
    where entry_id = e.id)
group by entry_id 
order by entry_id;

You need indexes on entries_list(entry_id) and entries(id) (the latter is presumably already the primary key).

Tags: sql, postgresql, performance, query-performance, postgresql-performance