Postgres aggregate nested jsonb array values

In Postgres 11.x, I am trying to aggregate elements from nested jsonb objects that contain an array field into a single row per device_id. Here is sample data for a table named configurations:

 id | device_id | data
----+-----------+------
  1 |         1 | {"sensors": [{"other_data": {}, "sensor_type": 1}], "other_data": {}}
  2 |         1 | {"sensors": [{"other_data": {}, "sensor_type": 1}, {"other_data": {}, "sensor_type": 2}], "other_data": {}}
  3 |         1 | {"sensors": [{"other_data": {}, "sensor_type": 3}], "other_data": {}}
  4 |         2 | {"sensors": [{"other_data": {}, "sensor_type": 4}], "other_data": {}}
  5 |         2 | {"sensors": null, "other_data": {}}
  6 |         3 | {"sensors": [], "other_data": {}}

My target output is one row per device_id containing the set of distinct sensor_types, for example:

 device_id | sensor_types
-----------+--------------
         1 | [1,2,3]
         2 | [4]
         3 | []

For device_id 3, null would also be fine.

I have tried many things but keep running into various problems. Here is some SQL to set up a test environment:

CREATE TEMPORARY TABLE configurations(
   id SERIAL PRIMARY KEY,
   device_id INTEGER NOT NULL,  -- plain integer; SERIAL would attach an unneeded sequence
   data JSONB
);

INSERT INTO configurations(device_id, data) VALUES
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} } ] }'),
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} }, { "sensor_type": 2, "other_data": {} }] }'),
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 3, "other_data": {} }] }'),
    (2, '{ "other_data": {}, "sensors": [ { "sensor_type": 4, "other_data": {} }] }'),
    (2, '{ "other_data": {}, "sensors": null }'),
    (3, '{ "other_data": {}, "sensors": [] }');

Quick note: my real table has roughly 100,000 rows and the jsonb data is much more complex, but it follows this general structure.

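A JSON null stored inside a jsonb value causes some problems in Postgres and should be avoided where possible. For example, jsonb_array_elements raises an error when it is given a scalar such as null instead of an array. You can convert the value to an empty array with the expression:
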
coalesce(nullif(data->'sensors', 'null'), '[]')
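
To make the effect of this expression concrete, here is a minimal sketch that applies it to three inline values. The VALUES list and the aliases t, raw and normalized are made up for illustration and are not part of the original table.

```sql
-- Compare the raw "sensors" value with the normalized one.
-- Case 1: a real array          -> returned unchanged
-- Case 2: a JSON null           -> nullif turns it into SQL NULL, coalesce into []
-- Case 3: the key is missing    -> data->'sensors' is already SQL NULL, coalesce gives []
select data->'sensors'                                 as raw,
       coalesce(nullif(data->'sensors', 'null'), '[]') as normalized
from (values
    ('{"sensors": [{"sensor_type": 1}]}'::jsonb),
    ('{"sensors": null}'::jsonb),
    ('{}'::jsonb)
) as t(data);
```

The untyped literals 'null' and '[]' are resolved to jsonb because the other argument is jsonb, so no explicit casts should be needed.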

First attempt:

select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
group by device_id;

 device_id | sensor_types
-----------+--------------
         1 | {1,2,3}
         2 | {4,NULL}
         3 | {NULL}
(3 rows)
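
To see where the NULL entries come from, it may help to look at the joined rows before they are grouped. This is only a diagnostic sketch against the test table defined earlier; it adds nothing beyond the join already used in the query.

```sql
-- Rows produced by the left join before any aggregation.
-- A configuration whose sensors array is empty (or was a JSON null)
-- still yields one row, with value set to SQL NULL.
select id, device_id, value
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
order by id;
```

With the sample data, the rows with id 5 and 6 appear with a NULL value, and array_agg then collects those NULLs like any other input.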

This result is probably not satisfactory because of the nulls. When we try to remove them:

select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id;

 device_id | sensor_types
-----------+--------------
         1 | {1,2,3}
         2 | {4}
(2 rows)

device_id = 3 disappears, because the where clause removes its only row before grouping. Well, we can get all the device_ids from the table:

select distinct device_id, sensor_types
from configurations
left join (
    select device_id, array_agg(distinct value->'sensor_type') as sensor_types
    from configurations
    left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
    where value is not null
    group by device_id
    ) s
using(device_id);

 device_id | sensor_types
-----------+--------------
         1 | {1,2,3}
         2 | {4}
         3 |
(3 rows)
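
As an alternative sketch, not taken from the queries above, the null check could be moved from the where clause into an aggregate FILTER clause, so that no device_id is dropped before grouping. This version uses jsonb_agg instead of array_agg on the assumption that a jsonb array such as [1,2,3] is the preferred output type; it has not been measured against the real 100,000-row table.

```sql
-- FILTER discards the NULL values inside the aggregate only,
-- so every device_id still forms a group.
-- When all of a group's rows are filtered out, jsonb_agg returns
-- SQL NULL, which coalesce replaces with an empty jsonb array.
select device_id,
       coalesce(
           jsonb_agg(distinct value->'sensor_type')
               filter (where value is not null),
           '[]'
       ) as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
group by device_id
order by device_id;
```

On the sample data this should give one row per device_id, with an empty array for device_id 3. Dropping the coalesce would return null there instead, which the question also accepts.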