Unnest multiple array columns stored as text in lockstep

Question

I have the following table in Postgres 9.6:

CREATE TABLE some_tbl(
  target_id integer NOT NULL
, machine_id integer NOT NULL
, dateread timestamp without time zone NOT NULL
, state text
, ftime text
, CONSTRAINT pk_sometable PRIMARY KEY (target_id, machine_id, dateread)
 );

The data looks like this:

targetID	MachineID	DateRead	State	FTime
60000	30	'2021-09-29 15:20:00'	'0|1|0'	'850|930|32000'
60000	31	'2021-09-29 16:35:13'	'0|0|0'	'980|1050|30000'

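The important parts are state and ftime. I need to unnest the elements and keep their order. The position of each element becomes its step.

For example, the first row would become: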

targetID	MachineID	DateRead	State	FTime	Step
60000	30	'2021-09-29 15:20:00'	'0'	'850'	0
60000	30	'2021-09-29 15:20:00'	'1'	'930'	1
60000	30	'2021-09-29 15:20:00'	'0'	'32000'	2

Order is important: the FTime of 850 ms always comes first and gets the value 0 in Step, then 930 ms comes second and gets step 1, and finally 32000 ms comes third and gets step 2.

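Currently, I solve this by first converting the text to arrays with string_to_array(), then calling unnest(), and finally assigning the step number with row_number().

This works great, except that sometimes the indexes come out in the wrong order. The first row can end up like this: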

targetID	MachineID	DateRead	State	Ftime	Step
60000	30	'2021-09-29 15:20:00'	'1'	'930'	0
60000	30	'2021-09-29 15:20:00'	'0'	'32000'	1
60000	30	'2021-09-29 15:20:00'	'0'	'850'	2

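I processed thousands of records and almost all of them were fine. Later I had to compute statistics (min, max, average) and got wrong values. When I checked, I found that the indexes were wrong (I move the statistics data with a lot of ETL processes). But if I run a SELECT to check one specific row that had the error, it displays perfectly. So I assume that row_number() sometimes gets the index wrong, and that it happens quite randomly.

This is the SQL I use: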

SELECT foo.target_id,
       foo.dateread,
       foo.machine_id,
       foo.state,
       foo.tiempo,
       (row_number() OVER (PARTITION BY foo.dateread, foo.machine_id, foo.target_id)) - 1 AS step
FROM  (SELECT target_id,
              machine_id,
              dateread,
              unnest(string_to_array(state, '|'::text))::integer AS state,
              unnest(string_to_array(ftime, '|'::text))::integer AS tiempo
       FROM   some_tbl
       WHERE  target_id IN (6000) AND dateread = '2021-06-09') foo;

Is there a better way?

Answer 1

An elegant way is to use the special form of unnest() that accepts multiple input arrays, call it in a LATERAL subquery, and append WITH ORDINALITY:

SELECT t.target_id, t.dateread, t.machine_id, u.state, u.tiempo
     , ord - 1 AS step
FROM   tbl t
LEFT   JOIN LATERAL unnest(string_to_array(state, '|')::int[]
                         , string_to_array(ftime, '|')::int[]) WITH ORDINALITY AS u(state, tiempo, ord) ON true
WHERE  target_id = 60000
AND    dateread = '2021-09-29 15:20:00'   -- adapted
ORDER  BY t.target_id, t.dateread, t.machine_id, step;


Since state and ftime can be NULL, I use LEFT JOIN ... ON true to keep such rows in the result.
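To try the technique in isolation, here is a minimal, self-contained sketch. It assumes nothing beyond a Postgres version that supports WITH ORDINALITY (9.4 or later). The inline VALUES list stands in for the real table, and the second row with NULL columns is a hypothetical one added only to show what the LEFT JOIN preserves.

```sql
-- Stand-in data: one row from the question plus a hypothetical row with NULLs.
SELECT t.target_id, t.machine_id, u.state, u.tiempo
     , u.ord - 1 AS step                       -- ordinality starts at 1
FROM  (VALUES
         (60000, 30, '0|1|0', '850|930|32000')
       , (60000, 31, NULL   , NULL)            -- hypothetical: nothing to unnest
      ) AS t(target_id, machine_id, state, ftime)
LEFT   JOIN LATERAL unnest(string_to_array(t.state, '|')::int[]
                         , string_to_array(t.ftime, '|')::int[])
            WITH ORDINALITY AS u(state, tiempo, ord) ON true
ORDER  BY t.machine_id, step;

-- Result:
--  target_id | machine_id | state | tiempo | step
--      60000 |         30 |     0 |    850 |    0
--      60000 |         30 |     1 |    930 |    1
--      60000 |         30 |     0 |  32000 |    2
--      60000 |         31 |  NULL |   NULL | NULL
```

The step number comes from the element's position in the array, not from row_number() over rows in arbitrary order, so it stays stable no matter how the rows are later sorted or moved.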

See:

PostgreSQL unnest() with element number

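Of course, what you really should do is this:

Unfriend whoever designed that database. (The PC version of my actual advice.)
Install a current version of Postgres. See: https://www.postgresql.org/support/versioning/
Create a new database with a proper relational design.
Migrate your data. (And keep a backup of the original to be safe.)
Burn the old database and never speak of it again.

A proper (normalized) relational design in modern Postgres could look like this: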

CREATE TABLE tbl (
  tbl_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, target_id integer NOT NULL
, machine_id integer NOT NULL
, read_timestamp timestamp with time zone NOT NULL
, CONSTRAINT tbl_uni UNIQUE (target_id, machine_id, read_timestamp)
);

CREATE TABLE tbl_step (
  tbl_id int REFERENCES tbl ON DELETE CASCADE
, step int NOT NULL
, state int NOT NULL
, tiempo int NOT NULL
, CONSTRAINT tbl_step_pkey PRIMARY KEY (tbl_id, step)
);

Then your query would simply be:

SELECT *
FROM   tbl 
LEFT   JOIN tbl_step USING (tbl_id);
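The "migrate your data" step could be sketched roughly as follows. This is not part of the original answer; it assumes the old table is some_tbl from the question, that the new tables tbl and tbl_step exist and are empty, and that dateread was recorded in UTC (adjust the time zone to whatever actually applies).

```sql
BEGIN;

-- 1. One parent row per original reading.
--    Assumption: dateread holds UTC wall-clock time.
INSERT INTO tbl (target_id, machine_id, read_timestamp)
SELECT target_id, machine_id, dateread AT TIME ZONE 'UTC'
FROM   some_tbl;

-- 2. One child row per array element; ord keeps the original position.
--    Rows with NULL state or ftime simply produce no child rows.
INSERT INTO tbl_step (tbl_id, step, state, tiempo)
SELECT t.tbl_id, u.ord - 1, u.state, u.tiempo
FROM   some_tbl s
JOIN   tbl t ON  t.target_id      = s.target_id
             AND t.machine_id     = s.machine_id
             AND t.read_timestamp = s.dateread AT TIME ZONE 'UTC'
CROSS  JOIN LATERAL unnest(string_to_array(s.state, '|')::int[]
                         , string_to_array(s.ftime, '|')::int[])
            WITH ORDINALITY AS u(state, tiempo, ord);

COMMIT;

-- Per-step statistics become a plain aggregate on the new layout.
SELECT step, min(tiempo), max(tiempo), round(avg(tiempo), 1) AS avg_tiempo
FROM   tbl_step
GROUP  BY step
ORDER  BY step;
```

If state and ftime ever hold a different number of elements in the same row, unnest() pads the shorter array with NULL and the NOT NULL constraints on tbl_step abort the transaction. That is probably the desired behavior, since it exposes broken rows before they reach the new tables.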

Tags: sql, postgresql, row-number, unnest, set-returning-functions