How can I merge the rows of a table that records the history of changes, in order to obtain what a row looked like at a specific moment?

I have the following two SELECTs:

SELECT * FROM public.app_user WHERE id = 'e31b55bf';
+--------+----+-----------+-----+-----+--------------------------+
|id      |name|email      |role |bio  |created_at                |
+--------+----+-----------+-----+-----+--------------------------+
|e31b55bf|Jon |jon@app.com|admin|Hello|2022-01-01 00:00:00.000000|
+--------+----+-----------+-----+-----+--------------------------+

SELECT * FROM history.app_user WHERE id = 'e31b55bf';
+--------+----+--------------+--------+----+--------------------------+
|id      |name|email         |role    |bio |updated_at                |
+--------+----+--------------+--------+----+--------------------------+
|e31b55bf|ASDF|test          |NULL    |NULL|2022-01-02 00:00:00.000000|
|e31b55bf|Test|test@gmail.com|basic   |NULL|2022-01-03 00:00:00.000000|
|e31b55bf|NULL|NULL          |standard|asdf|2022-01-04 00:00:00.000000|
|e31b55bf|NULL|NULL          |mod     |NULL|2022-01-05 00:00:00.000000|
+--------+----+--------------+--------+----+--------------------------+

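public.app_user contains the users of my application, and history.app_user records the previous values of those rows: each history row holds the old values of the columns that changed at updated_at, and NULL means the column did not change. In the example above, user e31b55bf was a mod rather than an admin until January 5th, a standard user with the bio "asdf" until the 4th, a basic user named "Test" with the email "test@gmail.com" until the 3rd...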

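I'd like to come up with a SELECT or a function that tells me what the row looked like at a given point in time. I believe I've managed it, but my solution looks more complicated than it should be. It is also tedious to adapt to other tables, for example public.project and history.project, which have completely different columns. I believe there is a cleaner solution that is easier to read and write. Can any SQL wizards help me?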

My current solution overwrites the values of the current public.app_user row with the next non-NULL values recorded in history.app_user:

SELECT t.id,
       t1.name,
       t2.email,
       t3.role,
       t4.bio
FROM (
     SELECT id,
            MIN(CASE WHEN name IS NOT NULL THEN updated_at END)       AS name_date,
            MIN(CASE WHEN email IS NOT NULL THEN updated_at END)      AS email_date,
            MIN(CASE WHEN role IS NOT NULL THEN updated_at END)       AS role_date,
            MIN(CASE WHEN bio IS NOT NULL THEN updated_at END)        AS bio_date
     FROM history.app_user
     WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
     GROUP BY id
     ) t
     -- Join on id as well, otherwise rows of other users that share
     -- the same updated_at would be matched too
     LEFT JOIN history.app_user t1 ON t1.id = t.id AND t1.updated_at = t.name_date
     LEFT JOIN history.app_user t2 ON t2.id = t.id AND t2.updated_at = t.email_date
     LEFT JOIN history.app_user t3 ON t3.id = t.id AND t3.updated_at = t.role_date
     LEFT JOIN history.app_user t4 ON t4.id = t.id AND t4.updated_at = t.bio_date
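The description of the current solution says the non-NULL history values should overwrite the current public.app_user row, but the query itself never reads public.app_user, so a column with no later change comes back as NULL instead of its current value. A minimal sketch of that overlay, assuming each history row holds the values that were replaced at updated_at, uses one correlated subquery per column (a different technique from the MIN()/self-join pattern) and COALESCE to fall back to the current value. The CTEs `app_user`, `app_user_history` and `params` are hypothetical stand-ins for the real tables and the date to check, filled with the sample rows from the question so the query runs on its own; against the real schema, drop them and use public.app_user and history.app_user.

```sql
-- Hypothetical stand-ins for public.app_user and history.app_user,
-- filled with the sample rows from the question.
WITH app_user(id, name, email, role, bio, created_at) AS (
    VALUES ('e31b55bf', 'Jon', 'jon@app.com', 'admin', 'Hello',
            timestamp '2022-01-01')
), app_user_history(id, name, email, role, bio, updated_at) AS (
    VALUES ('e31b55bf', 'ASDF', 'test',           NULL,       NULL,   timestamp '2022-01-02'),
           ('e31b55bf', 'Test', 'test@gmail.com', 'basic',    NULL,   timestamp '2022-01-03'),
           ('e31b55bf', NULL,   NULL,             'standard', 'asdf', timestamp '2022-01-04'),
           ('e31b55bf', NULL,   NULL,             'mod',      NULL,   timestamp '2022-01-05')
), params(as_of) AS (
    VALUES (timestamp '2022-01-03 12:00')  -- Date to check
)
-- For each column: the earliest non-NULL value recorded after as_of
-- (the first value that got overwritten), else the current value.
SELECT cur.id
     , COALESCE((SELECT h.name FROM app_user_history AS h
                  WHERE h.id = cur.id AND h.updated_at > p.as_of
                    AND h.name IS NOT NULL
                  ORDER BY h.updated_at LIMIT 1), cur.name)   AS name
     , COALESCE((SELECT h.email FROM app_user_history AS h
                  WHERE h.id = cur.id AND h.updated_at > p.as_of
                    AND h.email IS NOT NULL
                  ORDER BY h.updated_at LIMIT 1), cur.email)  AS email
     , COALESCE((SELECT h.role FROM app_user_history AS h
                  WHERE h.id = cur.id AND h.updated_at > p.as_of
                    AND h.role IS NOT NULL
                  ORDER BY h.updated_at LIMIT 1), cur.role)   AS role
     , COALESCE((SELECT h.bio FROM app_user_history AS h
                  WHERE h.id = cur.id AND h.updated_at > p.as_of
                    AND h.bio IS NOT NULL
                  ORDER BY h.updated_at LIMIT 1), cur.bio)    AS bio
  FROM app_user AS cur
 CROSS JOIN params AS p;
```

For 2022-01-03 12:00 this returns Jon, jon@app.com, standard and asdf: name and email fall back to the current row because no history row after that date changes them. Each subquery can use an index on (id, updated_at), and the shape stays the same for every column, which keeps it easy to read.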

Solution 1: window functions

SELECT DISTINCT ON (id)
       id
     , (array_agg(name) FILTER (WHERE name IS NOT NULL) OVER w)[1] AS name
     , (array_agg(email) FILTER (WHERE email IS NOT NULL) OVER w)[1] AS email
     , (array_agg(role) FILTER (WHERE role IS NOT NULL) OVER w)[1] AS role
     , (array_agg(bio) FILTER (WHERE bio IS NOT NULL) OVER w)[1] AS bio 
  FROM history.app_user
 WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
-- The frame spans the whole partition, so every row of a user carries
-- the same values and it does not matter which one DISTINCT ON keeps
WINDOW w AS (PARTITION BY id ORDER BY updated_at
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

array_agg() is an aggregate function, used here as a window function.

The FILTER (WHERE condition) clause keeps NULL values out of the aggregated array.

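The window is the set of rows related to the current row, as described in the manual: here, all rows of the same user, as specified by the PARTITION BY clause. The ORDER BY clause puts the earliest non-NULL value in the first position of the resulting array, where [1] selects it. Note that once ORDER BY is present, the default frame only runs from the start of the partition up to the current row (and its peers), so each row only aggregates the rows up to its own updated_at.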
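To make the frame behaviour concrete, here is a small self-contained query on inline sample data: the role column of the history rows from the question, written as a hypothetical VALUES list. It evaluates the same `(array_agg(...) FILTER (...) OVER ...)[1]` expression once with the default frame and once with an explicit whole-partition frame; there is only one user, so PARTITION BY is omitted.

```sql
-- Inline sample data: the role column of the question's history rows.
SELECT updated_at,
       -- Default frame: start of the partition up to the current row
       (array_agg(role) FILTER (WHERE role IS NOT NULL)
            OVER (ORDER BY updated_at))[1] AS running_first,
       -- Whole-partition frame: every row sees every row
       (array_agg(role) FILTER (WHERE role IS NOT NULL)
            OVER (ORDER BY updated_at
                  ROWS BETWEEN UNBOUNDED PRECEDING
                           AND UNBOUNDED FOLLOWING))[1] AS overall_first
  FROM (VALUES (timestamp '2022-01-02', NULL),
               (timestamp '2022-01-03', 'basic'),
               (timestamp '2022-01-04', 'standard'),
               (timestamp '2022-01-05', 'mod')) AS h(updated_at, role)
 ORDER BY updated_at;
```

The first row has no non-NULL role in its default frame yet, so running_first is NULL there and basic afterwards, while overall_first is basic on every row. Since DISTINCT ON without a matching ORDER BY keeps an unpredictable row per group, only the whole-partition frame guarantees the same answer whichever row is kept.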

Alternative

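The main drawback of using a window function here is that it returns one row for every row that passes the WHERE clause. The DISTINCT ON () clause removes the redundant rows from the final result.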

The second solution uses the same function as an aggregate function instead of a window function, which is a better fit in our case.

Solution 2: aggregate functions

SELECT id
     , (array_agg(name ORDER BY updated_at) FILTER (WHERE name IS NOT NULL))[1] AS name
     , (array_agg(email ORDER BY updated_at) FILTER (WHERE email IS NOT NULL))[1] AS email
     , (array_agg(role ORDER BY updated_at) FILTER (WHERE role IS NOT NULL))[1] AS role
     , (array_agg(bio ORDER BY updated_at) FILTER (WHERE bio IS NOT NULL))[1] AS bio 
  FROM history.app_user
 WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
 GROUP BY id
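The question also asks for something less tedious to port to other table pairs such as public.project and history.project, and both solutions still name every column and return NULL where nothing changed after the date. One possible table-agnostic variant, sketched here and not taken from the answer above, converts rows to jsonb so no data column is named: each history row becomes key/value pairs via jsonb_strip_nulls and jsonb_each, DISTINCT ON keeps the earliest value per key after the date, and `||` merges those over the current row. It assumes each history row holds the values replaced at updated_at and that only the id and updated_at column names are fixed. The `app_user` and `app_user_history` CTEs are hypothetical stand-ins for public.app_user and history.app_user with the question's sample rows.

```sql
-- Hypothetical stand-ins for public.app_user and history.app_user.
WITH app_user(id, name, email, role, bio, created_at) AS (
    VALUES ('e31b55bf', 'Jon', 'jon@app.com', 'admin', 'Hello',
            timestamp '2022-01-01')
), app_user_history(id, name, email, role, bio, updated_at) AS (
    VALUES ('e31b55bf', 'ASDF', 'test',           NULL,       NULL,   timestamp '2022-01-02'),
           ('e31b55bf', 'Test', 'test@gmail.com', 'basic',    NULL,   timestamp '2022-01-03'),
           ('e31b55bf', NULL,   NULL,             'standard', 'asdf', timestamp '2022-01-04'),
           ('e31b55bf', NULL,   NULL,             'mod',      NULL,   timestamp '2022-01-05')
)
-- Current row as jsonb, with the older values overlaid on top of it.
SELECT to_jsonb(cur) || COALESCE(chg.changes, '{}'::jsonb) AS row_as_of
  FROM app_user AS cur
 CROSS JOIN LATERAL (
       -- One key/value pair per column: the earliest non-NULL value
       -- recorded after the date to check.
       SELECT jsonb_object_agg(kv.key, kv.value) AS changes
         FROM (SELECT DISTINCT ON (e.key) e.key, e.value
                 FROM app_user_history AS h
                CROSS JOIN LATERAL jsonb_each(
                        jsonb_strip_nulls(to_jsonb(h) - 'id' - 'updated_at')) AS e
                WHERE h.id = cur.id
                  AND h.updated_at > timestamp '2022-01-03 12:00' -- Date to check
                ORDER BY e.key, h.updated_at) AS kv
       ) AS chg;
```

For the sample data this returns an object with name Jon, email jon@app.com, role standard and bio asdf, plus the untouched id and created_at. Against the real tables, jsonb_populate_record(NULL::public.app_user, row_as_of) should turn the result back into a typed row; porting to another table pair then only means changing the two table names.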