How can I merge rows of a table that records history of changes in order to obtain what a row looked like at a specific moment?
I have the following two SELECTs:
SELECT * FROM public.app_user WHERE id = 'e31b55bf';
+--------+----+-----------+-----+-----+--------------------------+
|id      |name|email      |role |bio  |created_at                |
+--------+----+-----------+-----+-----+--------------------------+
|e31b55bf|Jon |jon@app.com|admin|Hello|2022-01-01 00:00:00.000000|
+--------+----+-----------+-----+-----+--------------------------+
SELECT * FROM history.app_user WHERE id = 'e31b55bf';
+--------+----+--------------+--------+----+--------------------------+
|id      |name|email         |role    |bio |updated_at                |
+--------+----+--------------+--------+----+--------------------------+
|e31b55bf|ASDF|test          |NULL    |NULL|2022-01-02 00:00:00.000000|
|e31b55bf|Test|test@gmail.com|basic   |NULL|2022-01-03 00:00:00.000000|
|e31b55bf|NULL|NULL          |standard|asdf|2022-01-04 00:00:00.000000|
|e31b55bf|NULL|NULL          |mod     |NULL|2022-01-05 00:00:00.000000|
+--------+----+--------------+--------+----+--------------------------+
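public.app_user contains the users of my application; history.app_user contains a record of the previous values of the first row. In the example above, user e31b55bf was a mod rather than an admin before January 5th, a standard user with the bio "asdf" before the 4th, a basic user named "Test" with the email "test@gmail.com" before the 3rd, and so on.
I would like to come up with a SELECT or a function that tells me what the row looked like at a specific point in time. I believe I have managed it, but my solution looks more complicated than it should be. It is also tedious to adapt to other tables, for example public.project and history.project, whose columns are completely different. I am sure a cleaner solution that is easier to read and write exists. Can an SQL wizard help me out?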
My current solution consists of overriding the values of the current row of public.app_user with the following non-null values:
SELECT t.uuid,
       t1.name,
       t2.email,
       t3.role,
       t4.bio
FROM (
    SELECT uuid,
           MIN(CASE WHEN name IS NOT NULL THEN updated_at END) AS name_date,
           MIN(CASE WHEN email IS NOT NULL THEN updated_at END) AS email_date,
           MIN(CASE WHEN role IS NOT NULL THEN updated_at END) AS role_date,
           MIN(CASE WHEN bio IS NOT NULL THEN updated_at END) AS bio_date
    FROM history.app_user
    WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
    GROUP BY uuid
) t
LEFT JOIN history.app_user t1 ON t1.updated_at = t.name_date
LEFT JOIN history.app_user t2 ON t2.updated_at = t.email_date
LEFT JOIN history.app_user t3 ON t3.updated_at = t.role_date
LEFT JOIN history.app_user t4 ON t4.updated_at = t.bio_date
Solution 1: window function
SELECT DISTINCT ON (uuid)
  uuid
, (array_agg(name) FILTER (WHERE name IS NOT NULL) OVER w)[1] AS name
, (array_agg(email) FILTER (WHERE email IS NOT NULL) OVER w)[1] AS email
, (array_agg(role) FILTER (WHERE role IS NOT NULL) OVER w)[1] AS role
, (array_agg(bio) FILTER (WHERE bio IS NOT NULL) OVER w)[1] AS bio
FROM history.app_user
WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
-- Explicit frame: with ORDER BY alone, the default frame ends at the current row,
-- so a row could miss non-null values that only appear in later rows.
WINDOW w AS (PARTITION BY uuid ORDER BY updated_at
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
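array_agg() is an aggregate function that is used here as a window function. FILTER (WHERE condition) excludes NULL values from the selected rows. The window is the subset of rows associated with the current row, as described in the manual, i.e. all existing rows with the same uuid, as specified by the PARTITION BY clause. The ORDER BY clause puts the earliest non-null value in the first position of the resulting array, which is then selected by [1].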
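The main issue with using a window function here is that we get as many rows as the WHERE clause lets through. The DISTINCT ON () clause removes the redundant rows from the final result.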
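To make the row duplication concrete, here is a small sketch that runs the window expression without DISTINCT ON. Several things in it are assumptions rather than part of the answer: the inline `h` CTE is a stand-in for history.app_user holding only the two sample rows later than the date to check, the key column is named `id` as in the sample output (the queries above call it `uuid`), and the explicit frame clause is added so that every row sees its whole partition.

```sql
-- Hypothetical stand-in for history.app_user: the two sample rows
-- whose updated_at is later than '2022-01-03 12:00:00'.
WITH h(id, role, bio, updated_at) AS (
    VALUES ('e31b55bf', 'standard', 'asdf', timestamp '2022-01-04 00:00:00'),
           ('e31b55bf', 'mod',      NULL,   timestamp '2022-01-05 00:00:00')
)
SELECT id
     , updated_at
       -- First non-null value in the partition, ordered by updated_at
     , (array_agg(role) FILTER (WHERE role IS NOT NULL) OVER w)[1] AS role
     , (array_agg(bio)  FILTER (WHERE bio  IS NOT NULL) OVER w)[1] AS bio
FROM h
WINDOW w AS (PARTITION BY id ORDER BY updated_at
             -- Assumption: widen the frame to the whole partition
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY updated_at;
```

With this data the query returns two rows, one per input row, and both carry the same role 'standard' and bio 'asdf'. That repetition is what DISTINCT ON collapses into a single row per key.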
The second solution uses the same function as a plain aggregate function rather than as a window function, which fits our case better.
Solution 2: aggregate function
SELECT uuid
, (array_agg(name ORDER BY updated_at) FILTER (WHERE name IS NOT NULL))[1] AS name
, (array_agg(email ORDER BY updated_at) FILTER (WHERE email IS NOT NULL))[1] AS email
, (array_agg(role ORDER BY updated_at) FILTER (WHERE role IS NOT NULL))[1] AS role
, (array_agg(bio ORDER BY updated_at) FILTER (WHERE bio IS NOT NULL))[1] AS bio
FROM history.app_user
WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
GROUP BY uuid
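The question also mentions overriding the current row of public.app_user with these values, which neither query above does yet. A minimal sketch of that last step, assuming a LEFT JOIN plus COALESCE is acceptable, could look like the following. The CTE names `app_user_now`, `app_user_history` and `past` are hypothetical, the inline VALUES lists stand in for the two real tables so the query runs on its own, and the key column is taken to be `id` as in the sample output.

```sql
WITH app_user_now(id, name, email, role, bio) AS (
    -- Stand-in for public.app_user (sample row from the question)
    VALUES ('e31b55bf', 'Jon', 'jon@app.com', 'admin', 'Hello')
), app_user_history(id, name, email, role, bio, updated_at) AS (
    -- Stand-in for history.app_user (sample rows from the question)
    VALUES ('e31b55bf', 'ASDF', 'test',           NULL,       NULL,   timestamp '2022-01-02 00:00:00'),
           ('e31b55bf', 'Test', 'test@gmail.com', 'basic',    NULL,   timestamp '2022-01-03 00:00:00'),
           ('e31b55bf', NULL,   NULL,             'standard', 'asdf', timestamp '2022-01-04 00:00:00'),
           ('e31b55bf', NULL,   NULL,             'mod',      NULL,   timestamp '2022-01-05 00:00:00')
), past AS (
    -- Solution 2: earliest non-null value recorded after the date to check
    SELECT id
         , (array_agg(name  ORDER BY updated_at) FILTER (WHERE name  IS NOT NULL))[1] AS name
         , (array_agg(email ORDER BY updated_at) FILTER (WHERE email IS NOT NULL))[1] AS email
         , (array_agg(role  ORDER BY updated_at) FILTER (WHERE role  IS NOT NULL))[1] AS role
         , (array_agg(bio   ORDER BY updated_at) FILTER (WHERE bio   IS NOT NULL))[1] AS bio
    FROM app_user_history
    WHERE updated_at > timestamp '2022-01-03 12:00:00' -- Date to check
    GROUP BY id
)
SELECT c.id
       -- Fall back to the current value when no later history row changed the column
     , COALESCE(p.name,  c.name)  AS name
     , COALESCE(p.email, c.email) AS email
     , COALESCE(p.role,  c.role)  AS role
     , COALESCE(p.bio,   c.bio)   AS bio
FROM app_user_now c
LEFT JOIN past p ON p.id = c.id;
```

For the sample data and a check date of 2022-01-03 12:00 this yields name 'Jon', email 'jon@app.com', role 'standard' and bio 'asdf'. The LEFT JOIN keeps users that have no history rows after the check date, in which case every column simply falls back to the current value.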