Aggregate text based on a criterion
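In a previous question I asked something similar, which relied on a helper table as part of the criteria for splitting the data. My goal now seems simpler, but I can't figure it out.
Given the table: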
CREATE TABLE conversations (id int, record_id int, is_response bool, text text);
INSERT INTO conversations VALUES
(1, 1, false, 'in text 1')
, (2, 1, true , 'response text 1')
, (3, 1, false, 'in text 2')
, (4, 1, true , 'response text 2')
, (5, 1, true , 'response text 3')
, (6, 2, false, 'in text 1')
, (7, 2, true , 'response text 1')
, (8, 2, false, 'in text 2')
, (9, 2, true , 'response text 2')
, (10, 2, true , 'response text 3');
I want to aggregate the text based on the is_response value and output the following:
record_id | aggregated_text |
----------+---------------------------------------------------+
1 |in text 1 response text 1 |
----------+---------------------------------------------------+
1 |in text 2 response text 2 response text 3 |
----------+---------------------------------------------------+
2 |in text 1 response text 1 |
----------+---------------------------------------------------+
2 |in text 2 response text 2 response text 3 |
I tried the following query, but it fails to aggregate two consecutive responses, i.e. when is_response is true for several rows in a row.
SELECT
record_id,
string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
SELECT
*,
coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp
FROM (
SELECT *, is_response as incl
FROM conversations
) c
) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);
The output of my query just adds another row for each following is_response row, like this:
record_id | aggregated_text |
----------+---------------------------------------------------+
1 |in text 1 response text 1 |
----------+---------------------------------------------------+
1 |in text 2 response text 2 |
----------+---------------------------------------------------+
1 |response text 3 |
----------+---------------------------------------------------+
2 |in text 1 response text 1 |
----------+---------------------------------------------------+
2 |in text 2 response text 2 |
----------+---------------------------------------------------+
2 | response text 3 |
----------+---------------------------------------------------+
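To see why the extra rows appear, the grp column of the query above can be replayed outside the database. This is a minimal plain-Python sketch, not PostgreSQL; the names ROWS and asker_grouping are hypothetical. It assumes grp equals the number of response rows strictly before the current row, ordered by id over the whole table (the sum has no PARTITION BY), which is what coalesce(sum(...) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, 0) computes. Because every response bumps the counter, a second consecutive response always lands in a new group.

```python
# Sample data from the question: (id, record_id, is_response, text)
ROWS = [
    (1, 1, False, "in text 1"), (2, 1, True, "response text 1"),
    (3, 1, False, "in text 2"), (4, 1, True, "response text 2"),
    (5, 1, True, "response text 3"),
    (6, 2, False, "in text 1"), (7, 2, True, "response text 1"),
    (8, 2, False, "in text 2"), (9, 2, True, "response text 2"),
    (10, 2, True, "response text 3"),
]


def asker_grouping(rows):
    """Hypothetical replay of the question's query: grp = number of response rows
    strictly before the current row, over the whole table ordered by id."""
    groups = {}  # (record_id, grp) -> [texts, has_response]
    responses_before = 0
    for _row_id, record_id, is_response, text in sorted(rows):
        entry = groups.setdefault((record_id, responses_before), [[], False])
        entry[0].append(text)
        entry[1] = entry[1] or is_response  # HAVING bool_or(incl)
        if is_response:
            # Every response increments the counter, so the NEXT row starts a new
            # group even when that next row is another response.
            responses_before += 1
    # grp only grows with id, so insertion order matches ORDER BY max(id) here.
    return [(record_id, " ".join(texts))
            for (record_id, _grp), (texts, has_response) in groups.items()
            if has_response]


for record_id, text in asker_grouping(ROWS):
    print(record_id, text)
```

The printed rows reproduce the unwanted output above, with "response text 3" split off on its own for both records.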
How can I fix this?
This is basically a simpler version of your earlier question.
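SELECT record_id, string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
-- grp = running count of incoming rows per record: NOT is_response OR NULL is TRUE
-- for incoming rows and NULL for responses, and count() ignores NULLs
SELECT *, count(NOT is_response OR NULL) OVER (PARTITION BY record_id ORDER BY id) AS grp
FROM conversations
) sub
GROUP BY record_id, grp
ORDER BY record_id, grp;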
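Use a single window function in a subquery, then aggregate. This produces exactly the desired result. My answer to your previous question has a detailed explanation and links.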
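The grouping rule can be checked with a small plain-Python sketch, not PostgreSQL. It assumes grp is a running count, per record_id ordered by id, of incoming (non-response) rows up to and including the current row, which is what count(NOT is_response OR NULL) OVER (PARTITION BY record_id ORDER BY id) yields with the default window frame. The function name aggregate_by_incoming and the tuple layout of ROWS are hypothetical and only mirror the sample data.

```python
from itertools import groupby

# Sample data from the question: (id, record_id, is_response, text)
ROWS = [
    (1, 1, False, "in text 1"), (2, 1, True, "response text 1"),
    (3, 1, False, "in text 2"), (4, 1, True, "response text 2"),
    (5, 1, True, "response text 3"),
    (6, 2, False, "in text 1"), (7, 2, True, "response text 1"),
    (8, 2, False, "in text 2"), (9, 2, True, "response text 2"),
    (10, 2, True, "response text 3"),
]


def aggregate_by_incoming(rows):
    """Hypothetical analogue of grouping by a running count of incoming rows."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # PARTITION BY record_id ORDER BY id
    for record_id, record_rows in groupby(ordered, key=lambda r: r[1]):
        groups = {}  # grp -> texts in id order
        grp = 0
        for _row_id, _record_id, is_response, text in record_rows:
            if not is_response:
                grp += 1  # an incoming row is counted, so it opens a new conversation
            groups.setdefault(grp, []).append(text)
        # GROUP BY record_id, grp ... ORDER BY record_id, grp
        for key in sorted(groups):
            result.append((record_id, " ".join(groups[key])))
    return result


for record_id, text in aggregate_by_incoming(ROWS):
    print(record_id, text)
```

Responses never increment the counter, so any number of consecutive responses stays attached to the incoming message before them.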
This is a variant of the answer I gave to your previous question:
SELECT record_id, string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
-- subgrp = number of conversations completed before this row
SELECT *, coalesce(sum(incl::integer) OVER w,0) AS subgrp
FROM (
-- incl: a response that is not followed by another response closes a conversation
SELECT *, is_response AND NOT coalesce(lead(is_response) OVER w,false) AS incl
FROM conversations
WINDOW w AS (PARTITION BY record_id ORDER BY id)
) t
WINDOW w AS (PARTITION BY record_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) t1
GROUP BY record_id, subgrp
HAVING bool_or(incl)
ORDER BY min(id);
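The idea is that, for each row, we use the lead window function to look at the next row of the same record. If the current row's is_response is true and there is either no next row or the next row's is_response is false, we flag the current row (incl) as the end of a conversation and aggregate all text values not yet used by an earlier conversation.
This query also ensures that an incomplete last conversation (an incoming message with no response yet, which does not occur in your sample data) is ignored: HAVING bool_or(incl) drops any group that has no closing response.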
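Both the lead-based flag and the HAVING filter can be traced with a plain-Python sketch, not PostgreSQL. It assumes incl is "current row is a response and the next row of the same record is not" (a missing next row counts as false, like coalesce(lead(...), false)) and that subgrp counts incl flags on earlier rows only, as the ROWS ... AND 1 PRECEDING frame does. The names aggregate_by_lead and WITH_INCOMPLETE, and the extra row with id 11, are hypothetical. Results are emitted per record, then per subgroup, which matches ORDER BY min(id) here only because ids grow with record_id in the sample.

```python
from itertools import groupby

# Sample data from the question: (id, record_id, is_response, text)
ROWS = [
    (1, 1, False, "in text 1"), (2, 1, True, "response text 1"),
    (3, 1, False, "in text 2"), (4, 1, True, "response text 2"),
    (5, 1, True, "response text 3"),
    (6, 2, False, "in text 1"), (7, 2, True, "response text 1"),
    (8, 2, False, "in text 2"), (9, 2, True, "response text 2"),
    (10, 2, True, "response text 3"),
]


def aggregate_by_lead(rows):
    """Hypothetical analogue of the lead()-based query."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # PARTITION BY record_id ORDER BY id
    for record_id, record_iter in groupby(ordered, key=lambda r: r[1]):
        record_rows = list(record_iter)
        groups = {}  # subgrp -> [texts, bool_or(incl)]
        subgrp = 0   # coalesce(sum(incl) over PREVIOUS rows, 0)
        for i, (_row_id, _rec, is_response, text) in enumerate(record_rows):
            # coalesce(lead(is_response) OVER w, false)
            next_is_response = record_rows[i + 1][2] if i + 1 < len(record_rows) else False
            incl = is_response and not next_is_response
            entry = groups.setdefault(subgrp, [[], False])
            entry[0].append(text)
            entry[1] = entry[1] or incl  # bool_or(incl)
            if incl:
                subgrp += 1  # frame ends 1 PRECEDING, so only later rows see this
        for key in sorted(groups):
            texts, any_incl = groups[key]
            if any_incl:  # HAVING bool_or(incl)
                result.append((record_id, " ".join(texts)))
    return result


# Sample data plus a hypothetical unanswered message at the end of record 1.
WITH_INCOMPLETE = ROWS + [(11, 1, False, "in text 3")]

print(aggregate_by_lead(ROWS) == aggregate_by_lead(WITH_INCOMPLETE))
```

The trailing "in text 3" ends up in a subgroup with no incl flag, so it is filtered out and both calls return the same rows.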