Aggregate text based on a criterion

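I previously asked a similar question, which relied on a helper table as part of the criteria for splitting the data. My current goal seems simpler, but I can't figure it out.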

Given the table:

CREATE TABLE conversations (id int, record_id int, is_response bool, text text);
INSERT INTO conversations VALUES
  (1,  1,  false, 'in text 1')
, (2,  1,  true , 'response text 1')
, (3,  1,  false, 'in text 2')
, (4,  1,  true , 'response text 2')
, (5,  1,  true , 'response text 3')
, (6,  2,  false, 'in text 1')
, (7,  2,  true , 'response text 1')
, (8,  2,  false, 'in text 2')
, (9,  2,  true , 'response text 2')
, (10, 2,  true , 'response text 3');

I would like to aggregate the text based on the is_response value and output the following:

 record_id | aggregated_text                                   |
 ----------+---------------------------------------------------+
 1         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 1         |in text 2 response text 2 response text 3          |
 ----------+---------------------------------------------------+
 2         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 2         |in text 2 response text 2 response text 3          |

I tried the following query, but it fails to aggregate two responses in a row, i.e. when is_response is true for several consecutive rows.

SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
    SELECT
        *,
        coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp
    FROM (
        SELECT *, is_response as incl
        FROM conversations
         ) c
     ) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);

My query just outputs the second consecutive is_response row as a separate line, like so:

 record_id | aggregated_text                                   |
 ----------+---------------------------------------------------+
 1         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 1         |in text 2 response text 2                          |
 ----------+---------------------------------------------------+
 1         |response text 3                                    |
 ----------+---------------------------------------------------+
 2         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 2         |in text 2 response text 2                          |
 ----------+---------------------------------------------------+
 2         | response text 3                                   |
 ----------+---------------------------------------------------+

How can I fix it?

This is basically a simpler version of your previous question.

SELECT record_id, string_agg(text, ' ') As context
FROM  (
   SELECT *, count(NOT is_response OR NULL) OVER (PARTITION BY record_id ORDER BY id) AS grp
   FROM   conversations
   ORDER  BY record_id, id
   ) sub
GROUP  BY record_id, grp
ORDER  BY record_id, grp;

Use a single window function in a subquery, then aggregate.

This produces exactly the desired result.
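To see why this works, it can help to run the inner subquery on its own. The sketch below assumes the conversations table from the question and only adds an explicit column list for readability. NOT is_response OR NULL evaluates to true for an incoming row and to NULL for a response row, and count() ignores NULLs, so the running count goes up by one at every incoming row and stays flat across the responses that follow it.

```sql
-- Inner subquery only: show the group number assigned to each row.
-- With ORDER BY in the window and no explicit frame, count() is a running
-- count from the start of the partition up to the current row.
SELECT id, record_id, is_response,
       count(NOT is_response OR NULL)
          OVER (PARTITION BY record_id ORDER BY id) AS grp
FROM   conversations
ORDER  BY record_id, id;
```

For the sample data, grp comes out as 1, 1, 2, 2, 2 within each record_id, so every incoming text shares a group with all the responses that follow it.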

My answer to your last question has a detailed explanation and links.

This is a variation of the answer I gave in your previous question:

SELECT record_id, string_agg(text, ' ')
FROM (
    SELECT *, coalesce(sum(incl::integer) OVER w,0) AS subgrp
    FROM (
        SELECT *, is_response AND NOT coalesce(lead(is_response) OVER w,false) AS incl
        FROM conversations
        WINDOW w AS (PARTITION BY record_id ORDER BY id)
    ) t
    WINDOW w AS (PARTITION BY record_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) t1
GROUP BY record_id, subgrp
HAVING bool_or(incl)
ORDER BY min(id);

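The idea is that for each row we look at the next row of the same record with the help of the lead window function. If the current is_response is true and either there is no next row or the next row's is_response is false, we select the current row and aggregate all previously unused text values into it.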

This query also ensures that if the last conversation is incomplete (which does not happen in your sample data), it is ignored.
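As a rough illustration of the lead-based approach, the innermost step can be run by itself to see which rows get flagged as the end of a conversation. This is only a sketch against the sample conversations table from the question, with an explicit column list added for readability.

```sql
-- Innermost step only: flag the last response of each conversation.
-- lead(is_response) is NULL on the final row of a record, which
-- coalesce() turns into false, so a trailing response is still flagged.
SELECT id, record_id, is_response,
       is_response AND NOT coalesce(lead(is_response) OVER w, false) AS incl
FROM   conversations
WINDOW w AS (PARTITION BY record_id ORDER BY id)
ORDER  BY record_id, id;
```

For the sample data, incl is true for ids 2, 5, 7 and 10. The outer level sums the flags of all preceding rows, which gives subgrp values 0, 0, 1, 1, 1 within each record. A trailing incoming row with no response would land in a group that contains no flagged row, and HAVING bool_or(incl) filters that group out.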