Select partitions based on matches in other table

Question

Given the following table (conversations):

 id | record_id  |  is_response  |         text         |
 ---+------------+---------------+----------------------+
 1  |     1      |      false    | in text 1            |
 2  |     1      |      true     | response text 3      |
 3  |     1      |      false    | in text 2            |
 4  |     1      |      true     | response text 2      |
 5  |     1      |      true     | response text 3      |
 6  |     2      |      false    | in text 1            |
 7  |     2      |      true     | response text 1      |
 8  |     2      |      false    | in text 2            |
 9  |     2      |      true     | response text 3      |
 10 |     2      |      true     | response text 4      |

And a helper table (responses):

 id |         text         |
 ---+----------------------+
 1  | response text 1      |
 2  | response text 2      |
 3  | response text 4      |

I am looking for a SQL query that outputs the following:

  record_id |                       context
  ----------+-----------------------------------------------------
       1    | in text 1 response text 3 in text 2 response text 2
       2    | in text 1 response text 1
       2    | in text 2 response text 3 response text 4

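So every time is_response is true and the text is in the responses table, aggregate the conversation context up to that point, ignoring any part of a conversation that does not end with a response from the pool.

In the example above, the ignored part is the trailing response text 3 (id 5) in record_id 1.

I have tried the following complicated SQL, but it sometimes breaks by aggregating the text incorrectly: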

with context as(
    with answers as (

       SELECT record_id, is_response, id as ans_id
        , max(id)
          OVER (PARTITION BY record_id ORDER BY id
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_ans_id
       FROM (select * from conversations where text in (select text from responses)) ans
       ),
     lines as (
      select answers.record_id, con.id,
             COALESCE(previous_ans_id || ',' || ans_id, '0') as block,
             con.text as text
      from answers, conversations con
      where con.engagement_id = answers.record_id
        and ((previous_ans_id is null and con.id <= ans_id)
          OR (con.id > previous_ans_id and con.id <= ans_id))
      order by engagement_id, id asc
      )

      select record_id, block,
             replace(trim(both ' ' from string_agg(text, E' ')) ,'  ',' ') ctx
      from lines
      group by record_id, block
      order by record_id, block
      )

select * from context

I'm sure there is a better way.

Answer 1

Here is my take:

SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS context
FROM (
    SELECT
        *,
        coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp
    FROM (
        SELECT *, is_response AND text IN (SELECT text FROM responses) as incl
        FROM conversations
         ) c
     ) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);

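This scans the conversations table once, but I'm not sure whether it performs better than your solution. The basic idea is to use a window function to count how many preceding rows end a conversation. We can then group by that number together with record_id and discard incomplete conversations.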
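
To see what the window function produces, here is a minimal sketch that prints the intermediate incl and grp columns for the sample data from the question. The inline CTEs merely stand in for the real tables, and the name flagged is made up for this illustration. Note that the window has no PARTITION BY, so the running count continues across records; that should be harmless here because the outer query also groups by record_id.

```sql
-- Sample data copied from the question; the CTEs only stand in for the real tables.
WITH conversations (id, record_id, is_response, text) AS (
   VALUES
     (1, 1, false, 'in text 1'), (2, 1, true, 'response text 3')
   , (3, 1, false, 'in text 2'), (4, 1, true, 'response text 2')
   , (5, 1, true,  'response text 3')
   , (6, 2, false, 'in text 1'), (7, 2, true, 'response text 1')
   , (8, 2, false, 'in text 2'), (9, 2, true, 'response text 3')
   , (10, 2, true, 'response text 4')
), responses (id, text) AS (
   VALUES (1, 'response text 1'), (2, 'response text 2'), (3, 'response text 4')
), flagged AS (  -- hypothetical name: marks rows that end a conversation block
   SELECT c.*, c.is_response AND c.text IN (SELECT r.text FROM responses r) AS incl
   FROM   conversations c
)
SELECT id, record_id, incl
     , coalesce(sum(incl::integer) OVER (ORDER BY id
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp  -- block ends seen before this row
FROM   flagged
ORDER  BY id;
```

With the sample data, incl should be true for ids 4, 7 and 10, so grp comes out as 0 for ids 1 to 4, 1 for ids 5 to 7 and 2 for ids 8 to 10. The group (record_id 1, grp 1) holds only id 5, contains no incl row, and is therefore dropped by HAVING bool_or(incl).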

Answer 2

There is a simple and fast solution:

SELECT record_id, string_agg(text, ' ') As context
FROM  (
   SELECT c.*, count(r.text) OVER (PARTITION BY c.record_id ORDER BY c.id DESC) AS grp
   FROM   conversations  c
   LEFT   JOIN responses r ON r.text = c.text AND c.is_response
   ORDER  BY record_id, id
   ) sub
WHERE  grp > 0  -- ignore conversation part that does not end with a response
GROUP  BY record_id, grp
ORDER  BY record_id, grp;

count() only counts non-null values. r.text is NULL if the LEFT JOIN to responses comes up empty:

Select rows which are not present in other table
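
As a quick illustration of that count() behavior, here is a throwaway sketch over a VALUES list (the alias t and the column txt are made up for this example):

```sql
-- count(*) counts every row, count(txt) skips the NULL
SELECT count(*)     AS all_rows       -- 3
     , count(t.txt) AS non_null_rows  -- 2
FROM  (VALUES ('a'), (NULL), ('b')) AS t(txt);
```

In the query above the same effect means that only rows with a matching entry in responses contribute to the running count.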

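The value in grp (short for "group") only increases when a new output row is triggered. All rows belonging to the same output row end up with the same grp number, so aggregating in the outer SELECT is then easy.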

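The special trick is to count conversation ends in reverse order. Everything after the last end (which comes first when starting from the end) gets grp = 0 and is removed in the outer SELECT.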
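
To make the reverse counting visible, the following sketch runs just the inner part of the query against the sample data from the question and lists grp per row. The inline CTEs only stand in for the real tables, and the column ends_block is an addition for this illustration.

```sql
-- Sample data copied from the question; the CTEs only stand in for the real tables.
WITH conversations (id, record_id, is_response, text) AS (
   VALUES
     (1, 1, false, 'in text 1'), (2, 1, true, 'response text 3')
   , (3, 1, false, 'in text 2'), (4, 1, true, 'response text 2')
   , (5, 1, true,  'response text 3')
   , (6, 2, false, 'in text 1'), (7, 2, true, 'response text 1')
   , (8, 2, false, 'in text 2'), (9, 2, true, 'response text 3')
   , (10, 2, true, 'response text 4')
), responses (id, text) AS (
   VALUES (1, 'response text 1'), (2, 'response text 2'), (3, 'response text 4')
)
SELECT c.id, c.record_id, c.text
     , r.text IS NOT NULL AS ends_block  -- hypothetical helper column: true where the LEFT JOIN found a match
     , count(r.text) OVER (PARTITION BY c.record_id ORDER BY c.id DESC) AS grp
FROM   conversations c
LEFT   JOIN responses r ON r.text = c.text AND c.is_response
ORDER  BY c.record_id, c.id;
```

For record_id 1 this should give grp = 1 for ids 1 to 4 and grp = 0 for id 5, the trailing part that does not end with a pooled response. For record_id 2 it should give grp = 2 for ids 6 and 7 and grp = 1 for ids 8 to 10. Because the numbering runs backwards, ORDER BY record_id, grp lists the later block of a record first; if chronological order matters, sorting by grp DESC would be one option.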

Similar cases with more explanation:

Select partitions based on matches in other table

sql

postgresql

window-functions