Select partitions based on matches in other table
Given the following table (conversations):
id | record_id | is_response | text |
---+------------+---------------+----------------------+
1 | 1 | false | in text 1 |
2 | 1 | true | response text 3 |
3 | 1 | false | in text 2 |
4 | 1 | true | response text 2 |
5 | 1 | true | response text 3 |
6 | 2 | false | in text 1 |
7 | 2 | true | response text 1 |
8 | 2 | false | in text 2 |
9 | 2 | true | response text 3 |
10 | 2 | true | response text 4 |
And a helper table (responses):
id | text |
---+----------------------+
1 | response text 1 |
2 | response text 2 |
3 | response text 4 |
I am looking for an SQL query that outputs the following:
record_id | context
----------+-----------------------------------------------------
1 | in text 1 response text 3 in text 2 response text 2
2 | in text 1 response text 1
2 | in text 2 response text 3 response text 4
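So each time is_response is true and text is in the responses table, aggregate the conversation context up to that point, ignoring any part of the conversation that does not end with a response from the pool.
In the example above, that is the final "response text 3" in record_id 1.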
I have tried the following complicated SQL, but it sometimes breaks by aggregating the text incorrectly:
with context as (
  with answers as (
    SELECT record_id, is_response, id as ans_id
         , max(id) OVER (PARTITION BY record_id ORDER BY id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_ans_id
    FROM (select * from conversations where text in (select text from responses)) ans
  ),
  lines as (
    select answers.record_id, con.id
         , COALESCE(previous_ans_id || ',' || ans_id, '0') as block
         , con.text as text
    from answers, conversations con
    where con.record_id = answers.record_id
      and ((previous_ans_id is null and con.id <= ans_id)
        OR (con.id > previous_ans_id and con.id <= ans_id))
    order by answers.record_id, con.id asc
  )
  select record_id, block
       , replace(trim(both ' ' from string_agg(text, E' ')), ' ', ' ') ctx
  from lines
  group by record_id, block
  order by record_id, block
)
select * from context
I am sure there is a better way.
Here is my take:
SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS context
FROM (
    SELECT
        *,
        -- number of conversation-ending rows that precede this row within the same record
        coalesce(sum(incl::integer) OVER (PARTITION BY record_id ORDER BY id
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
    FROM (
        SELECT *, is_response AND text IN (SELECT text FROM responses) AS incl
        FROM conversations
    ) c
) c1
GROUP BY record_id, grp
HAVING bool_or(incl)  -- keep only groups that end with a response from the pool
ORDER BY max(id);
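This scans the table conversations once, but I am not sure whether it performs better than your solution. The basic idea is to use a window function to count how many preceding rows in the same record end a conversation. We can then group by that number together with record_id and discard incomplete conversations.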
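To make the grouping idea concrete, here is a small Python sketch of the same logic, run against the sample rows from the question. It is only an illustration, not the answer's SQL: the function name contexts_forward and the dictionary layout are made up for this example, and the count of earlier conversation ends is kept per record, as the explanation describes.

```python
# Sample rows copied from the question: (id, record_id, is_response, text)
conversations = [
    (1, 1, False, "in text 1"),
    (2, 1, True, "response text 3"),
    (3, 1, False, "in text 2"),
    (4, 1, True, "response text 2"),
    (5, 1, True, "response text 3"),
    (6, 2, False, "in text 1"),
    (7, 2, True, "response text 1"),
    (8, 2, False, "in text 2"),
    (9, 2, True, "response text 3"),
    (10, 2, True, "response text 4"),
]
responses = {"response text 1", "response text 2", "response text 4"}


def contexts_forward(rows, pool):
    """Hypothetical helper: group rows by (record_id, number of earlier ends)."""
    ends_seen = {}  # record_id -> pool responses seen so far in that record
    groups = {}     # (record_id, grp) -> texts, closed flag, highest id
    for row_id, record_id, is_response, text in sorted(rows):
        incl = is_response and text in pool   # same role as "incl" in the SQL
        grp = ends_seen.get(record_id, 0)     # ends strictly before this row
        group = groups.setdefault(
            (record_id, grp), {"texts": [], "closed": False, "max_id": row_id}
        )
        group["texts"].append(text)
        group["closed"] = group["closed"] or incl
        group["max_id"] = row_id
        if incl:
            ends_seen[record_id] = grp + 1
    # Keep only groups that contain a pool response (the HAVING bool_or(incl) step),
    # ordered by their highest id (the ORDER BY max(id) step).
    kept = sorted(
        (g["max_id"], rec, " ".join(g["texts"]))
        for (rec, _grp), g in groups.items()
        if g["closed"]
    )
    return [(rec, ctx) for _max_id, rec, ctx in kept]


for record_id, context in contexts_forward(conversations, responses):
    print(record_id, "|", context)
```

The trailing "response text 3" of record 1 lands in a group of its own that never sees a pool response, so it is dropped, which matches the desired output in the question.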
There is a simple and fast solution:
SELECT record_id, string_agg(text, ' ') AS context
FROM  (
   SELECT c.*, count(r.text) OVER (PARTITION BY c.record_id ORDER BY c.id DESC) AS grp
   FROM   conversations c
   LEFT   JOIN responses r ON r.text = c.text AND c.is_response
   ORDER  BY record_id, id
   ) sub
WHERE  grp > 0  -- ignore conversation part that does not end with a response
GROUP  BY record_id, grp
ORDER  BY record_id, grp;
count() only counts non-null values. r.text is NULL if the LEFT JOIN to responses comes up empty:
- Select rows which are not present in other table
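The value in grp (short for "group") only increases when a new output row is triggered. All rows belonging to the same output row end up with the same grp number. It is then easy to aggregate in the outer SELECT.
The special trick is to count conversation ends in reverse order. Everything after the last end (which comes first when starting from the end) gets grp = 0 and is removed in the outer SELECT.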
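The reverse count can be traced by hand with a short Python sketch over the question's sample rows. This only imitates the window function for illustration: the function name contexts_reverse is invented here, and the texts are sorted by id explicitly instead of relying on subquery order.

```python
# Sample rows copied from the question: (id, record_id, is_response, text)
conversations = [
    (1, 1, False, "in text 1"),
    (2, 1, True, "response text 3"),
    (3, 1, False, "in text 2"),
    (4, 1, True, "response text 2"),
    (5, 1, True, "response text 3"),
    (6, 2, False, "in text 1"),
    (7, 2, True, "response text 1"),
    (8, 2, False, "in text 2"),
    (9, 2, True, "response text 3"),
    (10, 2, True, "response text 4"),
]
responses = {"response text 1", "response text 2", "response text 4"}


def contexts_reverse(rows, pool):
    """Hypothetical helper: number groups by counting pool responses from the end."""
    seen = {}    # record_id -> pool responses at or after the current row
    groups = {}  # (record_id, grp) -> list of (id, text)
    # Walk the rows from the highest id down, like ORDER BY c.id DESC.
    for row_id, record_id, is_response, text in sorted(rows, reverse=True):
        if is_response and text in pool:
            # The count includes the current row, like the default window frame.
            seen[record_id] = seen.get(record_id, 0) + 1
        grp = seen.get(record_id, 0)
        if grp > 0:  # grp = 0 means nothing after this row closes the conversation
            groups.setdefault((record_id, grp), []).append((row_id, text))
    result = []
    for record_id, grp in sorted(groups):  # like ORDER BY record_id, grp
        texts = [text for _row_id, text in sorted(groups[(record_id, grp)])]
        result.append((record_id, grp, " ".join(texts)))
    return result


for record_id, grp, context in contexts_reverse(conversations, responses):
    print(record_id, grp, "|", context)
```

Because grp is counted from the end, the last conversation of a record gets grp = 1, so sorting by grp ascending lists the later conversations of a record first. Sorting by grp descending would give chronological order within each record.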
Similar cases with more explanation: