Searching and returning all occurrences of a keyword in a Postgres text column

Question

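About the database

The database table that holds Confluence page content is named bodycontent, and the HTML content is stored in a column named body, which is a text field. I'm using a Postgres database. The primary key is named bodycontentid.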

Result I need

For each row in the table, I need to find every occurrence of an <image> tag in the body column whose src attribute starts with http://images.mydomain.com/allImages/

Example

Suppose the body of the row with bodycontentid = 12345 contains the following text:

<h1>Chapter 1</h1>
<image src="http://www.google.com/image/111.jpg"/>
<h1>Chapter 2</h1>
<image src="http://images.mydomain.com/allImages/222.jpg"/>
<h1>Chapter 3</h1>
<image src="http://images.mydomain.com/allImages/333.jpg"/>

The query should then return:

bodycontentid: 12345 body: http://images.mydomain.com/allImages/222.jpg

bodycontentid: 12345 body: http://images.mydomain.com/allImages/333.jpg
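
To make the example reproducible, the sample row could be loaded into a scratch database roughly as follows. The two-column table definition is a simplified assumption for illustration only; the real Confluence bodycontent table likely has more columns, so this should not be run against a live Confluence database.

```sql
-- Assumed, simplified schema for testing only (not the real Confluence DDL)
CREATE TABLE bodycontent (
    bodycontentid bigint PRIMARY KEY,
    body          text
);

-- The sample row from the example above
INSERT INTO bodycontent (bodycontentid, body)
VALUES (12345,
'<h1>Chapter 1</h1>
<image src="http://www.google.com/image/111.jpg"/>
<h1>Chapter 2</h1>
<image src="http://images.mydomain.com/allImages/222.jpg"/>
<h1>Chapter 3</h1>
<image src="http://images.mydomain.com/allImages/333.jpg"/>');
```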

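What I've tried

I can find all rows that contain at least one occurrence of the keyword I'm searching for (see below), but what I need is a list of every matching occurrence within each of those rows.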

SELECT *
FROM bodycontent
WHERE body LIKE '%http://images.mydomain.com/allImages/%'

Answer 1

One method is to use regexp_split_to_table() and then do some string manipulation:

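-- Split each body on src=" so every piece after the first starts with a URL
select bc.bodycontentid,
       -- keep everything up to the closing double quote
       left(rst.s, position('"' in rst.s) - 1) as domain
from bodycontent bc, lateral
     regexp_split_to_table(bc.body, 'src="') rst(s)
where rst.s like 'http://images.mydomain.com/allImages/%';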
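
As a possible alternative, regexp_matches() with the 'g' flag returns one row per match, so the URL can be captured directly instead of being cut out of the split pieces. The following is only a sketch: the pattern assumes the src value is wrapped in double quotes and sits inside an <image> tag, as in the question's sample, and it assumes the default standard_conforming_strings = on so that backslashes in the literal are passed to the regex engine unchanged. The aliases m, groups, and src are names chosen here for illustration.

```sql
SELECT bc.bodycontentid,
       m.groups[1] AS src          -- first (and only) capture group: the URL
FROM bodycontent AS bc
CROSS JOIN LATERAL regexp_matches(
    bc.body,
    -- an <image ...> tag whose src starts with the wanted prefix
    '<image[^>]*\ssrc="(http://images\.mydomain\.com/allImages/[^"]*)"',
    'g'                             -- return every match, not just the first
) AS m(groups);
```

Rows whose body contains no matching tag produce no output rows, so a separate WHERE ... LIKE filter should not be needed for correctness, although it might still be worth testing as a pre-filter on large tables.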


sql

postgresql

confluence