Replace the text within 0 or more HTML tags
I have a table with a column that contains varchar2 strings like the ones below.
My dog chases my cat
<p>My dog ate my other cat</p>
<p><div id="abcd">My cat ate my hamster</div><p>
<p><b><div id="abcd">My hamster sleeps all the time</div></b></p>
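I need to do a SUBSTR on the text inside the HTML tags.
I think numbered groups are the way to go, but I can't get the closing tags into a group of their own. This is the SQL I have: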
WITH sentences AS
(
SELECT 1 as nr, 'My dog chases my cat' AS ln FROM DUAL
UNION
SELECT 2, '<p>My dog ate my other cat</p>' FROM DUAL
UNION
SELECT 3,'<p><x><div id="abcd">My cat ate my hamster</div></x></p>' FROM DUAL
UNION
SELECT 4,'<p><b><div id="abcd">My hamster sleeps all the time</div></b></p>' FROM DUAL
)
SELECT nr, regexp_replace(ln, '^((<[^>]+>)+)(.*)((<[^>]+>)+)$', 'group 1:\1,group 2:\2,group 3:\3,group 4:\4', 1, 1, 'n')
FROM sentences order by nr;
This returns:
1 My dog chases my cat
2 group 1:<p>,group 2:<p>,group 3:My dog ate my other cat,group 4:</p>
3 group 1:<p><x><div id="abcd">,group 2:<div id="abcd">,group 3:My cat ate my hamster</div></x>,group 4:</p>
4 group 1:<p><b><div id="abcd">,group 2:<div id="abcd">,group 3:My hamster sleeps all the time</div></b>,group 4:</p>
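Group 4 contains only one closing tag; the other closing tag(s) end up in group 3, as rows 3 and 4 show. What pattern do I need so that all the closing tags land in their own numbered group?
Although the usual advice is not to do this with regular expressions, for these specific values you are only one character away: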
WITH sentences AS
(
SELECT 1 as nr, 'My dog chases my cat' AS ln FROM DUAL
UNION
SELECT 2, '<p>My dog ate my other cat</p>' FROM DUAL
UNION
SELECT 3,'<p><x><div id="abcd">My cat ate my hamster</div></x></p>' FROM DUAL
UNION
SELECT 4,'<p><b><div id="abcd">My hamster sleeps all the time</div></b></p>' FROM DUAL
)
SELECT nr,
  regexp_replace(ln, '^((<[^>]+>)+)(.*?)((<[^>]+>)+)$', 'group 1:\1,group 2:\2,group 3:\3,group 4:\4', 1, 1, 'n') as str
--------------------------------------^
FROM sentences order by nr;
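Without the ? to make the .* non-greedy, the third group swallows the earlier closing tags, and only the last closing tag goes into group 4, because it has to: the trailing ((<[^>]+>)+)$ needs at least one tag to match.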
NR STR
-- ------------------------------------------------------------------------------------------------------------------------
1 My dog chases my cat
2 group 1:<p>,group 2:<p>,group 3:My dog ate my other cat,group 4:</p>
3 group 1:<p><x><div id="abcd">,group 2:<div id="abcd">,group 3:My cat ate my hamster,group 4:</div></x></p>
4 group 1:<p><b><div id="abcd">,group 2:<div id="abcd">,group 3:My hamster sleeps all the time,group 4:</div></b></p>
Or, to get just that group:
SELECT nr, regexp_replace(ln, '^((<[^>]+>)+)(.*?)((<[^>]+>)+)$', '\3', 1, 1, 'n') as str
FROM sentences order by nr;
NR STR
-- ------------------------------
1 My dog chases my cat
2 My dog ate my other cat
3 My cat ate my hamster
4 My hamster sleeps all the time
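If you want to check the greedy versus non-greedy behaviour outside the database, here is a rough sketch using Python's re module as a stand-in. It assumes Python's engine treats this particular pattern the same way Oracle does, and that re.DOTALL is a fair equivalent of the 'n' match parameter; the helper names split_groups and inner_text are made up for illustration.

```python
import re

# Same patterns as the two queries. re.DOTALL is assumed to play the role
# of Oracle's 'n' match parameter (the dot also matches newlines).
GREEDY = re.compile(r'^((<[^>]+>)+)(.*)((<[^>]+>)+)$', re.DOTALL)
LAZY = re.compile(r'^((<[^>]+>)+)(.*?)((<[^>]+>)+)$', re.DOTALL)


def split_groups(pattern, line):
    # Return (opening tags, text, closing tags), or None when the line
    # is not wrapped in tags and the pattern does not match at all.
    m = pattern.match(line)
    if m is None:
        return None
    return m.group(1), m.group(3), m.group(4)


def inner_text(line):
    # Mimics replacing the whole match with backreference 3.
    # A line that does not match comes back unchanged, as in row 1.
    return LAZY.sub(lambda m: m.group(3), line, count=1)


row3 = '<p><x><div id="abcd">My cat ate my hamster</div></x></p>'

print(split_groups(GREEDY, row3))   # text group still carries </div></x>
print(split_groups(LAZY, row3))     # all closing tags land in the last group
print(inner_text(row3))
print(inner_text('My dog chases my cat'))
```

With the greedy pattern the text group for row 3 keeps </div></x> and only </p> reaches the last group; with the lazy pattern the text group stops at the first closing tag, which lines up with the two result sets above.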