MySQL regexp to match an HTML tag

I have HTML saved in the Message_html column of a database table:

Dear CUSTOMER, <br />
<br />
Please be advised that the following document has been moved:<br />
Document number: D4D4D4D4D4D<br />
<br />
<table border="1">
    <th>Data</th>
    <th>Movimento</th>
    <th>Documento</th>
    <tr>
        <td>22/07/2021 15:35</td>
        <td>Juntada de contrarrazões</td>
        <td><a href="ver.aspx">ALERT - REPRESENTATIONS</a></td>
    </tr>
    <tr>
        <td>22/07/2021 15:38</td>
        <td>Juntada de certidão</td>
        <td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>
    </tr>
    <tr>
        <td>22/07/2021 15:39</td>
        <td>Juntada de alvará</td>
        <td><a href="ver.aspx">NOTICE - PERMIT</a></td>
    </tr>
</table>
<br />
<br />
If you are no longer interested in receiving the push, access the link:: <a href="push.aspx">Exit</a><br />
<br />
<b>ATTENTION: this email is generated in an automated way, please do not reply.</b>

I need to check whether the word CERTIFICATE appears in one of the table cells, for example:

<td><a href="ver.aspx">CERTIFICATE</a></td>

The regular expression I am using in MySQL:
SELECT REGEXP_INSTR('<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>', '>[^<td><a*]*CERTIFICATE*[</a></td>]') AS verify;

    REGEXP_INSTR(k.Message_html, '>[^<td><a*]*CERTIFICATE*[</a></td>]')

This query cannot find the record:

SELECT *
FROM `table` AS k  -- placeholder table name; backticks because TABLE is a reserved word
WHERE REGEXP_INSTR(k.Message_html, CONCAT('>[^<td><a*]*', 'CERTIFICATE', '*[</a></td>]')) > 0;

The regular expression does not find the word in the table, even though the word CERTIFICATE is there.

Contents of the words table:

word_id  word
1        SUBJECT
2        DECISION
3        ORDER
4        SENTENCE
5        PETITION
6        CERTIFICATE
7        AMENDMENT TO THE INITIAL PETITION
8        NOTIFICATION - NOTIFICATION
9        EXTRACT
10       PETITION - PETITION
11       NOTIFICATION
12       MANIFESTATION
13       OTHER PARTS
14       REPRESENTATIONS

If you are only testing whether a string matches a regular expression, you don't need REGEXP_INSTR(). Use RLIKE.

SELECT *
FROM YourTable AS k
WHERE k.Message_html RLIKE '<td><a [^<]*CERTIFICATE'

[^<]* matches any run of characters that does not start another tag, so the pattern matches when the text of the <a> element contains CERTIFICATE.
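
A quick way to see how this pattern behaves is to run it against literal strings, with no table involved. The sketch below assumes MySQL 8.0 or later; the sample strings and column aliases are made up for illustration. The last statement is only there for contrast: it shows that a bracket expression such as the `[</a></td>]` in the question is a set of single characters, not a literal tag sequence.

```sql
-- RLIKE returns 1 on a match and 0 otherwise.

-- The word sits in the text of the <a> inside a <td>: match.
SELECT '<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>'
       RLIKE '<td><a [^<]*CERTIFICATE' AS in_link_text;        -- 1

-- The word is present, but not inside a <td><a ...> cell: no match,
-- because [^<]* cannot step over the "<" that closes the link text.
SELECT '<td>CERTIFICATE</td><td><a href="ver.aspx">PERMIT</a></td>'
       RLIKE '<td><a [^<]*CERTIFICATE' AS outside_link;        -- 0

-- A bracket expression matches exactly one character from its set,
-- so the single letter "a" satisfies [</a></td>].
SELECT 'a' RLIKE '^[</a></td>]$' AS brackets_are_a_set;        -- 1
```

The pattern is not anchored at the end, so it only checks that the word occurs somewhere in the link text.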

To join with the words table:

SELECT * 
FROM Table1 AS k 
JOIN words p ON k.Message_html RLIKE CONCAT('<td><a [^<]*', p.word);
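
To try the join without touching real data, the two tables can be stood in for by CTEs. This is only a sketch, assuming MySQL 8.0 or later (CTEs are not available before that); the `id` column and the two sample rows per table are invented for the example, while `Table1`, `words`, `Message_html`, `word_id` and `word` follow the names used above.

```sql
WITH Table1 AS (
    -- Hypothetical sample rows; id is an assumed key column.
    SELECT 1 AS id,
           '<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>' AS Message_html
    UNION ALL
    SELECT 2, '<td><a href="ver.aspx">NOTICE - PERMIT</a></td>'
),
words AS (
    SELECT 6 AS word_id, 'CERTIFICATE' AS word
    UNION ALL
    SELECT 14, 'REPRESENTATIONS'
)
-- One output row per (message, word) pair where the word occurs
-- in the text of a link inside a table cell.
SELECT k.id, p.word_id, p.word
FROM Table1 AS k
JOIN words AS p
  ON k.Message_html RLIKE CONCAT('<td><a [^<]*', p.word);
-- Returns a single row: 1, 6, CERTIFICATE
```

Because the match is a substring test, a short word such as PETITION would also match a cell whose text is PETITION - PETITION, so one message can join to several rows of the words table.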


This will find your CERTIFICATE:

SELECT REGEXP_INSTR('<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>'
, '(<td><a href="ver.aspx">).*CERTIFICATE.*(</a></td>)') AS verify;

The same pattern applied to the full message:

SELECT REGEXP_INSTR('Dear CUSTOMER, <br />
<br />
Please be advised that the following document has been moved:<br />
Document number: D4D4D4D4D4D<br />
<br />
<table border="1">
    <th>Data</th>
    <th>Movimento</th>
    <th>Documento</th>
    <tr>
        <td>22/07/2021 15:35</td>
        <td>Juntada de contrarrazões</td>
        <td><a href="ver.aspx">ALERT - REPRESENTATIONS</a></td>
    </tr>
    <tr>
        <td>22/07/2021 15:38</td>
        <td>Juntada de certidão</td>
        <td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>
    </tr>
    <tr>
        <td>22/07/2021 15:39</td>
        <td>Juntada de alvará</td>
        <td><a href="ver.aspx">NOTICE - PERMIT</a></td>
    </tr>
</table>
<br />
<br />
If you are no longer interested in receiving the push, access the link:: <a href="push.aspx">Exit</a><br />
<br />
<b>ATTENTION: this email is generated in an automated way, please do not reply.</b>', '(<td><a href="ver.aspx">).*CERTIFICATE.*(</a></td>)') AS verify;
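
REGEXP_INSTR() returns the 1-based position where the match starts, or 0 when there is no match, so it can be used as a row filter by comparing the result with 0. The following is a minimal sketch assuming MySQL 8.0 or later; the `messages` CTE and its `id` column are hypothetical stand-ins for the real table.

```sql
WITH messages AS (
    -- Hypothetical sample rows standing in for the real table.
    SELECT 1 AS id,
           '<td><a href="ver.aspx">SUCCESS - CERTIFICATE</a></td>' AS Message_html
    UNION ALL
    SELECT 2, '<td><a href="ver.aspx">NOTICE - PERMIT</a></td>'
)
SELECT k.id,
       REGEXP_INSTR(k.Message_html,
                    '(<td><a href="ver.aspx">).*CERTIFICATE.*(</a></td>)') AS verify
FROM messages AS k
-- Keep only the rows where the pattern was found.
WHERE REGEXP_INSTR(k.Message_html,
                   '(<td><a href="ver.aspx">).*CERTIFICATE.*(</a></td>)') > 0;
-- Returns a single row: id = 1, verify = 1
```

Unlike the `[^<]*` variant, this pattern hard-codes `href="ver.aspx"`, so cells that link elsewhere are not matched.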