Perl regex to skip anchors whose href value starts with "#"

Question

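I am using a regex to parse URLs out of a string and turn them into links, and I want to skip any anchor whose href value starts with "#".

In the string below, I want to skip this anchor and leave it exactly as it is: <a href="#C4">https://www.google.com</a>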

my $text = qq~ <a href="#C4">https://www.google.com</a>
    
    <a href="">https://www.google.com</a>
    
    content1 <video> https://google.com/ </video>my content2<video>https://google.com/</video>~;

I am using this regex for that, but it does not produce the output I want:

$text =~ s/(^|\s|\>|\()(?:<a(?:[^>]*)\>)?((https|ftp):\/\/)([^\r\n<>]*)(?:\<\/a\>)?/$1<a href="$2$4">$2$4<\/a>/gi;

The regex above returns this output:

<a href="https://www.google.com">https://www.google.com</a>

<a href="https://www.google.com">https://www.google.com</a>

content1 <video> <a href="https://google.com">https://google.com</a> </video>my content2<video><a href="https://google.com">https://google.com</a></video>

It does not skip the first anchor, even though that anchor's href starts with "#". Please help.

Answer 1

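I was able to make it work using (*SKIP)(*FAIL): when an <a tag contains href="#, the match fails, and the engine resumes after that anchor instead of backtracking into it. See perlre for details.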

$text =~ s{(^|\s|>|\()
           (?:<a[^>]+href=['"]?\#.*?</a>(*SKIP)(*FAIL)
             |<a[^>]*>|)      # If there wasn't href="#, work the old way.
           ((?:https|ftp)://) #2
           ([^\r\n<>]*)       #3
           (?:</a>)?
          }{$1<a href="$2$3">$2$3</a>}xgi;

I also used s{}{} to avoid the backslashes, and /x to make the pattern more readable and to allow comments.
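To try the approach end to end, here is a minimal sketch under a few assumptions: the sample input is a shortened version of the one in the question, linkify is a hypothetical helper name, and the replacement is assumed to rebuild the link from the prefix, the scheme and the rest of the URL (groups 1, 2 and 3).

```perl
use strict;
use warnings;

# Shortened sample input (an assumption, modeled on the question): an in-page
# anchor, an anchor with an empty href, and a bare URL inside another tag.
my $text = join "\n",
    '<a href="#C4">https://www.google.com</a>',
    '<a href="">https://www.google.com</a>',
    'see <video>https://google.com/</video>';

# linkify is a hypothetical name used only for this illustration.
sub linkify {
    my ($s) = @_;
    $s =~ s{(^|\s|>|\()            # group 1: what precedes the URL
            (?:<a[^>]+href=['"]?\#.*?</a>(*SKIP)(*FAIL)  # in-page anchor: fail, resume after it
              |<a[^>]*>|)          # any other opening anchor tag, or nothing
            ((?:https|ftp)://)     # group 2: scheme
            ([^\r\n<>]*)           # group 3: rest of the URL
            (?:</a>)?
           }{$1<a href="$2$3">$2$3</a>}xgi;
    return $s;
}

my $out = linkify($text);
print "$out\n";
```

With this input, the first anchor should come through untouched, while the empty-href anchor and the bare URL are both rewritten as links to their own URL. Dropping (*SKIP) and keeping only (*FAIL) would let the engine retry inside the first anchor and match the URL after its ">", which is the behavior the question was trying to avoid.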
