Replace the href value of qualifying <a>'s using the tag's visible text

Question

I have a large string containing many URLs, and I need to replace every link that matches:

<a href="../plugins/re_records/somefile.php?page=something&id=X">important_name</a>

(where X is any integer and important_name is any string) with:

<a href="/map/important_name">important_name</a>

I am using preg_match_all() to match all of the URLs:

preg_match_all('/\/plugins\/re\_records\/somefile\.php\?page\=something\&id\=*(\d+)/', $bigString, $matches, PREG_OFFSET_CAPTURE);

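The problem is that I don't understand how to get important_name from the hyperlink's visible text so that it becomes part of the new URL once the old URL has matched.

Is preg_match_all() a good idea here?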

Answer 1

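If I understand correctly, you want to capture the important_name that goes with each match?

Then wrap that part of the pattern in parentheses and it will be available in $matches.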

<?php
$s = '<a href="../plugins/re_records/somefile.php?page=something&id=123">important_name</a>';

preg_match_all('/\<a href\=\"\.\.\/plugins\/re\_records\/somefile\.php\?page\=something\&id\=*(\d+)\"\>(.*?)\<\/a\>/', $s, $matches, PREG_OFFSET_CAPTURE);

var_dump($matches[2][0][0]); // second capture group of the first match: the link text
?>
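
The snippet above only extracts the text. To carry out the replacement the question asks for, the same capture group can be reused in a replacement string. This is a minimal sketch, not the answerer's code: it assumes every qualifying link has exactly the attribute layout shown in the question (a single href attribute, double quotes, a literal & rather than &amp;), and the sample $bigString is made up for illustration.

```php
<?php
// Hypothetical sample input: one qualifying link and one that must stay untouched.
$bigString = '<p><a href="../plugins/re_records/somefile.php?page=something&id=123">important_name</a>'
    . ' and <a href="other.php?id=7">leave_me</a></p>';

// ~ is used as the delimiter so the slashes in the path need no escaping.
$pattern = '~<a href="\.\./plugins/re_records/somefile\.php\?page=something&id=\d+">(.*?)</a>~';

// $1 is the captured visible text; it is reused in both the new href and the link text.
$result = preg_replace($pattern, '<a href="/map/$1">$1</a>', $bigString);

echo $result, PHP_EOL;
```

Any variation in the markup (extra attributes, single quotes, nested tags inside the link) will make the pattern miss, which is the main reason the other answers recommend a DOM parser.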

Answer 2

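Don't use regular expressions. Use DOMDocument, which is built specifically for parsing HTML/XML documents.

Get all of the anchor elements, check the value of each href attribute, and change it with the setAttribute() method where it qualifies.

Snippet: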

<?php

libxml_use_internal_errors(true); // to disable warnings if HTML is not well formed 
$o = new DOMDocument();
$o->loadHTML('<a href="../plugins/re_records/somefile.php?page=something&id=45">important_name</a>');

foreach($o->getElementsByTagName('a') as $anchor_tag){
    $href = $anchor_tag->getAttribute('href');
    if(strpos($href,'/plugins/re_records/somefile.php?page=something&id=') !== false){
        $anchor_tag->setAttribute('href','/map/'.$anchor_tag->nodeValue);
    }
}

echo $o->saveHTML();

Demo: https://3v4l.org/5GPXA
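
One thing the snippet takes for granted is that the visible text is already safe to use as a URL path segment. The question says important_name can be any string, so the variant below is a hedged sketch that trims and percent-encodes the text before building the new href. The link text "Some Place/Name" is an invented example, and whether encoding is wanted at all depends on how the /map/ route reads its parameter.

```php
<?php
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$doc = new DOMDocument();
// Hypothetical input: the visible text contains a space and a slash.
$doc->loadHTML('<a href="../plugins/re_records/somefile.php?page=something&id=45">Some Place/Name</a>');

foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    if (strpos($href, '/plugins/re_records/somefile.php?page=something&id=') !== false) {
        // rawurlencode() turns the space into %20 and the slash into %2F.
        $slug = rawurlencode(trim($anchor->textContent));
        $anchor->setAttribute('href', '/map/' . $slug);
    }
}

$newHref = $doc->getElementsByTagName('a')->item(0)->getAttribute('href');
echo $newHref, PHP_EOL; // /map/Some%20Place%2FName
```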

Answer 3

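Make a habit of parsing HTML with a legitimate DOM parser. Regular expressions will eventually give you a headache. Only when a DOM parser has failed you should you consider regex.
I prefer to filter the parsed document with XPath because its expressions can be very powerful and flexible.
To suppress any warnings while loading the string into DOMDocument, call libxml_use_internal_errors(true); before loading. libxml then collects the errors internally instead of emitting them as PHP warnings.
Use the LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED flags to omit the <!DOCTYPE>, <html>, and <body> tags that you don't want or need.
starts-with() does the job nicely because you are not trying to extract the ID number from the end of the query string.
Don't be put off by the & being encoded as &amp; in the output; it's a good thing and part of a more modern standard.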
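
To make the effect of the two flags concrete, here is a small sketch that loads the same fragment with and without them and prints both serializations. The fragment and the variable names ($wrapped, $bare) are made up for illustration. It uses a single root element, as the sample further down does; input with several top-level elements may serialize differently under LIBXML_HTML_NOIMPLIED.

```php
<?php
libxml_use_internal_errors(true);

$fragment = '<p>Hello <a href="/map/x">x</a></p>';

// Default behaviour: the parser supplies a doctype plus <html> and <body> wrappers.
$wrapped = new DOMDocument();
$wrapped->loadHTML($fragment);

// With the flags: no doctype is added and no wrapper elements are implied.
$bare = new DOMDocument();
$bare->loadHTML($fragment, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

echo $wrapped->saveHTML(), PHP_EOL;
echo $bare->saveHTML(), PHP_EOL;
```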

Code: (Demo)

$html = <<<HTML
<div>
    <p> some text <a href="../plugins/re_records/somefile.php?page=something&id=345">find_me_1</a></p>
    <br>
    <a href="../plugins/re_records/somefile.php?page=something&id=99">find_me_2</a>
    <div>
        <div>
            <a href="example.com?page=something&id=55">don't even think about it!</a>
            <a href="../plugins/re_records/somefile.php?page=something&id=90210">find_me_3</a>
        </div>
    </div>
</div>
HTML;

$hrefStartsWith = '../plugins/re_records/somefile.php?page=something&id=';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[starts-with(@href, '$hrefStartsWith')]") as $a) {
    $a->setAttribute('href', '/map/' . $a->nodeValue);
}
echo $dom->saveHTML();

Output:

<div>
    <p> some text <a href="/map/find_me_1">find_me_1</a></p>
    <br>
    <a href="/map/find_me_2">find_me_2</a>
    <div>
        <div>
            <a href="example.com?page=something&amp;id=55">don't even think about it!</a>
            <a href="/map/find_me_3">find_me_3</a>
        </div>
    </div>
</div>
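
The &amp; in the untouched example.com link is only how the attribute is serialized. A short sketch, reusing that link from the sample above, suggests the value seen through the DOM API is still the plain & form, which is why the XPath comparison against '&id=' works:

```php
<?php
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML(
    '<a href="example.com?page=something&id=55">link</a>',
    LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED
);

$a = $dom->getElementsByTagName('a')->item(0);

$decoded    = $a->getAttribute('href'); // value as the DOM exposes it (plain &)
$serialized = $dom->saveHTML();         // markup as written back out (& becomes &amp;)

echo $decoded, PHP_EOL;
echo $serialized, PHP_EOL;
```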

