Add domain to <img> src attribute value if a relative path

Question

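I have a text variable that contains several images with relative or absolute paths. I need to check whether each src attribute starts with http or https and, if so, leave it alone; but if it starts with / or with something like abc/, I need to prepend a base URL.

I tried the following: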

<?php
$html = <<<HTML
<img src="docs/relative/url/img.jpg" />
<img src="/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />
HTML;

$base = 'https://example.com/';

$pattern = "/<img src=\"[^http|https]([^\"]*)\"/";
$replace = "<img src=\"" . $base . "\${1}\"";
echo $text = preg_replace($pattern, $replace, $html);

My output is:

<img src="https://example.com/ocs/relative/url/img.jpg" />
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />

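The problem: the result is 99% correct, but when the src attribute starts with something like docs/, its first letter gets cut off. (Check the first img src in the output.)

The output I need is: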

<img src="https://example.com/docs/relative/url/img.jpg" /><!--check this and compare with current result, you will get the difference -->
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />

Can anyone help me fix this?

Answer 1

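The following pattern looks for src attributes that do not start with http (which also covers https). For a relative path that starts with a forward slash, the leading slash is removed before the $base string is prepended to the src value.

Code: (Demo)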

$base = 'https://example.com/';
echo preg_replace('~ src="(?!http)\K/?~', $base, $html);

Output:

<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />

Breakdown:

~           #starting pattern delimiter
 src="      #match space, s, r, c, =, then "
(?!http)    #only continue matching if not https or http
\K          #forget any previously matched characters so they are not destroyed by the replacement string
/?          #optionally match a forward slash
~           #ending pattern delimiter
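
One consequence of the breakdown above is that (?!http) also skips a relative path that merely happens to begin with "http", such as httpdocs/img.jpg. Below is a minimal sketch of a stricter variant, assuming only http:// and https:// should count as absolute. The helper name prefixRelativeSrc and the extra httpdocs sample line are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): prepend $base to every
// src value that does not start with http:// or https://.
function prefixRelativeSrc(string $html, string $base): string
{
    // (?!https?://) is stricter than (?!http): "httpdocs/x.jpg" still counts as relative.
    // \K keeps ' src="' out of the replaced text; /? swallows one leading slash.
    // preg_replace() returns null on a regex error, so fall back to the input.
    return preg_replace('~ src="(?!https?://)\K/?~i', $base, $html) ?? $html;
}

$html = <<<HTML
<img src="docs/relative/url/img.jpg" />
<img src="/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="httpdocs/img.jpg" />
HTML;

echo prefixRelativeSrc($html, 'https://example.com/');
```

Like the original pattern, this sketch does not try to handle protocol-relative URLs (//host/path) or data: URIs; those would need their own lookahead alternatives.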

As for your pattern, /<img src=\"[^http|https]([^\"]*)\"/:

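[^http|https] actually means "match one character that is not in this list: |, h, t, p, and s". It can be simplified to [^|hpst], because the order of the characters in a negated character class does not matter and repeating a character has no effect. So you see, [^...] does not express "a string that does not start with this or that".
Capturing all of the remaining characters up to the next double quote just to reuse them in the replacement is unnecessary. That is why I used \K to mark the spot where $base should be injected, instead of ([^\"]*).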
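
The point about character classes can be checked directly. The following is a small sketch using shortened sample strings of my own choosing (docs/img.jpg and static/img.jpg are illustrative, not from the question). It shows that the negated class consumes exactly one character, and contrasts it with a negative lookahead, which consumes nothing.

```php
<?php
// [^http|https] is a negated character class: it consumes exactly ONE
// character that is not h, t, p, s or |.
preg_match('/src="[^http|https]([^"]*)"/', '<img src="docs/img.jpg" />', $m);
echo $m[1], "\n"; // ocs/img.jpg  (the "d" was consumed by the class)

// It behaves the same as the shorter class [^|hpst].
preg_match('/src="[^|hpst]([^"]*)"/', '<img src="docs/img.jpg" />', $m2);
var_dump($m[1] === $m2[1]); // bool(true)

// The class also rejects relative paths that begin with one of the listed letters.
$hits = preg_match('/src="[^http|https]([^"]*)"/', '<img src="static/img.jpg" />');
var_dump($hits); // int(0)

// A negative lookahead tests for a whole prefix without consuming anything.
preg_match('/src="(?!https?:)([^"]*)"/', '<img src="docs/img.jpg" />', $m3);
echo $m3[1], "\n"; // docs/img.jpg
```

The third check is worth noting: besides cutting off the first letter, the original pattern would silently leave paths such as static/... or themes/... untouched.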

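Also, when processing valid HTML documents, I always recommend a DOM parser for its stability. You can use DOMDocument with XPath to target the qualifying elements and modify the src attribute without regex.

Code: (Demo)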

$dom = new DOMDocument; 
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//img[not(starts-with(@src, 'http'))]") as $node) {
    $node->setAttribute('src', $base . ltrim($node->getAttribute('src'), '/'));
}
echo $dom->saveHTML();
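
For reference, here is a self-contained sketch of the same DOM idea wrapped in a function. It makes a few assumptions that are not in the original answer: the helper name prefixRelativeSrcDom is hypothetical, the input is a non-empty HTML fragment rather than a full document, and the XPath test is tightened to http:// and https://. Because a fragment with several top-level elements is a case that LIBXML_HTML_NOIMPLIED is known to handle inconsistently, this version lets the parser add its own html/body wrapper and then serializes only the children of body.

```php
<?php
// Hypothetical helper; assumes $html is a non-empty HTML fragment.
function prefixRelativeSrcDom(string $html, string $base): string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    $dom = new DOMDocument();
    $dom->loadHTML($html);                        // no flags: parser wraps the fragment in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $xpath = new DOMXPath($dom);
    $query = "//img[@src and not(starts-with(@src, 'http://'))"
           . " and not(starts-with(@src, 'https://'))]";
    foreach ($xpath->query($query) as $img) {
        // ltrim() drops leading slashes so $base's trailing slash is not doubled.
        $img->setAttribute('src', $base . ltrim($img->getAttribute('src'), '/'));
    }

    // Serialize only the children of <body>, dropping the wrapper the parser added.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

$html = '<img src="docs/a.jpg" /><img src="/docs/b.jpg" /><img src="https://docs/c.jpg" />';
echo prefixRelativeSrcDom($html, 'https://example.com/');
```

Note that saveHTML() emits HTML-style void tags (no self-closing slash), so the serialized markup will not be byte-identical to the input even for untouched images.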
