Convert given relative URLs to absolute URLs

Question

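I need to convert several relative URLs in a given HTML text to absolute URLs.

The HTML text contains a mix of relative and absolute URLs, and I need the resulting HTML to contain only absolute URLs, according to the following rules.

The original HTML text contains a mix of relative and absolute URLs
/test/1.html needs to be converted to https://www.example.com/test/1.html
Instances that already have absolute URLs (.com and .de) should be ignored, for example http://www.example.com/test/xxx.html, https://www.example.com/test/xxx.html, https://www.example.de/test/xxx.html, http://www.example.de/test/xxx.html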

As I understand it, preg_replace is the best way to do this since I am using PHP, and I tried the following code.

$server_url = "https://www.example.com";
$html = preg_replace('@(?<!https://www\.example\.com)(?<!http://www\.example\.com)(?<!https://www\.example\.de)(?<!http://www\.example\.de)/test@iU', $server_url.'/test', $html);

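However, this does not give the expected result. Instead, it converts all /test links, including the existing absolute URLs, so some URLs end up looking like http://www.example.dehttp://www.example.com/test/xxx.html.

I am not good at regex, so please help me find a suitable regex to get the desired result.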

Answer 1

This should match root-relative URLs:

^(\/[^\/]{1}.*\.html)$

The URL you want will be available in capture group 1.

https://regex101.com/r/E1evez/2

<?php
$urls = [
    '/test/1.html',
    'http://www.example.com/test/xxx.html',
    'https://www.example.de/test/xxx.html',
    '/relative/path/file.html'
];

// Report which entries look like root-relative .html URLs.
foreach ($urls as $url) {
    if (preg_match('/^(\/[^\/]{1}.*\.html)$/', $url)) {
        echo 'match: ' . $url . PHP_EOL;
    } else {
        echo 'no match: ' . $url . PHP_EOL;
    }
}

Output:

match: /test/1.html
no match: http://www.example.com/test/xxx.html
no match: https://www.example.de/test/xxx.html
match: /relative/path/file.html
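
The pattern above tests one URL at a time, while the question asks for a replacement inside HTML text. One way to bridge that gap, offered only as a minimal sketch, is to anchor the match on the attribute that holds the URL, so that a /test inside an already absolute URL is never touched. The attribute list (href and src), the sample HTML, and the variable names other than $server_url are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Assumption: base URL taken from the question; the sample HTML is made up.
$server_url = 'https://www.example.com';

$html = '<a href="/test/1.html">one</a> '
      . '<a href="http://www.example.de/test/xxx.html">two</a> '
      . '<img src="/relative/path/file.html">';

// Match href/src values that start with a single "/" (root-relative).
// Values starting with "http://", "https://" or "//" do not match,
// because the first character after the quote must be "/" and the
// second must not be "/".
$pattern = '@(?<![\w-])(href|src)=(["\'])(/(?!/)[^"\']*)\2@i';

// A callback avoids any special meaning of "$" or "\" in the base URL.
$result = preg_replace_callback(
    $pattern,
    function (array $m) use ($server_url) {
        // $m[1] = attribute name, $m[2] = quote char, $m[3] = relative path
        return $m[1] . '=' . $m[2] . $server_url . $m[3] . $m[2];
    },
    $html
);

echo $result . PHP_EOL;
```

This sketch only handles quoted attribute values. For arbitrary real-world HTML, a parser such as DOMDocument is usually more robust than a regex.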

Answer 2

If all the URLs start with a forward slash, you could use:

(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*

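Explanation

(?<!\S) Assert that what is directly to the left is not a non-whitespace character
(?:/[^/\s]+)+ Repeat 1+ times matching /, then 1+ characters that are neither / nor whitespace, using a negated character class
/\S+ Match / and 1+ non-whitespace characters
\.html\S* Match .html as in the example data, followed by 0+ non-whitespace characters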
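
To see how the parts listed above behave together, here is a small sketch that runs the pattern with preg_match_all. The sample text is hypothetical and deliberately uses whitespace-separated URLs: because of the (?<!\S) assertion, a URL that directly follows a quote, as in href="/test/1.html", would not match.

```php
<?php
// Hypothetical sample: URLs separated by whitespace, not inside HTML attributes.
$text = "/test/1.html http://www.example.com/test/xxx.html\n"
      . "https://www.example.de/test/xxx.html /relative/path/file.html /1.html";

// The pattern from this answer, wrapped in "~" delimiters.
$pattern = '~(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*~';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches: only the root-relative URLs
// with at least two path segments.
print_r($matches[0]);
```

The absolute URLs are skipped because every / in them is preceded by a non-whitespace character, and /1.html is skipped because the group must match at least once before the final /.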


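If you also want to match /1.html, you can change the quantifier to )* instead of )+

To match more extensions than .html, you could specify what you allow to match, like \.(?:html|jpg|png), or perhaps use a character class \.[\w-()]+ and add what you would allow to match.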
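
The two variations just described can be compared side by side. This is only a sketch: the sample text and the variable names are invented for illustration.

```php
<?php
// Variation 1: ")*" instead of ")+", so a single-segment path such as /1.html matches.
$singleSegment = '~(?<!\S)(?:/[^/\s]+)*/\S+\.html\S*~';

// Variation 2: an explicit list of allowed extensions instead of only .html.
$moreExtensions = '~(?<!\S)(?:/[^/\s]+)+/\S+\.(?:html|jpg|png)\S*~';

// Hypothetical whitespace-separated sample data.
$text = '/1.html /img/a/photo.jpg /test/docs/1.html /css/site/main.css '
      . 'https://www.example.com/img/logo.png';

preg_match_all($singleSegment, $text, $m1);
preg_match_all($moreExtensions, $text, $m2);

print_r($m1[0]); // .html paths, including the single-segment one
print_r($m2[0]); // multi-segment paths ending in .html, .jpg or .png
```

In both cases the absolute URL and the .css path are left out, the first because of the (?<!\S) assertion and the second because its extension is not in the allowed set.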


php

regex

preg-replace