PHP regex with optional char at the end

Question

I have the following string

https://www.example.com/int/de

and want to match the language code at the end of the URL, e.g. 'de'. I use this regex:

/\..*\/.*\/([^\/?]*)\/?$/gi

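I would like to get the same result if the URL ends with a slash.

But with https://www.example.com/int/de/ I only get a full match, and the group no longer matches 'de', even though the last slash is optional in the regex.

Can someone point out my mistake?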

Answer 1

The question mark matches zero or one character. You need more than one to match "de". Try using .* or .+ instead of ?.

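By the way, a possibly more maintainable regex would be: /.*\/([^\/]+)\/?$/i

The regex says 'match anything (.*), followed by a forward slash (\/), followed by something that is not a forward slash, one or more times ([^\/]+), followed by an optional forward slash (\/?), followed by the end of the text ($)'. This way, everything before the last forward slash in front of the language part is matched by the 'match anything' part of the regex. The capture uses + rather than * because with * the group could match the empty string when the URL ends in a slash. Note the parentheses around the part that matches the language.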
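
To try a pattern like this in PHP, a minimal sketch might look as follows. It assumes the capture group uses + so that it cannot match the empty string, escapes the / inside the character class (PHP treats an unescaped delimiter as the end of the pattern), and leaves out the g flag, which preg_match does not accept. The helper name lastSegment is made up for illustration.

```php
<?php
// Hypothetical helper: returns the last path segment of $url, or null if nothing matches.
function lastSegment(string $url): ?string
{
    // "Match anything", a slash, one or more non-slash characters (captured),
    // an optional trailing slash, then the end of the string.
    $pattern = '/.*\/([^\/]+)\/?$/i';

    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

var_dump(lastSegment('https://www.example.com/int/de'));
var_dump(lastSegment('https://www.example.com/int/de/'));
```

Both calls should report 'de', since the greedy .* is forced to back off until the group can take at least one character.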

Answer 2

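The mistake is not obvious, but it is a common one: a "generic" greedy dot-matching pattern followed by a series of optional subpatterns (patterns that can match an empty string).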

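The \..*\/.*\/([^\/?]*)\/?$ pattern matches like this:

\..* - matches a . and then any 0+ characters, as many as possible, so it first runs to the end of the string
\/ - backtracking then starts so that this can match the rightmost / in the string (the last one)
.*\/ - again matches any 0+ characters, as many as possible, and then a /; to make room for it, the engine backtracks even further, discards the / it found before, and re-matches the first \/ against the / before it, so that this part can take the rightmost / in the string
([^\/?]*)\/?$ - comes last, but the preceding .*\/ has already consumed the / at the end of the URL, so the regex index is at the end of the string

Since ([^\/?]*) can match 0+ characters other than ? and /, and \/? can match zero / characters, both match the empty string at the end of the string. $ then succeeds, and the regex engine returns a valid match with an empty value in Group 1.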
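
The behavior described above can be reproduced with a short script. This is only a sketch: it uses the pattern from the question with the g flag removed (PHP's preg_match does not accept it), and the variable names are chosen for illustration.

```php
<?php
// The pattern from the question, without the unsupported "g" flag.
$pattern = '/\..*\/.*\/([^\/?]*)\/?$/i';

$urls = [
    'https://www.example.com/int/de',
    'https://www.example.com/int/de/',
];

$captured = [];
foreach ($urls as $url) {
    preg_match($pattern, $url, $m);
    // Group 1 takes part in the match either way, but it may be empty.
    $captured[$url] = $m[1] ?? null;
    var_dump($captured[$url]);
}
```

Group 1 holds 'de' for the first URL and an empty string for the second, because .*\/ has already taken the trailing slash.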

Get rid of the greedy dots and use

'~([^\/?]+)\/?$~'

Details

([^\/?]+) - Capturing group 1: one or more characters other than ? and /
\/? - 1 or 0 / characters
$ - at the end of the string.
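
As a rough check, the pattern can be run against both forms of the URL with preg_match. The variable names below are illustrative only.

```php
<?php
// One or more characters other than "?" and "/", an optional "/", end of string.
$pattern = '~([^\/?]+)\/?$~';

$urls = [
    'https://www.example.com/int/de',
    'https://www.example.com/int/de/',
];

$codes = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        $codes[] = $m[1]; // Group 1 holds the last path segment.
    }
}

print_r($codes);
```

Because the group needs at least one character, it cannot settle for the empty string after a trailing slash, so both URLs yield 'de'.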

Answer 3

As an alternative, you could consider using parse_url with explode and rtrim to get only the last part.

$strings = [
    "https://www.example.com/int/de/",
    "https://www.example.com/int/de"
];
foreach ($strings as $string) {
    $parts = explode("/", rtrim(parse_url($string, PHP_URL_PATH), '/'));
    echo end($parts) . "<br>";
}

That will give you:

de
de
