regex/preg_replace to extract the part number (substring)

Question

I am not very comfortable with RegEx.

Use case

I have three variables, namely $url, $pattern and $replacement, and intend to use them as follows:

$url = $node->attr("href");

$resource = ExtractResourceWithoutHtmlExtension($url); // This is just to abstract the stripping off of the prepended path and cutting the `.html` (see Edit 2 & 3 below).

$pattern = ...;
$replacement = ...; // Not very sure of this value

$partno = preg_replace($pattern, $replacement, $resource);

echo '"'.$partno.'";"'.$node->attr("title").'";"'.$url.'"'."\n";

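Part number to resource string mapping

Most of the time

35000-0295 => designation-of-the-products-as-slug-35000-0295

27021-0012 => designation-of-the-products-as-slug-27021-0012

or, rarely

38811 => designation-of-the-products-as-slug-38811

Last but not least (edge case => nothing to extract):
if the part number is not available, the resource substring will simply be

designation-of-the-products-as-slug

I would still prefer a RegEx solution, because the number of digits in the segments that make up the part number may vary.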

Question

What should I assign to $pattern and $replacement?

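Edit 1 (FYI)

The substring designation-of-the-products-as-slug is variable ~~and path/to/ can be of arbitrary depth~~.

Edit 2 (FYI)

On second thought, I realized there is no need to apply RegEx to the whole URL path: http://path/to/ can be stripped off using parse_url, explode and array_pop. Edited my post accordingly.

Edit 3 (FYI)

The complexity can be reduced further by cutting off the invariant trailing substring .html. Cf. @bloodyKnuckles's comment below. Post edited accordingly.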
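
For illustration, the helper named in the use case could be sketched as follows. The body of ExtractResourceWithoutHtmlExtension is not shown in the post, so this is only an assumption of how parse_url, explode and array_pop might be combined, with the trailing .html cut off at the end.

```php
<?php
// Hypothetical body for the helper named in the question;
// the original post does not show how it is implemented.
function ExtractResourceWithoutHtmlExtension(string $url): string
{
    // Keep only the path component, dropping scheme, host and query string.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // Split the path on "/" and take the last segment (the resource).
    $segments = explode('/', $path);
    $resource = (string) array_pop($segments);

    // Cut the invariant trailing ".html", if present.
    if (substr($resource, -5) === '.html') {
        $resource = substr($resource, 0, -5);
    }

    return $resource;
}

echo ExtractResourceWithoutHtmlExtension(
    'http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.html'
), "\n";
// prints "designation-of-the-products-as-slug-35000-0295"
```

With the path and extension gone, the pattern only has to deal with the slug and the optional part number at its end.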

Answer 1

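First, I would use a combination of parse_url and pathinfo to strip off extraneous bits from the string, then use preg_filter with a regular expression like /.*?(\d+[\d-]*)$/ to grab the final block of digits together with any optional trailing hyphens and digits.

Example: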

$urls = [
    "http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.extension",
    "http://example.com/path/to/designation-of-the-products-as-slug-35000.html",
    "http://example.com/path/to/designation-of-the-products-as-slug.ext?foo=bar.baz"
];

$regex = '/.*?(\d+[\d-]*)$/';

foreach ($urls as $url) {
    $resource = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
    // '$1' keeps only the captured part number; preg_filter returns null when nothing matches.
    echo preg_filter($regex, '$1', $resource), "\n";
}

Output (the third URL contains no part number, so its line is empty):

35000-0295
35000
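
As a variation on the example above, the edge case can be made explicit with preg_match. This is a minimal sketch, not part of the original answer: the function name extractPartNumber and the "(no part number)" placeholder are assumptions made for illustration. It also sidesteps a difference worth keeping in mind: when nothing matches, preg_replace returns the subject unchanged (the whole slug), whereas preg_filter returns null.

```php
<?php
// Hypothetical helper: returns the trailing part number, or null if there is none.
function extractPartNumber(string $resource): ?string
{
    // One or more digits, followed by any further digits or hyphens, anchored at the end.
    if (preg_match('/\d+[\d-]*$/', $resource, $matches) === 1) {
        return $matches[0];
    }

    return null;
}

$resources = [
    'designation-of-the-products-as-slug-35000-0295',
    'designation-of-the-products-as-slug-38811',
    'designation-of-the-products-as-slug',
];

foreach ($resources as $resource) {
    $partno = extractPartNumber($resource);
    echo $partno ?? '(no part number)', "\n";
}
// prints:
// 35000-0295
// 38811
// (no part number)
```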
