Complex Regular Expression, PEG, or Multiple Passes?

Question

I am trying to extract some data from the following samples:

Name 789, 10-mill 12-27b
Manufacturer XY-2822, 10-mill, 17-25b
Other Manufacturer 16b Part
Another Manufacturer FER M9000, 11-mill, 11-40
18b Part
Manufacturer 11-31, 10-mill
Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)
Manufacturer REC6 15/18/26b. Square.
Manufacturer FC-40 11-13-16-19-22-25-27-30-34b

I would like my results to be, respectively:

12, 27
17, 25
16
11, 40
18
11-31
34, 38, 24 (optionally, just the last two would be fine)
15, 18, 26
11, 13, 16, 19, 22, 25, 27, 30, 34

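I'm happy to do this with multiple passes of an expression grammar, but I don't think that will really help.

I'm having trouble using lookaheads and lookbehinds to capture the data while excluding things like "11-mill" and "XY-2822". What I find is that I can exclude those matches, but I then end up truncating good results from other matches.

What is the best way to approach this?

My current regular expression is /(?:(\d+)[b\b\/-])([b\d\b]*)[^a-z]/i

It captures the letter 'b' (which is fine) but does not capture 34b in the last example.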

Answer 1

Maybe something like this:

(?<=\d-)\d+|\d+(?=-\d+)|\d+(?=(?:\/\d+)*b)

https://regex101.com/r/nR3eS9/1
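To make it easier to check this pattern against the samples from the question, here is a minimal sketch in Python. The names PATTERN, SAMPLES, and extract are only for illustration, and the escaped slash is written as a plain / because Python does not need it escaped. It assumes the lines look like the samples above; other input formats may need a different pattern.

```python
import re

# The three alternatives from the answer:
#   1. digits preceded by "<digit>-"
#   2. digits followed by "-<digits>"
#   3. digits followed by optional "/<digits>" groups and then "b"
PATTERN = re.compile(r"(?<=\d-)\d+|\d+(?=-\d+)|\d+(?=(?:/\d+)*b)")

SAMPLES = [
    "Name 789, 10-mill 12-27b",
    "Manufacturer XY-2822, 10-mill, 17-25b",
    "Other Manufacturer 16b Part",
    "Another Manufacturer FER M9000, 11-mill, 11-40",
    "18b Part",
    "Manufacturer 11-31, 10-mill",
    "Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)",
    "Manufacturer REC6 15/18/26b. Square.",
    "Manufacturer FC-40 11-13-16-19-22-25-27-30-34b",
]


def extract(line):
    """Return every number matched by PATTERN in one line, as strings."""
    return PATTERN.findall(line)


if __name__ == "__main__":
    for line in SAMPLES:
        print(", ".join(extract(line)))
```

Because the pattern has no capturing groups, findall returns the full text of each match, so no second pass is needed to split the numbers apart.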

Answer 2

I'm not sure of your exact requirements/formats, but you can try this:

/(?:\G(?!^)[-\/]|^(?:.*[^\d\/-])?)\K\d++(?![-\/]\D)/

http://rubular.com/r/WJqcCNe2pr

Details:

# two possible starts:
(?: # next occurrences
    \G    # anchor for the position after the previous match
    (?!^) # not at the start of the line
    [-\/]
  | # first occurrence
    ^
    (?:.*[^\d\/-])? # (note the greedy quantifier here,
                    #  to obtain the last result of the line)
)

\K # discards characters matched before from the whole match
\d++ # several digits with a possessive quantifier to forbid backtracking
(?![-\/]\D) # not followed by a hyphen or a slash and then a non-digit

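You can improve the pattern by replacing (?:.*[^\d\/-])? with [^-\d\/\n]*+(?>[-\d\/]+[^-\d\/\n]+)* (remove the \n if you work line by line). The goal of this change is to limit backtracking, which then happens atomic group by atomic group instead of character by character as in the first version.

Perhaps you can also replace the negative lookahead with this positive lookahead: (?=[-\/]\d|b|$)

Another version here.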
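Python's built-in re module supports neither \G nor \K, so the pattern in this answer cannot be pasted into it unchanged. The sketch below is not that pattern; it is a rough emulation of the same idea under stated assumptions: one record per line, a greedy prefix so that the last qualifying run of numbers on the line wins, and (?!\d) standing in for the possessive \d++. The names LAST_RUN and last_run are hypothetical.

```python
import re

# Emulation of the "last run on the line" idea without \G and \K.
LAST_RUN = re.compile(r"""
    ^(?:.*[^\d/-])?                    # greedy prefix: prefer the last run on the line
    (                                  # group 1: the whole run of numbers
        \d+(?!\d)(?![-/]\D)            # a full number, not followed by [-/] + non-digit
        (?:[-/]\d+(?!\d)(?![-/]\D))*   # further numbers joined by a hyphen or a slash
    )
""", re.VERBOSE)


def last_run(line):
    """Return the numbers of the last qualifying run in a line, as strings."""
    match = LAST_RUN.search(line)
    if match is None:
        return []
    # Second pass: split the captured run on its separators.
    return re.split(r"[-/]", match.group(1))


if __name__ == "__main__":
    print(last_run("Manufacturer XY-2822, 10-mill, 17-25b"))
    print(last_run("Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)"))
```

This takes two passes (capture the run, then split it), which is the trade-off for not having \G available. Like the original pattern, it only yields the last run on a line, so for the "1x (34b), 2x (38/24b)" sample it gives just the last two values.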

Tags: regex, parsing, string-parsing