Capture a string or part of a string up until a certain character

Question

I have the following text:

    https://whosebug.com | https://google.com | first text to match | 
    https://randomsite.com | https://randomurl2.com | text | https://randomsite.com | 
    https://randomsite.com | https://randomsite.com |

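I am trying to match everything up to and including the first sequence that is not a URL, ending at its closing |. In this example, I want the regex to match: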

    https://whosebug.com | https://google.com | first text to match |

Currently I have this:

/^(.*)[|]\s(\b\w*\b)?\s[|]/gm

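However, this only works when the first non-URL sequence is a single word with no spaces. If first text to match were just first, it would match.

The desired result is to match both cases: text without spaces and text with spaces.

Edit: Sometimes I also need a greedy match, where the regex matches everything up to text |.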

Answer 1

You want to allow whitespace inside the text:

/^(.*)[|]\s(\b(\w|\s)*\b)?\s[|]/gm
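
To see the difference, here is a minimal PHP sketch that runs the pattern from the question and the one above against two one-line samples. The sample URLs (a.example, b.example) and the variable names are made up for illustration. The JavaScript-style g flag is dropped because PHP's preg functions have no such modifier; preg_match_all plays that role.

```php
<?php
// Hypothetical one-line samples; only the last cell differs.
$oneWord   = 'https://a.example | https://b.example | first |';
$manyWords = 'https://a.example | https://b.example | first text to match |';

$original  = '/^(.*)[|]\s(\b\w*\b)?\s[|]/m';        // pattern from the question
$withSpace = '/^(.*)[|]\s(\b(\w|\s)*\b)?\s[|]/m';   // pattern from this answer

// preg_match() returns 1 on a match and 0 otherwise.
$results = [
    'original, one word'     => preg_match($original, $oneWord),
    'original, many words'   => preg_match($original, $manyWords),
    'with space, one word'   => preg_match($withSpace, $oneWord, $m1),
    'with space, many words' => preg_match($withSpace, $manyWords, $m2),
];
print_r($results);

// Capture group 2 holds the text cell.
echo $m1[2], PHP_EOL;
echo $m2[2], PHP_EOL;
```

Only the second entry should come back as 0, since \w* cannot step over the spaces in first text to match.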

If you want to allow all kinds of special characters in the text (including newlines), you can try this:

\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|

https://regex101.com/r/2OOKky/1
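
As a rough check, the pattern can be run with preg_match against the text from the question. The variable names are illustrative, and the sample is typed without leading or trailing spaces. Capture group 1 is the cell text without the surrounding pipes.

```php
<?php
$text = "https://whosebug.com | https://google.com | first text to match |\n"
      . "https://randomsite.com | https://randomurl2.com | text | https://randomsite.com |\n"
      . "https://randomsite.com | https://randomsite.com |";

// The negative lookahead rejects any cell that starts with "scheme://".
$pattern = '/\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|/';

if (preg_match($pattern, $text, $m)) {
    echo $m[0], PHP_EOL; // whole match, including both pipes
    echo $m[1], PHP_EOL; // the text cell only
}
```

Note that the whole match starts at the pipe before the text, so the leading URLs are not part of it.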

If you want to allow all kinds of special characters in the text (but no newlines), you can try this:

(?:^|\|)(?:(?!$)\s)+((?!\s*\w+:\/\/)(?:(?!$)[^|])+?)(?:(?!$)\s)*\|

https://regex101.com/r/HS3bra/1
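
The difference between the two patterns shows up when a cell spans a line break. The sketch below assumes the m (multiline) modifier for the second pattern, so that $ matches before every newline, which is what lets (?!$) stop the match at a line end. The sample text and variable names are made up.

```php
<?php
// Hypothetical sample: the text cell is split across two lines.
$text = "https://a.example | multi\nline | https://b.example |";

$acrossLines = '/\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|/';
$sameLine    = '/(?:^|\|)(?:(?!$)\s)+((?!\s*\w+:\/\/)(?:(?!$)[^|])+?)(?:(?!$)\s)*\|/m';

$crossed = preg_match($acrossLines, $text, $a); // the cell may contain the newline
$stopped = preg_match($sameLine, $text, $b);    // the cell must stay on one line

var_dump($crossed, $a[1] ?? null);
var_dump($stopped);
```

With this input the first pattern captures the cell including its newline, while the second finds no match at all.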

Answer 2

If there must be at least one leading URL:

\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|

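Explanation

\A Start of the string
[\s\S]*? Match any character, as few times as possible
\b\K A word boundary, then forget everything matched so far
(?:https?://\S*\h*\|\h*)+ Match one or more URLs, each followed by a | with optional horizontal whitespace around it
[^\s|] Match a single character that is neither whitespace nor a pipe
[^|\r\n]*\| Optionally match any characters except a pipe or a newline, then match the closing pipe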
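
Two of these tokens are less common, so here is a small sketch of what \K and \h do on their own. The patterns and inputs below are toy examples, not part of the answer's regex.

```php
<?php
// \K: the leading spaces are consumed but left out of the reported match.
preg_match('~\A\s*\K\w+~', '   hello world', $k);

// \h matches horizontal whitespace such as spaces and tabs;
// \s also matches the newline.
preg_match('~a\h*~', "a \n b", $h);
preg_match('~a\s*~', "a \n b", $s);

var_dump($k[0]);          // string(5) "hello"
var_dump(strlen($h[0]));  // int(2): "a" plus one space
var_dump(strlen($s[0]));  // int(4): "a", space, newline, space
```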

If the leading URLs are optional:

\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)*[^\s|][^|\r\n]*\|
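
The only change is the quantifier on the URL group (+ versus *). A minimal sketch of the effect, using a made-up input whose first cell is already plain text:

```php
<?php
// Hypothetical input with no URL before the text.
$text = 'first text | https://a.example |';

$needsUrl    = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|~';
$urlOptional = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)*[^\s|][^|\r\n]*\|~';

$strict = preg_match($needsUrl, $text, $m1);    // requires a URL before the text
$loose  = preg_match($urlOptional, $text, $m2); // accepts text with no URL in front

var_dump($strict);
var_dump($loose, $m2[0] ?? null);
```

Here the + variant finds nothing, while the * variant matches first text |.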

Example

// "~" is the delimiter, so the slashes in "https://" need no escaping.
$re = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|~';
$str = '    https://whosebug.com | https://google.com | first text to match | 
    https://randomsite.com | https://randomurl2.com | text | https://randomsite.com | 
    https://randomsite.com | https://randomsite.com |';

// preg_match() stops at the first match; $matches[0] starts where \K was reached.
if (preg_match($re, $str, $matches)) {
    echo $matches[0];
}

Output

https://whosebug.com | https://google.com | first text to match |

php

regex