Regex to consume between two groups where the second group is optional

Question

I have the following strings:

Sally: Hello there #line:34de2f
Bob: How are you today?

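These strings are made up of three parts...

A "name"; Sally: and Bob:
The "text"; Hello there and How are you today?
An optional "line identifier"; #line:34de2f

I want to use a regex to get the "text" between the "name" and the optional "line identifier".

This seems like what a negative lookahead is for: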

(?<=:).*?(?!#line:.*)$

But this still captures the "line identifier".

The following works, but I don't want to actually capture the "line identifier":

(?<=:).*?(#line:.*)?$
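
For reference, both attempts can be reproduced with a short check. This is only a sketch using Python's `re` module (the question does not name a regex engine, so that choice is an assumption), and the helper name `full_match` is hypothetical. It suggests why the first attempt fails: `$` forces the match to run to the end of the string, and at that point nothing follows, so the negative lookahead passes trivially.

```python
import re

line = "Sally: Hello there #line:34de2f"

def full_match(pattern, text):
    """Return the whole matched substring, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Attempt 1: the match can only succeed where `$` matches, i.e. at the very
# end of the string, where the negative lookahead has nothing to reject.
attempt1 = full_match(r"(?<=:).*?(?!#line:.*)$", line)

# Attempt 2: the identifier is consumed by the optional group, so it is
# still part of the overall match.
attempt2 = full_match(r"(?<=:).*?(#line:.*)?$", line)

print(repr(attempt1))
print(repr(attempt2))
```

In both cases the overall match still runs through the identifier, which is consistent with the behavior described above.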

Answer 1

You can try

(?<=:\s).*?(?=\s*#line:.*|$)

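See this regex demo. Details:

(?<=:\s) - a position immediately preceded by : and a whitespace character
.*? - any zero or more characters other than line break characters, as few as possible
(?=\s*#line:.*|$) - a position immediately followed by either zero or more whitespace characters and the #line: string (plus the rest of the line), or by the end of the string.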
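
As a rough illustration, the lookaround pattern can be tried in Python like this. The sketch assumes Python's `re` module, and the names `LOOKAROUND` and `extract_text` are made up for the example. Python accepts the lookbehind here because `:\s` has a fixed width of two characters.

```python
import re

# Pattern from the answer; the whole match is the text, no groups needed.
LOOKAROUND = re.compile(r"(?<=:\s).*?(?=\s*#line:.*|$)")

def extract_text(line):
    """Return the text between the name and the optional line identifier."""
    m = LOOKAROUND.search(line)
    return m.group(0) if m else None

print(extract_text("Sally: Hello there #line:34de2f"))  # Hello there
print(extract_text("Bob: How are you today?"))          # How are you today?
```

Because the lookahead already allows optional whitespace before `#line:`, the trailing space before the identifier is not part of the match.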

You can also use

:\s*(.*?)(?:\s*#line:.*)?$

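See the regex demo. Get the contents of Group 1.

Details

:\s* - a colon and then zero or more whitespace characters
(.*?) - Capturing group 1: any zero or more characters other than line break characters, as few as possible
(?:\s*#line:.*)? - an optional sequence of
- \s* - zero or more whitespace characters
- #line: - a literal #line: string
- .* - any zero or more characters other than line break characters, as many as possible
$ - end of string.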
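
A minimal sketch of the capturing-group variant, again assuming Python's `re` module; `CAPTURE`, `extract_text_group`, and `sample` are illustrative names. The second half assumes both sample lines live in one multi-line string, in which case `re.MULTILINE` is needed so that `$` also matches at the end of each line.

```python
import re

CAPTURE = re.compile(r":\s*(.*?)(?:\s*#line:.*)?$")

def extract_text_group(line):
    """Return Group 1 (the text), or None if the line has no colon."""
    m = CAPTURE.search(line)
    return m.group(1) if m else None

print(extract_text_group("Sally: Hello there #line:34de2f"))
print(extract_text_group("Bob: How are you today?"))

# Assumption: the lines arrive as a single string separated by newlines.
sample = "Sally: Hello there #line:34de2f\nBob: How are you today?"
texts = re.findall(r":\s*(.*?)(?:\s*#line:.*)?$", sample, flags=re.MULTILINE)
print(texts)
```

With a single capturing group in the pattern, `re.findall` returns only the Group 1 values, which is convenient here.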

Answer 2

Another solution (for Python):

\w+:\s+?(.+)?\s+?#?.*?

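Example:

import re

pattern = r"\w+:\s+?(.+)?\s+?#?.*?"

tst1 = "Sally: Hello there #line:34de2f"
res1 = re.search(pattern, tst1)
res1.groups()  # ('Hello there',)

tst2 = "Bob: How are you today?"
res2 = re.search(pattern, tst2)
# Note: without a line identifier, the mandatory \s+? after the group forces
# backtracking to the last space, so the final word "today?" is dropped.
res2.groups()  # ('How are you',)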
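
Note that the second result above stops at "How are you", so the last word is lost when no line identifier is present. One possible adjustment, offered only as a sketch and not as the answer's own pattern, is to make the capture lazy and wrap the identifier in an optional non-capturing group anchored at the end. The names `VARIANT` and `text_of` are hypothetical.

```python
import re

# Hypothetical variant: lazy capture, then an optional "#line:..." tail,
# anchored with `$` so the lazy group must extend to the tail or the end.
VARIANT = re.compile(r"\w+:\s+(.+?)(?:\s+#line:.*)?$")

def text_of(line):
    """Return the captured text, or None if the line does not match."""
    m = VARIANT.search(line)
    return m.group(1) if m else None

print(text_of("Sally: Hello there #line:34de2f"))  # Hello there
print(text_of("Bob: How are you today?"))          # How are you today?
```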

Answer 3

^([^:]*)[:]([^#]*)(?!line.*)

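This may also work for you:

^ - start of the line
([^:]*) - the name (everything up to the colon), in capturing group 1
[:] - a colon (this can be simplified to :)
[^#] - anything that is not a hash sign (inside a capturing group and repeated: ([^#]*))
(?!line.*) - a negative lookahead, asserting that the text immediately ahead does not start with line.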
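
A quick Python check of this pattern, assuming the two sample strings from the question (the names `ANSWER3` and `split_name_and_text` are illustrative). It suggests that group 2 keeps the surrounding whitespace, so stripping it is probably needed. The lookahead does not seem to change the result for these inputs, since `[^#]*` already stops right before the `#`.

```python
import re

ANSWER3 = re.compile(r"^([^:]*)[:]([^#]*)(?!line.*)")

def split_name_and_text(line):
    """Return (name, raw group 2, stripped group 2), or None if no match."""
    m = ANSWER3.match(line)
    if m is None:
        return None
    name, raw_text = m.group(1), m.group(2)
    return name, raw_text, raw_text.strip()

print(split_name_and_text("Sally: Hello there #line:34de2f"))
print(split_name_and_text("Bob: How are you today?"))
```

One caveat under this approach: any `#` inside the text itself would end group 2 early, because the group excludes every hash sign and not only the `#line:` marker.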

