Regular expression: treat a group as optional when it is absent, but when it is present, stop the preceding group just before it
I have a requirement where I need to use a regex to parse queries from users.
For example, a user could search for links with a query format like:
link to <keyword> from <person name>
link to <keyword> from <person name> shared <time>
For example:
link to connect form from sandeep agarwal => keyword=connect form, person-name=sandeep agarwal
link to sharepoint ppt from mathews => keyword=sharepoint ppt, person-name=mathews
link to sharepoint design from Gronvik shared yesterday => keyword=sharepoint design, person-name=Gronvik, time=yesterday
The expected capture group values are listed above.
My regex:
"Link to (?<keyword>[a-z ]+) from (?<name>[]+)(?:shared)(?<time> [a-z]+)"
Here are the results I get for the three queries above:
Match 1
Full match = link to connect form from sandeep agarwal
Group `keyword` = connect form
Group `name` = sandeep agarwal
Match 2
Full match = link to sharepoint ppt from mathews
Group `keyword` = sharepoint ppt
Group `name` = mathews
Match 3 - **This is where things go wrong**
Full match = link to sharepoint design from Gronvik shared yesterday
Group `keyword` = sharepoint design
Group `name` = Gronvik shared yesterday
In the third result above, I get "Gronvik shared yesterday" as the
"name" group, but the ideal result would be name="Gronvik" and
time="yesterday". I have tried many approaches, from positive
lookahead to lookbehind, but one scenario or another always ends up
breaking.
The keyword "shared" might not be present every time, but when it is
present, my "name" group should capture the name up to "shared"
(excluding it), and the "time" group should capture the time only if
"shared" is present in the query. It would be really helpful if
someone could point me in the right direction.
link to (.*?) from (.*?)( shared (.*))?$
Use .*? for a lazy quantifier
(lazy = non-greedy)
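As a rough check of the lazy pattern above, here is a minimal Python sketch. The function name `parse_query` and the use of `re.IGNORECASE` are assumptions added for illustration; the original answer does not name a language or flags.

```python
import re

# Pattern from the answer above. re.IGNORECASE is an assumption added here
# so that "Link" and "link" are both accepted.
QUERY_RE = re.compile(r"link to (.*?) from (.*?)( shared (.*))?$", re.IGNORECASE)


def parse_query(query):
    """Return (keyword, name, time); time is None when 'shared' is absent."""
    m = QUERY_RE.match(query)
    if m is None:
        return None
    # Group 3 is the whole optional " shared ..." part; group 4 is the time.
    return m.group(1), m.group(2), m.group(4)


if __name__ == "__main__":
    for q in (
        "link to connect form from sandeep agarwal",
        "link to sharepoint ppt from mathews",
        "link to sharepoint design from Gronvik shared yesterday",
    ):
        print(parse_query(q))
```

Because the second group is lazy, it grows one character at a time and stops at the first position where either the optional " shared ..." part followed by the end of the string, or the end of the string alone, can match.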
You can use
(?i)^Link\s+to\s+(?<keyword>[a-z ]+) from (?<name>.*?)(?:\s+shared\s+(?<time>[a-z]+))?$
See the regex demo.
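Details
`(?i)` - case-insensitive flag
`^` - start of the string (or of a line, if the m multiline option is on)
`Link\s+to\s+` - the literal words Link and to, each followed by 1+ whitespace characters
`(?<keyword>[a-z ]+)` - group "keyword": 1+ letters or spaces
` from ` - literal text
`(?<name>.*?)` - group "name": any 0+ characters, as few as possible
`(?:\s+shared\s+(?<time>[a-z]+))?` - an optional sequence of:
  `\s+` - 1+ whitespace characters
  `shared` - a literal substring
  `\s+` - 1+ whitespace characters
  `(?<time>[a-z]+)` - group "time": 1+ letters
`$` - end of the string/line.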
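The named-group pattern can be tried out in Python with one adjustment: Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`. The sketch below makes that substitution; the helper name `parse_link_query` is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, with (?<name>...) rewritten as
# (?P<name>...) because that is the named-group syntax Python's re uses.
LINK_QUERY_RE = re.compile(
    r"(?i)^Link\s+to\s+(?P<keyword>[a-z ]+) from (?P<name>.*?)"
    r"(?:\s+shared\s+(?P<time>[a-z]+))?$"
)


def parse_link_query(query):
    """Return a dict of the named groups, or None if the query does not match.

    The "time" entry is None when the optional "shared <time>" part is absent.
    """
    m = LINK_QUERY_RE.match(query)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_link_query("link to sharepoint design from Gronvik shared yesterday"))
    print(parse_link_query("link to connect form from sandeep agarwal"))
```

Returning `groupdict()` keeps the optional group visible to callers: they can test `result["time"] is None` instead of checking whether the word "shared" appeared in the input.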