Regular Expression: Treat a group as optional if it's not present, but if it is present, capture the previous group only up to it

I have a requirement to parse queries from users with a regex.

For example, a user could search for links with a query in one of these formats:

link to <keyword> from <person name>
link to <keyword> from <person name> shared <time>

Examples:

link to connect form from sandeep agarwal => keyword=connect form, person-name=sandeep agarwal
link to sharepoint ppt from mathews => keyword=sharepoint ppt, person-name=mathews 
link to sharepoint design from Gronvik shared yesterday => keyword=sharepoint design, person-name=Gronvik, time=yesterday

I've listed the expected capture group values next to each query above.

My Regex:

"Link to (?<keyword>[a-z ]+) from (?<name>[a-z ]+)(?:shared)(?<time> [a-z]+)"

Here is what my regex returns for the 3 queries above:

Match 1
Full match = link to connect form from sandeep agarwal
Group `keyword` = connect form
Group `name`= sandeep agarwal

Match 2
Full match = link to sharepoint ppt from mathews
Group `keyword` = sharepoint ppt
Group `name`= mathews

Match 3 - **This is where things go wrong**
Full match = link to sharepoint design from Gronvik shared yesterday
Group `keyword` = sharepoint design
Group `name`= Gronvik shared yesterday

In the 3rd result above, the "name" group captures "Gronvik shared yesterday", but ideally it would be name="Gronvik" and time="yesterday". I have tried many approaches, from positive lookahead to lookbehind, but each time some other scenario starts breaking.

The keyword "shared" may not be present every time, but when it is, the "name" group should capture the name up to "shared" (excluding it), and the "time" group should capture the time only if "shared" is present in the query. It would be really helpful if someone could point me in the right direction.
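
To see why the 3rd query goes wrong, here is a minimal Python sketch. The pattern in the question is partly garbled, so this assumes a greedy `[a-z ]+` name group followed by an optional "shared <time>" part (an assumed reconstruction, not the exact original); note that Python writes named groups as `(?P<name>...)`. Because `[a-z ]+` also matches the letters and spaces in "shared yesterday", the greedy name group consumes them, the optional group matches nothing, and `$` still succeeds.

```python
import re

# Assumed reconstruction of the failing idea (not the asker's exact pattern):
# a greedy [a-z ]+ name group, then an optional " shared <time>" part.
GREEDY = re.compile(
    r"(?i)^link to (?P<keyword>[a-z ]+) from (?P<name>[a-z ]+)"
    r"(?: shared (?P<time>[a-z]+))?$"
)

m = GREEDY.match("link to sharepoint design from Gronvik shared yesterday")
# The greedy name group swallows "shared yesterday"; the optional group is skipped.
print(m.group("name"))  # Gronvik shared yesterday
print(m.group("time"))  # None
```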

link to (.*?) from (.*?)( shared (.*))?$

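Use `.*?` for lazy repeaters (lazy = non-greedy). Because the optional ` shared (.*)` group and the `$` anchor come after it, the lazy name group grows only until the rest of the pattern can match, so it stops right before " shared " when that is present.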
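
A quick Python check of this pattern against the three example queries, as a sketch. The `parse_lazy` helper is a hypothetical name used only for illustration. The pattern is case-sensitive and not anchored at the start, so this assumes lowercase queries and uses `re.search`.

```python
import re

# Pattern from the answer above: lazy keyword and name groups,
# then an optional " shared <time>" group, anchored at the end with $.
LAZY = re.compile(r"link to (.*?) from (.*?)( shared (.*))?$")

def parse_lazy(query):
    """Hypothetical helper: return (keyword, name, time); time is None without 'shared'."""
    m = LAZY.search(query)
    if m is None:
        return None
    keyword, name, _shared_part, time = m.groups()
    return keyword, name, time

queries = [
    "link to connect form from sandeep agarwal",
    "link to sharepoint ppt from mathews",
    "link to sharepoint design from Gronvik shared yesterday",
]
for q in queries:
    print(parse_lazy(q))
```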

You can use:

(?i)^Link\s+to\s+(?<keyword>[a-z ]+) from (?<name>.*?)(?:\s+shared\s+(?<time>[a-z]+))?$


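Details

  • (?i) - case-insensitive flag
  • ^ - start of the string (or of a line, if the m multiline option is on)
  • Link\s+to\s+ - the word Link, 1+ whitespace chars, the word to, 1+ whitespace chars
  • (?<keyword>[a-z ]+) - group "keyword": 1+ letters or spaces
  •  from  - literal text " from " (with the surrounding spaces)
  • (?<name>.*?) - group "name": any 0+ characters other than line breaks, as few as possible
  • (?:\s+shared\s+(?<time>[a-z]+))? - an optional sequence of:
    • \s+ - 1+ whitespace chars
    • shared - a literal substring
    • \s+ - 1+ whitespace chars
    • (?<time>[a-z]+) - group "time": 1+ letters
  • $ - end of the string/line.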
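
The same pattern in Python, as a sketch. Python's `re` module writes named groups as `(?P<name>...)` rather than the `(?<name>...)` form used by .NET, PCRE and Java, so only the group syntax is adjusted here. `parse` is a hypothetical helper name.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
QUERY = re.compile(
    r"(?i)^Link\s+to\s+(?P<keyword>[a-z ]+) from (?P<name>.*?)"
    r"(?:\s+shared\s+(?P<time>[a-z]+))?$"
)

def parse(query):
    """Hypothetical helper: return the named groups as a dict, or None if no match."""
    m = QUERY.match(query)
    return m.groupdict() if m else None

print(parse("link to sharepoint design from Gronvik shared yesterday"))
print(parse("link to connect form from sandeep agarwal"))
```

Note that `(?<time>[a-z]+)` allows a single word only: for a query ending in "shared last week", the optional group cannot reach `$`, so "shared last week" ends up in `name`. Using `[a-z ]+` for the time group would allow multi-word times.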