How to exclude occurrences after a positive lookbehind?

Question

Suppose I have the following markdown list items:

- [x] Example of a completed task.
- [x] ! Example of a completed task.
- [x] ? Example of a completed task.

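I am interested in parsing these items using regex and extracting the following capture groups:

- Group 1: the brackets [ and ] when the symbol x is between them
- Group 2: the symbol x between the brackets [ and ]
- Group 3: the ! following [x]
- Group 4: the ? following [x]
- Group 5: the text following [x] without a modifier, e.g. [x] This is targeted.
- Group 6: the text following [x] !
- Group 7: the text following [x] ?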

After a lot of trial and error with an online parser, I arrived at the following:

((?<=x)\]|\[(?=x]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|((?<=\[x\]\s)\?(?=\s))|((?<=\[x\]\s)[^!?].*)|((?<=\[x\]\s!\s).*)|((?<=\[x\]\s\?\s).*)

To make the regex above more readable, here are the capture groups listed one by one:

Group 1: ((?<=x)\]|\[(?=x]))
Group 2: ((?<=\[)x(?=\]))
Group 3: ((?<=\[x\]\s)!(?=\s))
Group 4: ((?<=\[x\]\s)\?(?=\s))
Group 5: ((?<=\[x\]\s)[^!?].*)
Group 6: ((?<=\[x\]\s!\s).*)
Group 7: ((?<=\[x\]\s\?\s).*)

This is most likely not the best approach, but it at least seems to capture what I want.
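One way to check the pattern outside an online tester is a short Ruby script, since Ruby's regex engine (Onigmo) belongs to the Oniguruma family. This is only a sketch: the names TASK_RE and tokens are made up for illustration, and the one bare ] in the pattern is written as \] here so the script does not depend on how a bare ] is treated.

```ruby
# The seven-group pattern from the question; the only change is that the
# bare "]" in group 1's lookahead is escaped as "\]".
TASK_RE = /((?<=x)\]|\[(?=x\]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|((?<=\[x\]\s)\?(?=\s))|((?<=\[x\]\s)[^!?].*)|((?<=\[x\]\s!\s).*)|((?<=\[x\]\s\?\s).*)/

# Hypothetical helper: returns [group_number, matched_text] pairs for one line.
def tokens(line)
  line.scan(TASK_RE).map do |groups|
    # Exactly one of the seven groups participates in each match.
    index = groups.index { |g| !g.nil? }
    [index + 1, groups[index]]
  end
end

[
  "- [x] Example of a completed task.",
  "- [x] ! Example of a completed task.",
  "- [x] ? Example of a completed task."
].each { |line| p tokens(line) }
```

Each pair shows which group fired, so the second item should report the brackets as group 1, the x as group 2, the ! as group 3 and the trailing text as group 6.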

I would like to extend the regex to also capture rows in a markdown table that look like this:

|       | Task name                               |    Plan     |   Actual    |      File      |
| :---- | :-------------------------------------- | :---------: | :---------: | :------------: |
| [x]   | Task one with a reasonably long name.   | 08:00-08:45 | 08:00-09:00 |  [[task-one]]  |
| [x] ! | Task two with a reasonably long name.   | 09:00-09:30 |             |  [[task-two]]  |
| [x] ? | Task three with a reasonably long name. | 11:00-13:00 |             | [[task-three]] |

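More specifically, I am interested in the same capture groups as above, but I want to exclude the table grid (i.e. the | characters). So groups 1 to 4 should stay the same, but groups 5 to 7 should capture the text without the |.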

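Do you have any ideas on how I could adjust, for example, the regex for one of these text groups so that it excludes the |? I have tried all sorts of negations (e.g. [^\|]) without success. I am using Oniguruma regular expressions.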

Answer 1

Inspired by Wiktor's answer, check the following regex, which is quite short:

(?:\G(?<!\A)\||(?:\[x]\s[?!]?\s*\|?))\K([^|\n]*)

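Explanation of the above:

\G(?<!\A)\|

\G asserts the position at the end of the previous match, or at the start of the string for the first match. The negative lookbehind (?<!\A) rules out the second case, so this branch can only continue from a previous match.

\A asserts the position at the start of the string.

\| matches a literal | character.

(?:\[x]\s[?!]?\s*\|?)

A non-capturing group that matches [x], then \s (one whitespace character), then [?!] (? or !, zero or one time), followed by \s* (zero or more whitespace characters) and \|? (a |, zero or one time).

\K

\K resets the starting point of the reported match, so everything matched before it is left out of the result.

([^|\n]*)

A capturing group that matches any character except | or \n (newline), zero or more times.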
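Because the \|? after the modifier is optional, the same pattern can be tried on both list items and table rows. The Ruby sketch below assumes a small substitution: it uses the lookahead (?!\A) instead of (?<!\A), which asserts the same thing (the current position is not the start of the string), and it escapes the bare ]. The names CELL_RE and texts are hypothetical.

```ruby
# Variant of the pattern above: (?!\A) replaces (?<!\A), and "]" is escaped.
CELL_RE = /(?:\G(?!\A)\||(?:\[x\]\s[?!]?\s*\|?))\K([^|\n]*)/

# Hypothetical helper: every captured text on the line, with padding trimmed.
def texts(line)
  line.scan(CELL_RE).flatten.map(&:strip)
end

p texts("- [x] ! Example of a completed task.")
p texts("| [x] ! | Task two with a reasonably long name.   | 09:00-09:30 |             |  [[task-two]]  |")
```

For the list item only the second branch is used; for the table row the \G branch picks up each following cell in turn.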

Answer 2

You can use

((?<=x)]|\[(?=x]))|((?<=\[)x(?=]))|((?<=\[x]\s)!(?=\s))|(?<=\[x]\s)(\?)(?=\s)|(?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)

See the regex101 PCRE demo and the Ruby (Onigmo/Oniguruma) demo.

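What was added? The (?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|) part:

(?: - start of a non-capturing group (a custom boundary here; we will match...)
- \G(?!\A)\| - the end of the previous match and a | character (i.e. the | must immediately follow the previous match),
- |(?<=\[x]\s[?!\s]\s\|) - or a location immediately preceded by [x] + a whitespace + ?, ! or a whitespace + a whitespace + a | character
) - end of the group
\K - match reset operator that discards the text matched so far from the overall match memory buffer
([^|\n]*) - capturing group 5: zero or more characters other than | and a newline
(?=\|) - a | character must appear immediately to the right of the current location.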
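The added part can be tried on its own against the table from the question. The following Ruby sketch is illustrative only: ROW_CELL_RE and cells_after_checkbox are hypothetical names, the bare ] is escaped, and the whitespace around each cell is trimmed afterwards rather than inside the regex.

```ruby
# Only the part added in this answer, with "]" written as "\]".
ROW_CELL_RE = /(?:\G(?!\A)\||(?<=\[x\]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)/

# Hypothetical helper: trimmed text of every cell that follows the "[x]" cell.
def cells_after_checkbox(row)
  row.scan(ROW_CELL_RE).flatten.map(&:strip)
end

table = <<~TABLE
  |       | Task name                               |    Plan     |   Actual    |      File      |
  | :---- | :-------------------------------------- | :---------: | :---------: | :------------: |
  | [x]   | Task one with a reasonably long name.   | 08:00-08:45 | 08:00-09:00 |  [[task-one]]  |
  | [x] ! | Task two with a reasonably long name.   | 09:00-09:30 |             |  [[task-two]]  |
  | [x] ? | Task three with a reasonably long name. | 11:00-13:00 |             | [[task-three]] |
TABLE

table.each_line { |row| p cells_after_checkbox(row.chomp) }
```

The header and separator rows contain no [x] cell, so the lookbehind branch never starts a chain there and they yield an empty array.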

Tags: regex, syntax-highlighting, regex-group, visual-studio-code