Regex python - Match newline only if it is followed by number or special character and space

Question

I have been trying to work out this regular expression in Python, but it does not produce the expected result.

I load a text file in this format:

"18 75 19\n!dont split here\n! but split here\n* and split here"

I would like to get the following output:

['18 75 19\n!dont split here',
 '! but split here',
 '* and split here']

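I am trying to split my string on either 1) a newline followed by a number, or 2) a newline followed by a special character, but only if that character is itself followed by a space (e.g. '! but split here', but not '!dont split here').

Here is what I have so far: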

re.split(u'\n(?=[0-9]|([`\-=~!@#$%^&*()_+\[\]{};\'\:"|<,./<>?])(?= ))', str)

This is close, but not quite there. Here is the output it produces:

['18 75 19\n!dont split here', '!', '! but split here', '*', '* and split here']

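It incorrectly returns the special characters on their own: '!' and '*' each get their own element. The regex contains two lookaheads.

I would appreciate help identifying what to change in this regex so that it does not return the lone special characters, only the special character together with the rest of its line.

I am also open to alternatives. If there is a better approach that does not involve two lookaheads, I would be interested to learn about it.

Thanks!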

Answer 1

Your regex does work; the problem is the capturing group you have around [`\-=~!@#$%^&*()_+\[\]{};\'\:"|<,./<>?]. From the manual:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list
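The quoted behavior can be seen in a small sketch. The strings and variable names here are illustrative and not taken from the question; the last call shows that a group is still returned when it sits inside a lookahead, which is the situation in the question.

```python
import re

# A capturing group in the pattern adds the captured text to the result.
with_group = re.split(r"(,)", "a,b")
# Without the group, only the pieces between matches are returned.
without_group = re.split(r",", "a,b")

# The same holds when the group is inside a zero-width lookahead:
# the newline is consumed, and the captured '!' is inserted as its own element.
in_lookahead = re.split(r"\n(?=(!))", "a\n!b")

print(with_group)     # ['a', ',', 'b']
print(without_group)  # ['a', 'b']
print(in_lookahead)   # ['a', '!', '!b']
```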

If you remove the () around that character class, you will get the expected result.
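As a minimal sketch of that fix, assuming the sample string from the question: the pattern below is the asker's pattern with only the parentheses around the character class removed. The names `text` and `pattern` are illustrative (the question uses `str`, which shadows the built-in), and a raw string is used so the backslashes reach the regex engine unchanged.

```python
import re

text = "18 75 19\n!dont split here\n! but split here\n* and split here"

# Same character class as in the question, but no capturing group around it,
# so re.split returns only the pieces between the matched newlines.
pattern = r'\n(?=[0-9]|[`\-=~!@#$%^&*()_+\[\]{};\'\:"|<,./<>?](?= ))'

parts = re.split(pattern, text)
print(parts)
# ['18 75 19\n!dont split here', '! but split here', '* and split here']
```

If grouping is ever needed for some other reason, a non-capturing group `(?:...)` would group without adding elements to the result.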

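Note that you do not need (?= ) inside that alternation, since it is already part of a lookahead; you can just use a literal space. Also, you may find it easier to write the symbols as a negated character class:

re.split(u'\n(?=[0-9]|[^A-Za-z0-9] )', str)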
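A short sketch of the negated-class version on the question's sample string follows. One caveat worth knowing: `[^A-Za-z0-9]` matches anything that is not an ASCII letter or digit, including whitespace, so it is broader than the explicit symbol list. The `strict` variant and the `indented` string below are assumptions added for illustration, not part of the original answer.

```python
import re

text = "18 75 19\n!dont split here\n! but split here\n* and split here"

# Negated class from the answer: any non-alphanumeric character plus a space.
loose = r"\n(?=[0-9]|[^A-Za-z0-9] )"
parts = re.split(loose, text)
print(parts)
# ['18 75 19\n!dont split here', '! but split here', '* and split here']

# Hypothetical edge case: a line indented with two spaces also triggers a split,
# because the first space counts as "non-alphanumeric".
indented = "a\n  b"
loose_result = re.split(loose, indented)

# Assumed tighter variant: also exclude whitespace from the negated class.
strict = r"\n(?=[0-9]|[^A-Za-z0-9\s] )"
strict_result = re.split(strict, indented)

print(loose_result)   # ['a', '  b']
print(strict_result)  # ['a\n  b']
```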

python

regex

regex-lookarounds

positive-lookahead