Specific Python pattern that can help slice a string

Question

I'm looking for a pattern that can help me split a string. The string looks like this:

text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'

I would like to get this:

first slice
second slice
slice number 3
the next one 5 that will not work but belong to no four
and this should be 5 and so one...

I hope you get the idea.

What I've found so far is that I can use this:

import re

parts = re.findall(r"\d\. \D+", text)

This works well until it hits a standalone digit. I know \D matches any non-digit character, and I have tried:

parts = re.findall(r"\d\. .+", text)

or

parts = re.findall(r"(\d\.).*", text)

and many others, but I can't find the right one.

I'd appreciate your help.
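To see why these attempts fall short, here is a small illustrative sketch that runs the three patterns against the sample string (the variable names `non_digit`, `greedy` and `grouped` are just for illustration). `\D+` stops at the first digit of any kind, while `.+` and `.*` are greedy and swallow the rest of the string in one match:

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and so one...')

# \D+ stops at *any* digit, including the standalone "3" and "5"
non_digit = re.findall(r"\d\. \D+", text)

# .+ is greedy, so the first match runs all the way to the end of the string
greedy = re.findall(r"\d\. .+", text)

# With one capturing group, findall returns only the group's text ("1.")
grouped = re.findall(r"(\d\.).*", text)

print(non_digit)
print(greedy)
print(grouped)
```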

Answer 1

You can use a negative lookahead:

parts = re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)

This matches a digit and a dot, followed by anything, provided that no digit in that tail is immediately followed by a dot.
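The negative lookahead `(?!\.)` is the key piece: it lets a digit through only when the next character is not a dot. A quick illustrative check on a fragment of the sample string:

```python
import re

fragment = "slice number 3 4. the next one 5"

# A digit matches only if it is NOT immediately followed by "."
digits = re.findall(r"\d(?!\.)", fragment)
print(digits)  # ['3', '5']: the "4" is rejected because "4." starts a new item
```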

Demo:

>>> import re
>>> text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
>>> re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)
['1. first slice ', '2. second slice ', '3. slice number 3 ', '4. the next one 5 that will not work but belong to no four ', '5. and this should be 5 and so one...']

Online demo: https://regex101.com/r/kF9jT1/1 ; to emulate the behaviour of re.findall(), I added an extra (..) capturing group and the g flag.
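One caveat: `\d` matches a single digit, so item numbers of 10 or more are split incorrectly (the `1` of `10.` is absorbed by the previous item). A possible variant, sketched here under the assumption that items may use multi-digit numbers, switches to `\d+` and rejects any digit run that is followed by another digit or a dot:

```python
import re

text = "9. ninth 10. tenth has 3 items 11. eleventh"

# Original single-digit pattern: breaks on "10." and "11."
single = re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)

# Hypothetical multi-digit variant: a digit run inside an item must not be
# followed by another digit or a dot
multi = re.findall(r"\d+\. (?:\D+|\d+(?![\d.]))*", text)

print(single)  # ['9. ninth 1', '0. tenth has 3 items 1', '1. eleventh']
print(multi)   # ['9. ninth ', '10. tenth has 3 items ', '11. eleventh']
```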

Answer 2

Just split based on a lookahead:

import re

x = """1. first slice 2. second slice 3. slice number 3 4. the next one
 5 that will not work but belong to no four 5. and this should be 5 and
 so one..."""
print(re.split(r"\s(?=\d+\.\s)", x))

Output: ['1. first slice', '2. second slice', '3. slice number 3', '4. the next one\n 5 that will not work but belong to no four', '5. and this should be 5 and\n so one...']
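Two design points of this approach, illustrated in a small sketch on a shortened string. First, the `\s` in front of the lookahead is consumed by the split, so the pieces carry no trailing space and there is no empty first element; a pure zero-width split (which `re.split` accepts on Python 3.7+) produces both. Second, newlines inside an item survive the split, so you may want to collapse whitespace afterwards. The clean-up step is an addition of this sketch, not part of the answer:

```python
import re

x = "1. a 2. b\n c"

# Split on a whitespace character that precedes "<digits>. "
consuming = re.split(r"\s(?=\d+\.\s)", x)

# Zero-width split (Python 3.7+): leading '' and trailing spaces remain
zero_width = re.split(r"(?=\d+\.\s)", x)

# Optional: collapse runs of whitespace (including newlines) into single spaces
cleaned = [" ".join(part.split()) for part in consuming]

print(consuming)   # ['1. a', '2. b\n c']
print(zero_width)  # ['', '1. a ', '2. b\n c']
print(cleaned)     # ['1. a', '2. b c']
```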

Answer 3

This should work:

(\d+\..*?)(?=\d+\.|$)

Regex Demo

Regex breakdown

(            # First group to be captured
   \d+\..*?  # Match digit(s) followed by a dot, then anything (non-greedy)
)
(?=          # Lookahead
   \d+\.     # Check if what follows is digit(s) followed by a dot
   |         # or
   $         # End of string
)
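The commented breakdown is written in verbose-regex style, so it can be used as-is by compiling it with `re.VERBOSE`, which ignores unescaped whitespace and `#` comments in the pattern. A sketch checking that it behaves exactly like the one-line pattern:

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and so one...')

pattern = re.compile(r"""
    (               # first group to be captured
        \d+\.       # digit(s) followed by a dot
        .*?         # anything, as few characters as possible (non-greedy)
    )
    (?=             # lookahead (checked, not consumed)
        \d+\.       # what follows is digit(s) followed by a dot
        |           # or
        $           # the end of the string
    )
""", re.VERBOSE)

verbose_parts = pattern.findall(text)
compact_parts = re.findall(r"(\d+\..*?)(?=\d+\.|$)", text)
print(verbose_parts == compact_parts)  # True
```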

Python code

import re
text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
parts = re.findall(r"(\d+\..*?)(?=\d+\.|$)", text)
print(parts)
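A caveat, assuming the input may span several lines (as in the triple-quoted string of the previous answer): `.` does not match a newline by default, so an item that wraps onto the next line is silently dropped. Passing `re.DOTALL` lets `.*?` cross line breaks. A minimal sketch on a shortened string:

```python
import re

text = "1. a 2. b\n c 3. d"
pattern = r"(\d+\..*?)(?=\d+\.|$)"

# Default: "." stops at "\n", so item 2 never reaches a lookahead and is lost
default = re.findall(pattern, text)

# DOTALL: "." also matches "\n", so item 2 is captured including the line break
dotall = re.findall(pattern, text, re.DOTALL)

print(default)  # ['1. a ', '3. d']
print(dotall)   # ['1. a ', '2. b\n c ', '3. d']
```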

Ideone Demo

Tags: python, regex, findall