Specific Python regex pattern that can help slice a string

I'm looking for a pattern that can help me split a string. The string looks like this:

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')

I want this:

  1. first slice
  2. second slice
  3. slice number 3
  4. the next one 5 that will not work but belong to no four
  5. and this should be 5 and so one...

I hope you get the idea.

What I've found so far is that I can use this:

import re

parts = re.findall(r"\d\. \D+", text)

This works well until it runs into a lone digit. I know \D matches any non-digit, so I also tried:

parts = re.findall(r"\d\. .+", text)

parts = re.findall(r"(\d\.).*", text)

and many others, but I can't find the right one.
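Running the first two attempts on the same input shows why they fall short. This is a sketch; the `text` literal assumes the question's string is really one line that was wrapped for display.

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')

# \D+ stops at the first digit of any kind, so the "3" and "5" inside items cut them short.
non_digit_parts = re.findall(r"\d\. \D+", text)
print(non_digit_parts)
# ['1. first slice ', '2. second slice ', '3. slice number ', '4. the next one ', '5. and this should be ']

# .+ is greedy: the first match runs to the end of the string, leaving nothing else to find.
greedy_parts = re.findall(r"\d\. .+", text)
print(len(greedy_parts))  # 1
```

So one pattern stops too early and the other never stops; what is needed is a rule for "a digit that does not start a new item".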

I'd appreciate your help.

You can use a negative lookahead:

parts = re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)

This matches a digit, a dot and a space, followed by anything at all, as long as no digit in it is directly followed by a dot.

Demo:

>>> import re
>>> text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
>>> re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)
['1. first slice ', '2. second slice ', '3. slice number 3 ', '4. the next one 5 that will not work but belong to no four ', '5. and this should be 5 and so one...']

Online demo at https://regex101.com/r/kF9jT1/1; to emulate the re.findall() behaviour I added an extra (..) group and the g flag.
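The same negative-lookahead pattern can be written out with `re.VERBOSE` so each piece carries its own comment. This is a sketch that should behave identically to the one-line pattern above; `NUMBERED_ITEM` is just an illustrative name. In verbose mode unescaped whitespace is ignored, so the space after the dot has to be written as `[ ]`.

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')

NUMBERED_ITEM = re.compile(r"""
    \d \.          # item number: a single digit followed by a dot
    [ ]            # the literal space after the dot
    (?:            # then, repeated zero or more times:
        \D+        #   a run of non-digit characters, or
      | \d(?!\.)   #   one digit that is NOT directly followed by a dot
    )*
""", re.VERBOSE)

# The pattern has no capturing groups, so findall returns the whole matches.
parts = NUMBERED_ITEM.findall(text)
print(parts)
```

One caveat: because the pattern starts with a single `\d`, a two-digit item number is not treated as a unit. For a made-up input such as `"9. nine 10. ten"` it returns `['9. nine 1', '0. ten']`, so the pattern as written assumes at most nine items.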

Just split based on a lookahead:
import re

x = """1. first slice 2. second slice 3. slice number 3 4. the next one
5 that will not work but belong to no four 5. and this should be 5 and
so one..."""
print(re.split(r"\s(?=\d+\.\s)", x))

Output: ['1. first slice', '2. second slice', '3. slice number 3', '4. the next one\n5 that will not work but belong to no four', '5. and this should be 5 and\nso one...']
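Because the lookahead `(?=\d+\.\s)` is zero-width, the item number stays at the start of the next piece, while the `\s` in front of it is consumed, which is why the pieces have no trailing space. Using `\d+` also lets it cope with multi-digit item numbers. The sketch below checks that on a made-up input (`sample` is hypothetical) and adds an optional clean-up step, not part of the answer above, that collapses the embedded newlines into single spaces.

```python
import re

# Hypothetical input with a two-digit item number.
sample = "9. nine 10. ten 11. eleven"
multi_digit = re.split(r"\s(?=\d+\.\s)", sample)
print(multi_digit)  # ['9. nine', '10. ten', '11. eleven']

x = """1. first slice 2. second slice 3. slice number 3 4. the next one
5 that will not work but belong to no four 5. and this should be 5 and
so one..."""
parts = re.split(r"\s(?=\d+\.\s)", x)

# Optional: normalise any run of whitespace (including "\n") to a single space.
cleaned = [" ".join(p.split()) for p in parts]
print(cleaned)
```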

This should work:

(\d+\..*?)(?=\d+\.|$)

Regex breakdown:

( #First group to be captured
   \d+\..*? #Match digit(s) followed by a dot, then anything, non-greedily
)
(?=  #Lookahead
   \d+\. #Check if what follows is digit(s) followed by a dot
   | #or
   $ #End of string
)
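The non-greedy `.*?` is what makes each match stop at the next item number. As a rough check, swapping it for a greedy `.*` (a deliberately broken variant, shown only for comparison) lets the first match swallow the whole string, because the lookahead's `$` branch is satisfied at the very end.

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')

lazy = re.findall(r"(\d+\..*?)(?=\d+\.|$)", text)   # stops at the first "N." it sees
greedy = re.findall(r"(\d+\..*)(?=\d+\.|$)", text)  # runs to the end, then $ matches
print(len(lazy), len(greedy))  # 5 1
```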

Python code:

import re
text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
parts = re.findall(r"(\d+\..*?)(?=\d+\.|$)", text)
print(parts)
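One caveat, assuming the input really contains line breaks (as in the `x` string from the split-based answer) rather than just being wrapped for display: `.` does not match a newline by default, so items that span lines are silently dropped. Passing `re.DOTALL` is one way around that; a minimal sketch:

```python
import re

x = """1. first slice 2. second slice 3. slice number 3 4. the next one
5 that will not work but belong to no four 5. and this should be 5 and
so one..."""
pattern = r"(\d+\..*?)(?=\d+\.|$)"

# Without DOTALL, .*? cannot cross "\n", so items 4 and 5 never find their lookahead.
without_dotall = re.findall(pattern, x)
print(without_dotall)

# With DOTALL, . also matches "\n" and all five items are found.
with_dotall = re.findall(pattern, x, flags=re.DOTALL)
print(with_dotall)
```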
