Specific Python pattern for the string that can help to slice
I'm looking for a pattern that can help me slice a string. The string looks like this:
text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')
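I want this:
- first slice
- second slice
- slice number 3
- the next one 5 that will not work but belong to no four
- and this should be 5 and so one...
I hope you get the idea.
What I have worked out so far is that I can use this:
import re
parts = re.findall(r"\d\. \D+", text)
It works well until it encounters a single digit inside a slice.
I know that the \D expression means non-digit, so I tried:
parts = re.findall(r"\d\. .+", text)
or
parts = re.findall(r"(\d\.).*", text)
and many others, but I can't find the right one.
I would appreciate your help.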
You can use a negative lookahead:
parts = re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)
This matches a digit, a dot and a space, followed by any text in which no digit is directly followed by a dot.
Demo:
>>> import re
>>> text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
>>> re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)
['1. first slice ', '2. second slice ', '3. slice number 3 ', '4. the next one 5 that will not work but belong to no four ', '5. and this should be 5 and so one...']
Online demo: https://regex101.com/r/kF9jT1/1; to emulate the re.findall() behavior, I added an extra (..) group and the g flag.
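The same pattern can be spelled out with re.VERBOSE so that each alternative carries its own comment. This is only a sketch of the idea above: the name ITEM and the final strip() of trailing whitespace are additions for illustration, not part of the original answer.

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and so one...')

# ITEM is a hypothetical name; the pattern is the one from the answer,
# written in verbose mode (a literal space must be spelled [ ] there).
ITEM = re.compile(r"""
    \d\.[ ]          # a digit, a dot and a space start each item
    (?:
        \D+          # a run of non-digits
      | \d(?!\.)     # or a digit that is NOT directly followed by a dot
    )*
""", re.VERBOSE)

# Each match keeps a trailing space, so strip it off.
parts = [m.strip() for m in ITEM.findall(text)]
print(parts)
```

A digit such as the "5" in "the next one 5 that" is consumed by the second alternative because the character after it is a space, whereas the "5" in "5. and" is rejected by the lookahead and therefore ends the previous item.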
Just split on a lookahead:
x="""1. first slice 2. second slice 3. slice number 3 4. the next one
5 that will not work but belong to no four 5. and this should be 5 and
so one..."""
print(re.split(r"\s(?=\d+\.\s)", x))
Output: ['1. first slice', '2. second slice', '3. slice number 3', '4. the next one\n5 that will not work but belong to no four', '5. and this should be 5 and\nso one...']
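The question asks for the slices without their numbers, so the split result probably needs one more pass. A minimal sketch, assuming the leading "N. " marker should be dropped and the line breaks inside a slice collapsed to single spaces; the names chunks and items are made up for this example.

```python
import re

x = """1. first slice 2. second slice 3. slice number 3 4. the next one
5 that will not work but belong to no four 5. and this should be 5 and
so one..."""

# Split on whitespace that is followed by "digits, dot, whitespace".
chunks = re.split(r"\s(?=\d+\.\s)", x)

# Drop the leading "N. " marker, then collapse runs of whitespace
# (including the newlines) into single spaces.
items = [" ".join(re.sub(r"^\d+\.\s+", "", c).split()) for c in chunks]
print(items)
```

Because the lookahead consumes nothing, the numbers stay attached to their slices after the split, which is why a separate re.sub is used to remove them.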
This should work:
(\d+\..*?)(?=\d+\.|$)
Regex breakdown:
(          # First group, to be captured
\d+\..*?   # Match digit(s) followed by a dot, then anything, non-greedily
)
(?=        # Lookahead
\d+\.      # Check whether what follows is digit(s) followed by a dot
|          # or
$          # end of string
)
Python code:
import re
text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
parts = re.findall(r"(\d+\..*?)(?=\d+\.|$)", text)
print(parts)
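One caveat worth illustrating: the dot in .*? does not match a newline by default, so if the text spans several lines (as in the question), a slice that contains a line break cannot be matched. A hedged sketch of the difference, assuming the three-line version of the text; passing re.DOTALL and stripping the results are additions here, not part of the answer above.

```python
import re

text = """1. first slice 2. second slice 3. slice number 3 4. the next one
5 that will not work but belong to no four 5. and this should be 5 and
so one..."""

pattern = r"(\d+\..*?)(?=\d+\.|$)"

# Without DOTALL, "." stops at the newline, so slices 4 and 5 are lost.
single_line_mode = re.findall(pattern, text)

# With DOTALL, "." also matches newlines and all five slices are found.
dotall_mode = [p.strip() for p in re.findall(pattern, text, flags=re.DOTALL)]

print(len(single_line_mode), single_line_mode)
print(len(dotall_mode), dotall_mode)
```

For the single-line string used in the answer the flag makes no difference, so it is only needed when the input may contain line breaks.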