Python: How can I use a regex to split sentences onto new lines, and then separate punctuation from words using whitespace?
I have the following input:
input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
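First, each sentence should go on its own line. Then, all punctuation except "/", "'", "-", "+" and "$" should be separated from the words.
So the output should be: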
"I love programming with Python-3 . 3 !
Do you ?
It's great . . .
I give it a 10/10 .
It's free-to-use , no $$$ involved !"
I used the following code:
>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free- to-use , no $$$ involved ! "
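But the problem is that it does not put the sentences on separate lines. How can I do this with a regex before adding spaces between the punctuation and the words?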
Something like this:
>>> import re
>>> from string import punctuation
>>> print(re.sub(r'(?<=[' + re.escape(punctuation) + r'])\s+(?=[A-Z])', '\n', input))
I love programming with Python-3.3!
Do you?
It's great...
I give it a 10/10.
It's free-to-use, no $$$ involved!
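Building on the lookaround split above, both steps from the question can be sketched together for Python 3. This is only one possible approach, not the method from any answer here: the names `SENTENCE_BOUNDARY`, `TOKEN` and `split_and_space` are made up for illustration, and the sketch assumes that a sentence ends in ".", "!" or "?" and that the next sentence starts with a capital ASCII letter.

```python
import re

# Assumed sentence boundary: whitespace that follows ., ! or ?
# and precedes a capital letter. The lookarounds are zero-width,
# so the punctuation itself is kept.
SENTENCE_BOUNDARY = re.compile(r"(?<=[!?.])\s+(?=[A-Z])")

# A token is either a run of word characters and the punctuation
# that must stay attached (/ ' + $ -), or one single character of
# any other punctuation.
TOKEN = re.compile(r"[\w/'+$-]+|[^\w\s/'+$-]")


def split_and_space(text):
    """Put each sentence on its own line, then space out punctuation."""
    sentences = SENTENCE_BOUNDARY.split(text)
    return "\n".join(" ".join(TOKEN.findall(s)) for s in sentences)


text = ("I love programming with Python-3.3! Do you? It's great... "
        "I give it a 10/10. It's free-to-use, no $$$ involved!")
print(split_and_space(text))
```

Tokenizing with `findall` and re-joining with single spaces avoids the doubled spaces that a plain substitution would leave between the dots of "...".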
([!?.])(?=\s*[A-Z])\s*
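You can use this regex to split the text into sentences before applying your own regex. Replace each match with \1\n so the captured punctuation is kept. See the demo: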
https://regex101.com/r/sH8aR8/5
import re
x = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
print(re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", x))
EDIT:
(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*
Try this for your different dataset. See the demo.
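To make the effect of the added negative lookbehind concrete, here is a small sketch in Python 3. The "different dataset" is not shown in the thread, so the sample strings below are assumptions chosen for illustration, as is the helper name `split_sentences`. The lookbehind `(?<![A-Z][a-z])` refuses a split when the two characters just before the punctuation are a capital letter followed by a lowercase one, which covers abbreviations such as "Mr." or "Dr.".

```python
import re

# Same pattern as in the edit above: do not split when the punctuation
# is directly preceded by a capital letter plus a lowercase letter.
PATTERN = re.compile(r"(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*")


def split_sentences(text):
    """Replace each sentence boundary with the punctuation and a newline."""
    return PATTERN.sub(r"\1\n", text)


# Hypothetical sample: "Mr." is left alone, "today." still splits.
print(split_sentences("I met Mr. Smith today. He said hi!"))

# Caveat: any two-letter capitalized word blocks the split as well.
print(split_sentences("Hi. There"))
```

The lookbehind only inspects the two preceding characters, so a real sentence that ends in a short capitalized word (as in the second sample) will not be split either.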