Python - How do I separate punctuation from words by white space leaving only one space between the punctuation and the word?
I have the following string:
input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
All punctuation should be separated from the words, except "/", "'", "-", "+" and "$".
So the output should be:
"I love programming with Python-3 . 3 ! Do you ? It's great . . . I give it a 10/10 . It's free-to-use , no $$$ involved !"
I used the following code:
import string

for x in string.punctuation:
    # Skip the five marks that must stay attached to words
    if x == "/":
        continue
    if x == "'":
        continue
    if x == "-":
        continue
    if x == "+":
        continue
    if x == "$":
        continue
    input = input.replace(x, " %s " % x)
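I get the following output:
I love programming with Python-3 . 3 !  Do you ?  It's great .  .  .  I give it a 10/10 .  It's free-to-use ,  no $$$ involved !
It works, but the problem is that it sometimes leaves two spaces between a punctuation mark and a word, for example between the first exclamation mark in the sentence and the word "Do". This happens because there is already a space between them.
The same problem occurs with input = "Hello. (hi)". The output would be: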
" Hello . ( hi ) "
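Note the two spaces before the opening parenthesis.
I need output with exactly one space between any punctuation mark and a word, except for the 5 punctuation marks mentioned above, which are not separated from the words. How can I fix this? Or is there a better way to do it using a regular expression?
Thanks in advance.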
You could try something like this:
>>> import string
>>> input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
>>> ls = []
>>> for x in input:
...     if x in string.punctuation:
...         ls.append(' %s' % x)
...     else:
...         ls.append(x)
...
>>> ''.join(ls)
"I love programming with Python -3 .3 ! Do you ? It 's great . . . I give it a 10 /10 . It 's free -to -use , no $ $ $ involved !"
Looks like re can do it for you...
>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "
and
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", "Hello. (hi)")
'Hello . ( hi ) '
If the trailing space is a problem, theresult.rstrip(' ') should fix it for you :-)
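For illustration, the substitution and the trailing-space cleanup can be combined into one helper. This is only a sketch: the name separate_punctuation is made up here, and the replacement r"\1 " assumes each captured run is meant to be followed by exactly one space.

```python
import re

# Same pattern as above: either a run of "word-like" characters (word
# characters, whitespace and the five protected marks) or a run of any
# other punctuation, followed by optional whitespace that gets dropped.
PATTERN = re.compile(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*")

def separate_punctuation(text):
    """Hypothetical helper: space out punctuation runs, then trim the end."""
    spaced = PATTERN.sub(r"\1 ", text)
    # str.rstrip removes the single space the substitution leaves at the end
    return spaced.rstrip(" ")

print(separate_punctuation("Hello. (hi)"))  # Hello . ( hi )
```

Note that this keeps runs such as "..." together, just like the session above.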
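I can't comment due to lack of reputation, but in this case, "between the first exclamation mark in the sentence and the word "Do"", there seem to be two spaces because there is already a space between ! and Do.
So, if there is already a space after the punctuation mark, don't add another one.
Also, there is a similar question here: python regex inserting a space between punctuation and letters
So maybe consider using re?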
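One way to express "only add a space where none exists yet" is with zero-width lookarounds, so that nothing is consumed and existing spaces are left alone. The following is a rough sketch, not a tested production solution; the names PUNCT, GAP and add_missing_spaces are invented for this example. It also assumes that "_" counts as a word character (it is part of \w), which differs from string.punctuation.

```python
import re

# A mark to split off: anything that is not a word character, whitespace,
# or one of the five protected marks / ' - + $
PUNCT = r"[^\w\s/'+$-]"

# Zero-width positions where a space is missing:
#   1) right before a mark that directly follows a non-space character
#   2) right after a mark that is directly followed by a non-space character
GAP = re.compile(r"(?<=\S)(?={p})|(?<={p})(?=\S)".format(p=PUNCT))

def add_missing_spaces(text):
    # Each match is empty, so the substitution only inserts spaces
    return GAP.sub(" ", text)

print(add_missing_spaces("Hello. (hi)"))  # Hello . ( hi )
```

Because the pattern matches single marks, "..." comes out as ". . .", and no trailing space is produced.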
In my opinion a negated character class is simpler:
import re
input_string = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
# Replace each punctuation run (plus at most one space on either side) with the run padded by single spaces
print(re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", input_string))
Output:
I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved !
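If each mark should stand alone (". . ." rather than "..."), a variation is to pad every single mark and then collapse the whitespace afterwards. This is a sketch under the assumption that collapsing all whitespace runs, including tabs and newlines, is acceptable; pad_and_collapse is a made-up name.

```python
import re

def pad_and_collapse(text):
    # Step 1: surround every single punctuation mark with spaces; this may
    # create double spaces where a space was already present.
    padded = re.sub(r"([^\w\s'/\-+$])", r" \1 ", text)
    # Step 2: str.split() without arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so joining with " " leaves exactly
    # one space between tokens.
    return " ".join(padded.split())

print(pad_and_collapse("Hello. (hi)"))  # Hello . ( hi )
```

The trade-off is that any intentional double spaces or line breaks in the input are flattened too.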
# Approach 1: two regex passes
import re
sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
# Pass 1: insert a space between a non-space character and a following punctuation mark
sample_input = re.sub(r"([^\s])([^\w\/'+$\s-])", r'\1 \2', sample_input)
# Pass 2: insert a space between a punctuation mark and a following non-space character
print(re.sub(r"([^\w\/'+$\s-])([^\s])", r'\1 \2', sample_input))

# Approach 2: scan the string and insert spaces by hand
import string
sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
# All punctuation except the five marks that stay attached to words
punctuation = string.punctuation.replace('/', '').replace("'", '') \
    .replace('-', '').replace('+', '').replace('$', '')
i = 0
while i < len(sample_input):
    if sample_input[i] not in punctuation:
        i += 1
        continue
    # Add a space before the mark if there is none yet
    if i > 0 and sample_input[i-1] != ' ':
        sample_input = sample_input[:i] + ' ' + sample_input[i:]
        i += 1
    # Add a space after the mark if there is none yet
    if i + 1 < len(sample_input) and sample_input[i+1] != ' ':
        sample_input = sample_input[:i+1] + ' ' + sample_input[i+1:]
        i += 1
    i += 1
print(sample_input)
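Approach 2 rebuilds the string by slicing on every insertion. A possible variation, sketched here with invented names (PROTECTED, SPLIT_MARKS, space_out), collects the characters in a list and joins them once at the end; the spacing rules are intended to be the same.

```python
import string

PROTECTED = "/'-+$"  # marks that stay attached to words
SPLIT_MARKS = set(string.punctuation) - set(PROTECTED)

def space_out(text):
    pieces = []  # output characters, joined once at the end
    for i, ch in enumerate(text):
        if ch not in SPLIT_MARKS:
            pieces.append(ch)
            continue
        # Space before the mark, unless it starts the text or the last
        # emitted character is already a space
        if pieces and pieces[-1] != " ":
            pieces.append(" ")
        pieces.append(ch)
        # Space after the mark, unless it ends the text or a space follows
        if i + 1 < len(text) and text[i + 1] != " ":
            pieces.append(" ")
    return "".join(pieces)

print(space_out("Hello. (hi)"))  # Hello . ( hi )
```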