Trying to strip a list of punctuation for each word - end up removing all punctuation

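I'm new to Python, and I can't work out why I get certain errors or why things don't behave the way I expect.

One thing I'm trying to do is remove the given punctuation from the ends of every word in a sentence. This is what I have: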

def beautify_sentence(sentence, punctuation):
    """Returns a sentence that removes all the specified trailing punctuation from words."""
    sentence = [words.strip(punctuation) for words in sentence]
    return "".join(sentence)

Input:

beautify_sentence("?hello !mango! and, ban,ana yum apple!", "?!,")

Output:

'hello mango and banana yum apple'

But I want:

"hello mango and ban,ana yum apple"

Can someone explain why strip() behaves this way and what I'm doing wrong?

Thanks!

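What you're doing is iterating over each character in the sentence and stripping punctuation from that single character, so every punctuation character becomes an empty string. Instead, you have to iterate over the words in the sentence and strip the punctuation from the beginning and end of each word.

Use: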
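To make the difference concrete, here is a small sketch (the shortened sample string and the variable names are chosen only for illustration) comparing iteration over characters with iteration over words:

```python
sentence = "and, ban,ana"
punctuation = "?!,"

# Iterating over a str yields single characters, not words.
# Each comma is stripped down to an empty string and disappears on join.
per_char = [ch.strip(punctuation) for ch in sentence]
print("".join(per_char))   # and banana

# Splitting first yields whole words, so only the ends of each word are stripped.
per_word = [word.strip(punctuation) for word in sentence.split()]
print(" ".join(per_word))  # and ban,ana
```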

def beautify_sentence(sentence, punctuation):
    """Return the sentence with the specified punctuation stripped from the
    start and end of each word."""
    sentence = [word.strip(punctuation) for word in sentence.split()]
    return " ".join(sentence)

Calling the function:

beautify_sentence("?hello !mango! and, ban,ana yum apple!", "?!,")

This returns:

hello mango and ban,ana yum apple

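You are walking through the sentence letter by letter, so split the sentence on spaces before stripping the leading/trailing characters. Then return the words joined back together with spaces.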

NEW:
    sentence = [words.strip(punctuation) for words in sentence.split()]
    return " ".join(sentence)
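As a side note, `strip` removes characters from both ends of a word, while `rstrip` and `lstrip` touch only one end. The sketch below uses a sample word taken from the question to show which variant gives which result; none of them touches punctuation inside a word:

```python
punctuation = "?!,"
word = "!mango!"

print(word.strip(punctuation))       # mango   (both ends)
print(word.rstrip(punctuation))      # !mango  (trailing only)
print(word.lstrip(punctuation))      # mango!  (leading only)
print("ban,ana".strip(punctuation))  # ban,ana (inner comma is kept)
```

If only trailing punctuation should go, `rstrip` would be the variant to use, but the expected output in the question also drops the leading "?" and "!", which matches `strip`.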

You could use

(?:(?<!\w)[?!,])|(?:[?!,](?!\w))

See a demo on regex101.com

In Python, this would be:

import re
rx = re.compile(r'(?:(?<!\w)[?!,])|(?:[?!,](?!\w))')

string = rx.sub('', "?hello !mango! and, ban,ana yum apple!")
print(string)
# hello mango and ban,ana yum apple
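If the punctuation set should stay a parameter, as in the original function, the same pattern could be built at run time. This is only a sketch: `beautify_sentence_re` is a hypothetical name, and `re.escape` is used so that characters such as `]` or `^` are safe inside the character class.

```python
import re

def beautify_sentence_re(sentence, punctuation):
    """Hypothetical regex variant of beautify_sentence (illustration only)."""
    if not punctuation:
        # An empty character class "[]" is not a valid pattern.
        return sentence
    char_class = "[" + re.escape(punctuation) + "]"
    # Remove a punctuation character that is not preceded by a word
    # character, or not followed by one.
    pattern = re.compile(rf"(?<!\w){char_class}|{char_class}(?!\w)")
    return pattern.sub("", sentence)

print(beautify_sentence_re("?hello !mango! and, ban,ana yum apple!", "?!,"))
# hello mango and ban,ana yum apple
```

Unlike the `split()`/`strip()` version, this keeps the original whitespace, but it can differ in edge cases: a doubled separator inside a word, such as "a,,b", would lose both commas here and keep them with `strip`.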