Removing a word but not a subword from a sentence in Python

Question

I need to remove a given vector of words from a sentence (a given string) in Python.

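The problem is that I want to remove whole words only, not substrings or subwords.

Note: I can't assume that there is a space before or after the word.

I tried .replace(word, "") but it does not do what I want.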

Example: s = "I'am at home and i will work by webcam call"

When I do s.replace("am", "")

Output: I' at home and i will work by webc call

Maybe tokenization could help?

Answer 1

You can use a list comprehension like this:

sentence_filtered = " ".join([word for word in sentence.split() if word.lower() not in vector_of_words])
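
To see how this behaves on the sentence from the question, here is a minimal runnable sketch of the same idea. The function name remove_words_by_split and the use of a set for lookups are my own choices for illustration, not part of the answer above.

```python
def remove_words_by_split(sentence, words_to_remove):
    """Drop whitespace-separated tokens that match a word to remove (case-insensitive)."""
    # A set gives constant-time membership tests; lowercase once up front.
    lowered = {w.lower() for w in words_to_remove}
    # str.split() with no argument splits on any run of whitespace.
    return " ".join(tok for tok in sentence.split() if tok.lower() not in lowered)


s = "I'am at home and i will work by webcam call"
result = remove_words_by_split(s, ["am", "and", "i"])
print(result)  # I'am at home will work by webcam call
```

Note that "webcam" is left alone, but so is "I'am": splitting on whitespace treats it as a single token, so this approach only works when the words to remove are surrounded by spaces, which the question says cannot be assumed.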

Answer 2

You can use a regular expression with re.sub and the word-boundary character \b:

>>> import re
>>> s = "I'am at home and i will work by webcam call"
>>> re.sub(r"\bam\b", "", s)
"I' at home and i will work by webcam call"

For a list of words, you can use a loop, or build a disjunction from several words with |, e.g. "am|and|i". Optionally, use the re.I flag to ignore upper/lowercase:

>>> words = ["am", "and", "i"]
>>> re.sub(r"\b(%s)\b" % "|".join(words), "", s, flags=re.I)
"' at home   will work by webcam call"

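The regex approach can be wrapped in a small helper. This is only a sketch under a few assumptions of mine: the name remove_whole_words and the ignore_case parameter are hypothetical, re.escape is added in case a word contains regex metacharacters, and the leftover runs of spaces are collapsed at the end, which the answer above does not do.

```python
import re


def remove_whole_words(sentence, words, ignore_case=True):
    """Remove each word in `words` from `sentence`, matching whole words only."""
    if not words:
        return sentence
    # Escape each word so characters such as "." or "+" are matched literally.
    alternation = "|".join(re.escape(w) for w in words)
    flags = re.I if ignore_case else 0
    # \b matches between a word character and a non-word character,
    # so "am" matches in "I'am" but not inside "webcam".
    pattern = re.compile(r"\b(?:%s)\b" % alternation, flags)
    stripped = pattern.sub("", sentence)
    # Collapse the whitespace left behind by the removed words.
    return " ".join(stripped.split())


s = "I'am at home and i will work by webcam call"
print(remove_whole_words(s, ["am", "and", "i"]))  # ' at home will work by webcam call
```

One caveat: \b depends on word characters, so a word that begins or ends with punctuation (for example "c++") would likely need a different boundary test.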