Removing Word but not Subword from a Sentence in Python

I need to remove a given vector of words from a sentence (a given string) in Python.

The problem is that I want to remove whole words only, not substrings or subwords.

Note: I can't assume there is a space before or after the word.

I tried the .replace(word, "") function, but it doesn't work.

Example: s = "I'am at home and i will work by webcam call"

When I do s.replace("am", "")

the output is: I' at home and i will work by webc call

Maybe tokenization could help?
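Following up on the tokenization idea, one minimal sketch (the helper name remove_words_by_tokens is hypothetical) splits the string with re.split on runs of non-word characters and keeps the separators through a capturing group, so "I'am" becomes the tokens "I", "'" and "am". Whole tokens can then be dropped without touching subwords such as the "am" inside "webcam":

```python
import re

def remove_words_by_tokens(sentence, words):
    """Hypothetical helper: drop whole word tokens, keep everything else."""
    # Lower-case the targets once so the comparison is case-insensitive.
    targets = {w.lower() for w in words}
    # The capturing group makes re.split keep the separators (spaces,
    # apostrophes, ...) as their own tokens, so the original spacing survives.
    tokens = re.split(r"(\W+)", sentence)
    return "".join(t for t in tokens if t.lower() not in targets)

s = "I'am at home and i will work by webcam call"
print(remove_words_by_tokens(s, ["am"]))
# -> I' at home and i will work by webcam call
```

Note that \W treats the apostrophe as a separator, so "I'am" is split into two words here; that matches the behavior of the \b-based regex answer.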

You can use a list comprehension like this (it assumes vector_of_words contains lowercase words):

sentence_filtered = " ".join([word for word in sentence.split() if word.lower() not in vector_of_words])
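A runnable version of this list comprehension, wrapped in a hypothetical helper remove_words_split, might look like the sketch below. It lower-cases the targets itself and uses a set for membership tests. Because str.split() only splits on whitespace, "I'am" stays a single token, so this approach does not cover the case from the question where a word is not surrounded by spaces:

```python
def remove_words_split(sentence, vector_of_words):
    # Lower-case the targets once so the comparison below is case-insensitive.
    targets = {w.lower() for w in vector_of_words}
    # str.split() with no argument splits on runs of whitespace,
    # so each token is a whitespace-delimited chunk such as "I'am".
    return " ".join(word for word in sentence.split() if word.lower() not in targets)

s = "I'am at home and i will work by webcam call"
print(remove_words_split(s, ["am", "and", "i"]))
# -> I'am at home will work by webcam call
```

Also note that " ".join rebuilds the sentence with single spaces, so any original runs of whitespace are collapsed.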

You can use re.sub with a regular expression that uses the word boundary \b:

>>> import re
>>> s = "I'am at home and i will work by webcam call"
>>> re.sub(r"\bam\b", "", s)
"I' at home and i will work by webcam call"
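To remove an arbitrary single word, a small sketch (the helper remove_word is hypothetical) builds the same pattern with re.escape, so characters such as "." in the word are matched literally. Since \b matches between a word character and a non-word character, it also covers the case from the question where the word is next to punctuation rather than a space:

```python
import re

def remove_word(sentence, word, ignore_case=False):
    # re.escape() makes regex metacharacters in `word` match literally;
    # \b anchors the match to word boundaries so "am" inside "webcam" is kept.
    pattern = r"\b" + re.escape(word) + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, "", sentence, flags=flags)

s = "I'am at home and i will work by webcam call"
print(remove_word(s, "am"))
# -> I' at home and i will work by webcam call
print(remove_word("am, I? (am) webcam", "AM", ignore_case=True))
# -> , I? () webcam
```

This assumes the word starts and ends with a word character; for a word like "c++", the \b after the final "+" would not behave as intended.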

For a list of words, you can either use a loop, or use | to build an alternation from several words, e.g. "am|and|i". Optionally, pass the re.I flag to ignore upper/lowercase:

>>> words = ["am", "and", "i"]
>>> re.sub(r"\b(%s)\b" % "|".join(words), "", s, flags=re.I)
"' at home   will work by webcam call"
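A sketch of the list version (the helper remove_words is hypothetical) that escapes every word with re.escape, uses a non-capturing group (?:...) because the matched text is not needed, and optionally collapses the runs of whitespace that re.sub leaves behind, like the three spaces in the output above:

```python
import re

def remove_words(sentence, words, ignore_case=True):
    # Build one alternation such as \b(?:am|and|i)\b; re.escape() keeps any
    # regex metacharacters in the words literal.
    pattern = r"\b(?:%s)\b" % "|".join(map(re.escape, words))
    flags = re.IGNORECASE if ignore_case else 0
    result = re.sub(pattern, "", sentence, flags=flags)
    # Optional clean-up: collapse leftover runs of whitespace and trim the ends.
    return re.sub(r"\s{2,}", " ", result).strip()

s = "I'am at home and i will work by webcam call"
print(remove_words(s, ["am", "and", "i"]))
# -> ' at home will work by webcam call
```

A single alternation scans the sentence once, whereas a loop calls re.sub once per word; for a short word list either choice is fine.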