Removing Word but not Subword from a Sentence in Python
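I need to remove a given vector of words from a sentence (a given string) in Python.
The problem is that I want to remove whole words only, not substrings or subwords.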
Note: I cannot assume that there is a space before or after the word.
I tried the .replace(word, "") function, but it doesn't work.
Example: s = "I'am at home and i will work by webcam call"
When I do s.replace("am", ""), the output is:
I' at home and i will work by webc call
Maybe tokenization could help?
You can use a list comprehension like this:
sentence_filtered = " ".join([word for word in sentence.split() if word.lower() not in vector_of_words])
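Here is a runnable sketch of that list comprehension. The values of `sentence` and `vector_of_words` are example inputs chosen for illustration, and the word list is assumed to be lowercase because each token is lowercased before the membership test. Since `split()` breaks only on whitespace, a word glued to punctuation (such as "am" inside "I'am", or "am," with a trailing comma) is not treated as a separate token and is therefore not removed.

```python
# Example inputs (hypothetical); the word set is lowercase because each
# token is lowercased before it is looked up.
sentence = "I'am at home and i will work by webcam call"
vector_of_words = {"am", "and", "i"}

# Keep every whitespace-separated token that is not in the word set, then
# rejoin the survivors with single spaces.
sentence_filtered = " ".join(
    word for word in sentence.split() if word.lower() not in vector_of_words
)
print(sentence_filtered)  # I'am at home will work by webcam call
```

"and" and "i" disappear as whole tokens, "webcam" is untouched, and "I'am" survives intact because it is a single token.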
You can use a regular expression with re.sub and the word-boundary character \b:
>>> import re
>>> s = "I'am at home and i will work by webcam call"
>>> re.sub(r"\bam\b", "", s)
"I' at home and i will work by webcam call"
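The question passes the word in a variable (`.replace(word, "")`), so a drop-in version could build the pattern from that variable. The sketch below assumes a helper named `remove_word`, which is hypothetical. It wraps the word in `re.escape` so that characters such as "." or "+" are matched literally. One caveat: `\b` only marks word/non-word transitions, so it behaves as expected when the word starts and ends with letters, digits or underscores.

```python
import re


def remove_word(text: str, word: str) -> str:
    # \b matches a word/non-word transition, so "am" inside "webcam" is left
    # alone, while "am" right after an apostrophe (as in "I'am") is removed.
    # re.escape() keeps regex metacharacters in `word` from being interpreted.
    return re.sub(r"\b" + re.escape(word) + r"\b", "", text)


s = "I'am at home and i will work by webcam call"
print(remove_word(s, "am"))  # I' at home and i will work by webcam call
```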
For a list of words, you can either use a loop or use | to build a disjunction of several words, e.g. "am|and|i". Optionally, use the re.I flag to ignore upper/lowercase:
>>> words = ["am", "and", "i"]
>>> re.sub(r"\b(%s)\b" % "|".join(words), "", s, flags=re.I)
"' at home   will work by webcam call"
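Deleting words this way leaves runs of spaces where "and" and "i" used to be. The following is a minimal sketch of the list version under two assumptions: each word is passed through `re.escape` before it goes into the alternation, and collapsing all whitespace to single spaces afterwards is acceptable. That second step also normalizes tabs and newlines, which may not be wanted. The helper name `remove_words` is hypothetical, and the non-capturing group `(?:...)` is used in place of the answer's `(...)` because the match is never referenced.

```python
import re
from typing import Iterable


def remove_words(text: str, words: Iterable[str]) -> str:
    # Build one alternation such as \b(?:am|and|i)\b, escaping each word so
    # regex metacharacters in the list are matched literally.
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    # re.IGNORECASE is the long name of the re.I flag used above.
    removed = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # split()/join() collapses the leftover runs of whitespace to single
    # spaces and trims both ends.
    return " ".join(removed.split())


s = "I'am at home and i will work by webcam call"
print(remove_words(s, ["am", "and", "i"]))  # ' at home will work by webcam call
```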