How to remove words of a sentence by using a dictionary as reference

Question

I created a dictionary and saved it as a text file. I open it like this:

with open(pathDoc+'/WordsDictionary.txt', 'r+', encoding="utf8") as inf:
    wordsDictionary = eval(inf.read())

It is saved in this format: {'word1':'tag1', 'word2':'tag2'}
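
Since the file holds a plain dictionary literal, one possibly safer way to load it is `ast.literal_eval`, which parses Python literals only and does not run arbitrary code. This is a minimal sketch, assuming the file contains exactly one dict literal; the helper name `load_words_dictionary` and the temporary sample file are only for illustration.

```python
import ast
import os
import tempfile


def load_words_dictionary(path):
    """Read a file containing a dict literal and parse it without eval()."""
    with open(path, "r", encoding="utf8") as inf:
        return ast.literal_eval(inf.read())


# Illustration only: write a sample file in the saved format, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "WordsDictionary.txt")
    with open(sample_path, "w", encoding="utf8") as out:
        out.write("{'word1':'tag1', 'word2':'tag2'}")
    words_dictionary = load_words_dictionary(sample_path)

print(words_dictionary)
```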

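Given a sentence, I want to remove the words that belong to a certain set of tags. (This is just what stop-word removal in NLTK does, but for a language that the NLTK toolkit does not support.) An example is shown below.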

wordsDictionary = {'word1':'tag1', 'word2':'tag2', 'word3':'tag3'}
Sentence = "word1 word2 word3 word2 word1"
# I want to remove words that belong to 'tag2' type
FinalSentence = "word1 word3 word1"

How can I generate FinalSentence?

Thanks!

Answer 1

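You can swap the keys and values so that the word dictionary becomes a tag dictionary, then use tag2 as the key to get the value word2. This works when each tag belongs to exactly one word, since an inverted dictionary keeps only one value per key.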

def reverse(words):
    return {v: k for k, v in words.items()}


tags = reverse(wordsDictionary)  #  {'tag1': 'word1', 'tag2': 'word2', 'tag3': 'word3'}

tags.get('tag2') gives you the value word2; replace that value with an empty string:

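# str.replace returns a new string; split/join collapses the leftover double spaces
FinalSentence = ' '.join(Sentence.replace(tags.get('tag2'), '').split())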
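
Inverting with a dict comprehension keeps only one word per tag, so when several words share a tag it may be more robust to group the words by tag instead. The following is a rough sketch, not the answerer's code; the helper name `group_by_tag` and the extra entry `word4` are assumptions added for illustration.

```python
from collections import defaultdict


def group_by_tag(words):
    """Invert {word: tag} into {tag: [words]} so that no word is lost."""
    grouped = defaultdict(list)
    for word, tag in words.items():
        grouped[tag].append(word)
    return dict(grouped)


# 'word4' is a hypothetical extra entry that shares 'tag2' with 'word2'.
wordsDictionary = {'word1': 'tag1', 'word2': 'tag2', 'word3': 'tag3', 'word4': 'tag2'}
tags = group_by_tag(wordsDictionary)
print(tags['tag2'])
```

Each tag then maps to a list of words, so looking up a tag returns every word that carries it.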

Answer 2

@haifzhan's solution covers the use case of one word per tag. However, if you need multiple words per tag, here is another solution:

sentence = "word1 word2 word3 word2 word1 word4 word5 word1"
tags = {'tag1': ['word1'], 'tag2': ['word4', 'word2'], 'tag3': ['word3']} # Set a dictionary of lists based on tags

final_sentence = ' '.join([word for word in sentence.split() if word not in tags.get('tag2')])

# Output:
print(final_sentence)  # word1 word3 word1 word5 word1
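
The same filtering can also be sketched directly against the original word-to-tag dictionary from the question, without building a tag-to-words mapping first. This assumes space-separated words; the function name `remove_tagged_words` and the choice to keep words that are missing from the dictionary are assumptions, not part of the original answer.

```python
def remove_tagged_words(sentence, words_dictionary, tags_to_remove):
    """Drop every word whose tag is in tags_to_remove; unknown words are kept."""
    tags_to_remove = set(tags_to_remove)
    kept = [
        word for word in sentence.split()
        if words_dictionary.get(word) not in tags_to_remove
    ]
    return ' '.join(kept)


wordsDictionary = {'word1': 'tag1', 'word2': 'tag2', 'word3': 'tag3'}
Sentence = "word1 word2 word3 word2 word1"
FinalSentence = remove_tagged_words(Sentence, wordsDictionary, {'tag2'})
print(FinalSentence)  # word1 word3 word1
```

Passing a set of tags makes it possible to remove several tag types in one pass, which is close to how a stop-word list is usually applied.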

If your words are not separated by spaces, though, you will need to handle this differently, perhaps like this:

for word in tags.get('tag2'):
    sentence = sentence.replace(word,'')
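
One caveat with plain substring replacement is that a shorter word can be a prefix of a longer one, so the order of replacement matters. Below is a hedged sketch that replaces longer words first; the sample words `ab` and `abc` and the helper name `strip_words` are made up for illustration, and this approach still cannot tell whether a match sits inside an unrelated longer word.

```python
def strip_words(sentence, words):
    """Remove each word as a substring, longest first, so that a short word
    does not cut a longer listed word in half."""
    for word in sorted(words, key=len, reverse=True):
        sentence = sentence.replace(word, '')
    return sentence


# Hypothetical unsegmented text: 'abc' and 'ab' are both listed for removal.
result = strip_words("xxabcyyabzz", ['ab', 'abc'])
print(result)  # xxyyzz
```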

nlp

stop-words

pos-tagger

python-3.x