How to remove words from a sentence using a dictionary as reference
I created a dictionary and saved it as a text file. I open it with:
with open(pathDoc+'/WordsDictionary.txt', 'r+', encoding="utf8") as inf:
    wordsDictionary = eval(inf.read())
The saved format looks like this: {'word1':'tag1', 'word2':'tag2'}
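Given a sentence, I want to remove the words that belong to a certain tag set (just like the stop words removal in nltk, but this is for a language the nltk toolkit does not support). An example is shown below.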
wordsDictionary = {'word1':'tag1', 'word2':'tag2', 'word3':'tag3'}
Sentence = "word1 word2 word3 word2 word1"
# I want to remove words that belong to 'tag2' type
FinalSentence = "word1 word3 word1"
How can I generate FinalSentence?
Thanks!
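A side note on loading the file: since it holds a plain Python dict literal, `ast.literal_eval` can parse it without executing arbitrary code the way `eval` would. A minimal sketch, assuming the file format shown in the question; the helper name `load_words_dictionary` is made up for illustration:

```python
import ast

def load_words_dictionary(path):
    """Parse a file containing a dict literal such as {'word1': 'tag1'}."""
    with open(path, 'r', encoding='utf8') as inf:
        data = ast.literal_eval(inf.read())
    # literal_eval accepts any literal, so check that we really got a dict.
    if not isinstance(data, dict):
        raise ValueError('expected a dict literal in ' + path)
    return data
```

Read-only mode `'r'` is enough here, because the file is only parsed and never written back.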
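You can swap the keys and values, so that the word dictionary becomes a tag dictionary, and then use tag2 as the key to get the value word2. Note that this only works when each tag maps to exactly one word; if several words share a tag, the reversed dictionary keeps only the last one.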
def reverse(words):
    return {v: k for k, v in words.items()}
tags = reverse(wordsDictionary) # {'tag1': 'word1', 'tag2': 'word2', 'tag3': 'word3'}
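Replace the value word2 with an empty string; tags.get('tag2') gives you the value word2. Since str.replace returns a new string and leaves double spaces behind, assign the result and normalize the whitespace:
FinalSentence = ' '.join(Sentence.replace(tags.get('tag2'), '').split())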
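The reversal is not strictly needed: the original word-to-tag dictionary can be queried directly for each word in the sentence. A minimal sketch, assuming the words are whitespace-separated; the function name `remove_tagged` is hypothetical:

```python
def remove_tagged(sentence, words_dictionary, tag):
    """Drop every whitespace-separated word whose tag equals `tag`."""
    # Words missing from the dictionary map to None and are therefore kept.
    kept = [w for w in sentence.split() if words_dictionary.get(w) != tag]
    return ' '.join(kept)

wordsDictionary = {'word1': 'tag1', 'word2': 'tag2', 'word3': 'tag3'}
FinalSentence = remove_tagged("word1 word2 word3 word2 word1", wordsDictionary, 'tag2')
print(FinalSentence)  # word1 word3 word1
```

This also works when several words share the same tag, because nothing is overwritten by inverting the dictionary.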
@haifzhan's solution covers the case of one word per tag. If you need multiple words per tag, however, here is another solution:
sentence = "word1 word2 word3 word2 word1 word4 word5 word1"
tags = {'tag1': ['word1'], 'tag2': ['word4', 'word2'], 'tag3': ['word3']} # Set a dictionary of lists based on tags
final_sentence = ' '.join([word for word in sentence.split() if word not in tags.get('tag2')])
# Output:
final_sentence
'word1 word3 word1 word5 word1'
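The `tags` dictionary above was written by hand. If you start from the word-to-tag format in the question, it could be built along these lines. This is only a sketch: it uses sets instead of lists for faster membership tests, and `group_by_tag` is a made-up helper name.

```python
from collections import defaultdict

def group_by_tag(words_dictionary):
    """Invert {'word': 'tag'} into {'tag': {'word', ...}}."""
    grouped = defaultdict(set)
    for word, tag in words_dictionary.items():
        grouped[tag].add(word)
    return dict(grouped)

wordsDictionary = {'word1': 'tag1', 'word2': 'tag2', 'word3': 'tag3', 'word4': 'tag2'}
tags = group_by_tag(wordsDictionary)
sentence = "word1 word2 word3 word2 word1 word4 word5 word1"
unwanted = tags.get('tag2', set())  # empty set if the tag is unknown
final_sentence = ' '.join(w for w in sentence.split() if w not in unwanted)
print(final_sentence)  # word1 word3 word1 word5 word1
```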
If your words are not separated by spaces, though, you would need to handle this differently, perhaps like this:
for word in tags.get('tag2'):
    sentence = sentence.replace(word,'')
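One caveat of chained `replace` calls is that a shorter word may be a substring of a longer one, so the result can depend on the order of the list. A possible variation, assuming plain substring removal is what you want, uses one regular expression with the longest alternatives first; `remove_words` is a hypothetical helper:

```python
import re

def remove_words(text, words):
    """Remove every occurrence of the given words from text."""
    if not words:
        return text
    # Longest first, so 'word10' is matched before its prefix 'word1'.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(w) for w in ordered))
    return pattern.sub('', text)

print(remove_words("word1word2word3word10", ['word2', 'word1', 'word10']))  # word3
```

`re.escape` keeps words that contain regex metacharacters from being interpreted as patterns.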