How to make a term-document mapping in Python
I have 16,000 records like this from the IMDb dataset:
Movie_Name Synops
Alien Predator ['great','17th', 'abigail', 'by', 'century', 'is']
Shark Exorcist ['demonic', 'devil', 'great', 'hell', 'holy', 'nun']
Jurassic Shark ['abandoned', 'an', 'and', 'beautiful', 'abigail',]
I don't know how to build a term-document mapping like this for every word in the Synops column:
"great": Alien Predator,Shark Exorcist
"17th" :Alien Predator
"abigail":Alien Predator,Jurassic Shark
.....
First put the records into a dictionary or JSON. Once you have that:
dataset = {
"Alien Predator":['great','17th', 'abigail', 'by', 'century', 'is'],
"Shark Exorcist":['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
"Jurassic Shark":['abandoned', 'an', 'and', 'beautiful', 'abigail',],
}
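If the records start out as rows rather than as a dictionary, a conversion along these lines may work. This is only a sketch: the `rows` list, the `to_word_list` helper, and the idea that a Synops value might arrive as a string (for example after reading a CSV file) are assumptions for illustration, not something stated in the question.

```python
import ast

# Hypothetical rows: (Movie_Name, Synops) pairs.
# Synops may be a real list or the string form of a list.
rows = [
    ("Alien Predator", "['great', '17th', 'abigail', 'by', 'century', 'is']"),
    ("Shark Exorcist", ['demonic', 'devil', 'great', 'hell', 'holy', 'nun']),
]

def to_word_list(synops):
    # Safely parse the string form; copy real lists unchanged.
    if isinstance(synops, str):
        return ast.literal_eval(synops)
    return list(synops)

# Build the movie -> words dictionary from the rows.
dataset_from_rows = {name: to_word_list(synops) for name, synops in rows}
print(dataset_from_rows["Alien Predator"])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing stored list strings.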
From there you can easily query values.
search_word = "great"
d = [movie for movie, synops in dataset.items() if search_word in synops]
This returns ['Alien Predator', 'Shark Exorcist']
You can add the matches to a dictionary to collect the full result.
final_dict = {}
final_dict[search_word] = d
That should give you:
>>> final_dict
{'great': ['Alien Predator', 'Shark Exorcist']}
Now you can do the same thing with a for loop over the list of keywords you need, and finish the task yourself.
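The loop over a keyword list mentioned above could be sketched roughly as follows. The `search_words` list is a hypothetical example. The rest reuses the `dataset` dictionary and the list comprehension from this answer.

```python
dataset = {
    "Alien Predator": ['great', '17th', 'abigail', 'by', 'century', 'is'],
    "Shark Exorcist": ['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
    "Jurassic Shark": ['abandoned', 'an', 'and', 'beautiful', 'abigail'],
}

# Hypothetical list of keywords to look up.
search_words = ["great", "abigail", "nun"]

# For each keyword, collect every movie whose word list contains it.
final_dict = {
    word: [movie for movie, synops in dataset.items() if word in synops]
    for word in search_words
}
print(final_dict)
```

This rescans every movie once per keyword, which is fine for a handful of words but likely slow if the keyword list covers the whole vocabulary of 16,000 records.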
data = {
"Alien Predator": ['great','17th', 'abigail', 'by', 'century', 'is'],
"Shark Exorcist": ['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
"Jurassic Shark": ['abandoned', 'an', 'and', 'beautiful', 'abigail',]
}
result = {}
for movie_name, keywords in data.items():
    for keyword in keywords:
        # Create the list on first sight of a keyword, then record the movie.
        result.setdefault(keyword, []).append(movie_name)
print(result)
Result (line breaks added for clarity):
{
'great': ['Alien Predator', 'Shark Exorcist'],
'17th': ['Alien Predator'],
'abigail': ['Alien Predator', 'Jurassic Shark'],
'by': ['Alien Predator'],
'century': ['Alien Predator'],
'is': ['Alien Predator'],
'demonic': ['Shark Exorcist'],
'devil': ['Shark Exorcist'],
'hell': ['Shark Exorcist'],
'holy': ['Shark Exorcist'],
'nun': ['Shark Exorcist'],
'abandoned': ['Jurassic Shark'],
'an': ['Jurassic Shark'],
'and': ['Jurassic Shark'],
'beautiful': ['Jurassic Shark']
}
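If a synopsis can contain the same word more than once, the list-based loop would record the movie several times under that word. One possible variation, sketched here with `collections.defaultdict` and sets, avoids that. The repeated 'great' in the sample data and the names `index` and `term_document` are assumptions added for illustration.

```python
from collections import defaultdict

data = {
    # 'great' is repeated on purpose (hypothetical) to show deduplication.
    "Alien Predator": ['great', '17th', 'abigail', 'great'],
    "Shark Exorcist": ['demonic', 'devil', 'great'],
    "Jurassic Shark": ['abandoned', 'beautiful', 'abigail'],
}

# Map each word to the set of movies that mention it.
index = defaultdict(set)
for movie_name, keywords in data.items():
    for keyword in keywords:
        index[keyword].add(movie_name)

# Convert the sets to sorted lists for stable, readable output.
term_document = {word: sorted(movies) for word, movies in index.items()}
print(term_document["great"])
print(term_document["abigail"])
```

Like the `setdefault` version, this visits each word once, so the work grows with the total number of words rather than with words times movies.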