Python: Make a dictionary of word count in nested dictionary
I'm having trouble writing a function that counts, for each keyword, how many of the numeric keys it appears under.
adict = {0: {'Fantasy': 6, 'Animation': 1, 'Family': 2, 'Action': 6, 'Comedy': 1, 'Adventure': 8},
1: {'Fantasy': 1, 'Drama': 1, 'Adventure': 9, 'Action': 10, 'Thriller': 1, 'Comedy': 1, 'Romance': 1, 'Science_Fiction': 10},
2: {'Fantasy': 8, 'Animation': 2, 'Adventure': 16, 'Thriller': 3, 'Drama': 1, 'Comedy': 1, 'Family': 4, 'Science_Fiction': 11, 'Horror': 1, 'Action': 15},
3: {'Fantasy': 1, 'Adventure': 5, 'Thriller': 4, 'Comedy': 1, 'Science_Fiction': 2, 'Crime': 3, 'Action': 6},
4: {'Animation': 2, 'Fantasy': 5, 'Adventure': 5, 'Action': 4, 'Comedy': 1, 'Family': 4, 'Romance': 1},
5: {'Fantasy': 1, 'Western': 1, 'Family': 2, 'Adventure': 4, 'Thriller': 3, 'Drama': 3, 'Science_Fiction': 1, 'Romance': 1, 'Crime': 1, 'Animation': 2, 'Action': 5}}
As output, I want a dictionary keyed by genre, for example:
{'Fantasy': 6 , 'Western': 1, 'Family':4 ...}
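So each value is the number of groups that contain that key (genre).
For example, 'Fantasy' appears in all groups (6 in total), while 'Western' appears only once, so its value is 1.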
Actually, I figured it out.
It's not very elegant, but it works.
adict = {0: {'Fantasy': 6, 'Animation': 1, 'Family': 2, 'Action': 6, 'Comedy': 1, 'Adventure': 8}, 1: {'Fantasy': 1, 'Drama': 1, 'Adventure': 9, 'Action': 10, 'Thriller': 1, 'Comedy': 1, 'Romance': 1, 'Science_Fiction': 10}, 2: {'Fantasy': 8, 'Animation': 2, 'Adventure': 16, 'Thriller': 3, 'Drama': 1, 'Comedy': 1, 'Family': 4, 'Science_Fiction': 11, 'Horror': 1, 'Action': 15}, 3: {'Fantasy': 1, 'Adventure': 5, 'Thriller': 4, 'Comedy': 1, 'Science_Fiction': 2, 'Crime': 3, 'Action': 6}, 4: {'Animation': 2, 'Fantasy': 5, 'Adventure': 5, 'Action': 4, 'Comedy': 1, 'Family': 4, 'Romance': 1}, 5: {'Fantasy': 1, 'Western': 1, 'Family': 2, 'Adventure': 4, 'Thriller': 3, 'Drama': 3, 'Science_Fiction': 1, 'Romance': 1, 'Crime': 1, 'Animation': 2, 'Action': 5}}
def countdoc(adict):
    convert = {}  # drop each genre's frequency and keep only the genre names, as lists
    res = {}  # the result is stored here
    for k, v in adict.items():
        convert[k] = list(v)
    for k, v in convert.items():
        for i in v:
            res[i] = 0
    for k in res:
        for x, y in convert.items():
            if k in y:
                res[k] += 1
    return res
Output:
{'Action': 6,
'Adventure': 6,
'Animation': 4,
'Comedy': 5,
'Crime': 2,
'Drama': 3,
'Family': 4,
'Fantasy': 6,
'Horror': 1,
'Romance': 3,
'Science_Fiction': 4,
'Thriller': 4,
'Western': 1}
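If you want a simpler solution, you can use a list comprehension.
First, flatten the dictionary into a list. Then count how many times each unique element occurs in that list.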
# Flatten the dictionary
genres = [genre for v in adict.values() for genre in v.keys()]
# Count each unique element and build a dictionary
occurs = {g: genres.count(g) for g in set(genres)}
# Result:
# {'Action': 6,
# 'Adventure': 6,
# 'Animation': 4,
# 'Comedy': 5,
# 'Crime': 2,
# 'Drama': 3,
# 'Family': 4,
# 'Fantasy': 6,
# 'Horror': 1,
# 'Romance': 3,
# 'Science_Fiction': 4,
# 'Thriller': 4,
# 'Western': 1}
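One thing to keep in mind with the comprehension above is that genres.count(g) rescans the whole list once per unique genre. For larger inputs, a single pass that increments a plain dict may be preferable. The following is only a sketch: the name count_groups and the small sample dictionary are made up for illustration and are not part of the question's data.

```python
# Hypothetical sample in the same shape as adict (not the question's data)
sample = {
    0: {'Fantasy': 6, 'Action': 6},
    1: {'Fantasy': 1, 'Drama': 1},
    2: {'Action': 15},
}

def count_groups(nested):
    """Return {inner_key: number of inner dicts that contain that key}."""
    counts = {}
    for inner in nested.values():
        for genre in inner:  # iterating a dict yields its keys
            counts[genre] = counts.get(genre, 0) + 1
    return counts

result = count_groups(sample)
print(result)  # {'Fantasy': 2, 'Action': 2, 'Drama': 1}
```

Each inner dictionary is visited exactly once, and the inner frequencies (6, 1, 15, ...) are ignored because only the keys are iterated.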
Edit:
There is also Counter, a dict subclass from the collections module.
from collections import Counter
# Flatten the dictionary (same as before)
genres = [genre for v in adict.values() for genre in v.keys()]
# Create new counter from an iterable
occurs = Counter(genres)
# Result:
# Counter({'Fantasy': 6,
# 'Animation': 4,
# 'Family': 4,
# 'Action': 6,
# 'Comedy': 5,
# 'Adventure': 6,
# 'Drama': 3,
# 'Thriller': 4,
# 'Romance': 3,
# 'Science_Fiction': 4,
# 'Horror': 1,
# 'Crime': 2,
# 'Western': 1})
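If the intermediate genres list is not needed, the keys can probably be fed to Counter directly with itertools.chain.from_iterable. This is a sketch of that variation, using a small made-up sample dictionary rather than the question's adict.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample in the same shape as adict (not the question's data)
sample = {
    0: {'Fantasy': 6, 'Action': 6},
    1: {'Fantasy': 1, 'Drama': 1},
    2: {'Action': 15},
}

# Iterating each inner dict yields its keys, so chaining the values
# produces every genre name once per group that contains it.
occurs = Counter(chain.from_iterable(sample.values()))

print(occurs['Fantasy'])  # 2
print(occurs['Western'])  # 0, a Counter returns 0 for keys it has not seen
plain = dict(occurs)      # convert back to a regular dict if needed
```

Because a Counter returns 0 for missing keys, looking up a genre that appears in no group does not raise KeyError.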