Python: Make a dictionary of word count in nested dictionary

I'm having trouble writing a function that counts, for each inner key (genre), how many of the numbered groups contain it.

adict = {
    0: {'Fantasy': 6, 'Animation': 1, 'Family': 2, 'Action': 6, 'Comedy': 1, 'Adventure': 8},
    1: {'Fantasy': 1, 'Drama': 1, 'Adventure': 9, 'Action': 10, 'Thriller': 1, 'Comedy': 1, 'Romance': 1, 'Science_Fiction': 10},
    2: {'Fantasy': 8, 'Animation': 2, 'Adventure': 16, 'Thriller': 3, 'Drama': 1, 'Comedy': 1, 'Family': 4, 'Science_Fiction': 11, 'Horror': 1, 'Action': 15},
    3: {'Fantasy': 1, 'Adventure': 5, 'Thriller': 4, 'Comedy': 1, 'Science_Fiction': 2, 'Crime': 3, 'Action': 6},
    4: {'Animation': 2, 'Fantasy': 5, 'Adventure': 5, 'Action': 4, 'Comedy': 1, 'Family': 4, 'Romance': 1},
    5: {'Fantasy': 1, 'Western': 1, 'Family': 2, 'Adventure': 4, 'Thriller': 3, 'Drama': 3, 'Science_Fiction': 1, 'Romance': 1, 'Crime': 1, 'Animation': 2, 'Action': 5},
}

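As output, I want a single dictionary keyed by genre, for example:

{'Fantasy': 6, 'Western': 1, 'Family': 4, ...}

So each value is the number of groups that contain that key (genre). For example, 'Fantasy' appears in all six groups, so its value is 6, while 'Western' appears in only one group, so its value is 1.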

Actually, I figured it out. It's not that elegant, but it works.

adict = {0: {'Fantasy': 6, 'Animation': 1, 'Family': 2, 'Action': 6, 'Comedy': 1, 'Adventure': 8}, 1: {'Fantasy': 1, 'Drama': 1, 'Adventure': 9, 'Action': 10, 'Thriller': 1, 'Comedy': 1, 'Romance': 1, 'Science_Fiction': 10}, 2: {'Fantasy': 8, 'Animation': 2, 'Adventure': 16, 'Thriller': 3, 'Drama': 1, 'Comedy': 1, 'Family': 4, 'Science_Fiction': 11, 'Horror': 1, 'Action': 15}, 3: {'Fantasy': 1, 'Adventure': 5, 'Thriller': 4, 'Comedy': 1, 'Science_Fiction': 2, 'Crime': 3, 'Action': 6}, 4: {'Animation': 2, 'Fantasy': 5, 'Adventure': 5, 'Action': 4, 'Comedy': 1, 'Family': 4, 'Romance': 1}, 5: {'Fantasy': 1, 'Western': 1, 'Family': 2, 'Adventure': 4, 'Thriller': 3, 'Drama': 3, 'Science_Fiction': 1, 'Romance': 1, 'Crime': 1, 'Animation': 2, 'Action': 5}}


def countdoc(adict):
    convert = {}  # per group, keep only the genre names (the frequencies are ignored)
    res = {}  # genre -> number of groups that contain it
    for k, v in adict.items():
        convert[k] = list(v)

    # Start every genre seen in any group at zero
    for k, v in convert.items():
        for i in v:
            res[i] = 0
    # For each genre, count the groups whose genre list includes it
    for k in res:
        for x, y in convert.items():
            if k in y:
                res[k] += 1
    return res


Output: 
{'Action': 6,
 'Adventure': 6,
 'Animation': 4,
 'Comedy': 5,
 'Crime': 2,
 'Drama': 3,
 'Family': 4,
 'Fantasy': 6,
 'Horror': 1,
 'Romance': 3,
 'Science_Fiction': 4,
 'Thriller': 4,
 'Western': 1}
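
The same count can probably be done in a single pass, without the intermediate convert dictionary. The following is only a sketch: the function name count_groups and the shortened sample data are made up for illustration and are not part of the original code.

```python
# Hypothetical, shortened sample in the same shape as adict
sample = {
    0: {'Fantasy': 6, 'Family': 2},
    1: {'Fantasy': 1, 'Drama': 1},
    2: {'Fantasy': 8, 'Family': 4, 'Western': 1},
}

def count_groups(nested):
    """Return {key: number of inner dicts that contain key}."""
    res = {}
    for inner in nested.values():
        # Iterating a dict yields its keys, and each key occurs
        # at most once per inner dict, so every hit is one group.
        for genre in inner:
            res[genre] = res.get(genre, 0) + 1
    return res

print(count_groups(sample))
# {'Fantasy': 3, 'Family': 2, 'Drama': 1, 'Western': 1}
```

Calling count_groups(adict) on the full data should give the same counts as countdoc.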

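If you want a simpler solution, you can use a list comprehension.

First, flatten the nested dictionary into a single list of genre names. Then count how many times each unique element occurs in that list.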

# Flatten the dictionary
genres = [genre for v in adict.values() for genre in v.keys()]

# Count each unique element and build a dictionary
occurs = {g: genres.count(g) for g in set(genres)}

# Result:
# {'Action': 6,
#  'Adventure': 6,
#  'Animation': 4,
#  'Comedy': 5,
#  'Crime': 2,
#  'Drama': 3,
#  'Family': 4,
#  'Fantasy': 6,
#  'Horror': 1,
#  'Romance': 3,
#  'Science_Fiction': 4,
#  'Thriller': 4,
#  'Western': 1}
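
Flattening gives the number of groups (rather than some larger total) because a key can occur only once inside each inner dictionary. A small check of that reasoning, using shortened sample data that is made up for illustration:

```python
# Hypothetical, shortened sample in the same shape as adict
sample = {
    0: {'Fantasy': 6, 'Family': 2},
    1: {'Fantasy': 1, 'Drama': 1},
    2: {'Fantasy': 8, 'Family': 4, 'Western': 1},
}

# Approach from above: flatten, then count occurrences in the flat list
genres = [genre for inner in sample.values() for genre in inner]
flat_counts = {g: genres.count(g) for g in set(genres)}

# Direct definition: for each genre, count the groups that contain it
membership_counts = {
    g: sum(g in inner for inner in sample.values()) for g in set(genres)
}

# Both agree, because each genre appears at most once per group
assert flat_counts == membership_counts
```

Note that genres.count(g) rescans the whole list for every unique genre, which is fine for data of this size but grows quadratically on large inputs.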

Edit: There is also Counter, a dict subclass from the collections module.

from collections import Counter

# Flatten the dictionary (same as before)
genres = [genre for v in adict.values() for genre in v.keys()]

# Create new counter from an iterable
occurs = Counter(genres)

# Result:
# Counter({'Fantasy': 6,
#          'Animation': 4,
#          'Family': 4,
#          'Action': 6,
#          'Comedy': 5,
#          'Adventure': 6,
#          'Drama': 3,
#          'Thriller': 4,
#          'Romance': 3,
#          'Science_Fiction': 4,
#          'Horror': 1,
#          'Crime': 2,
#          'Western': 1})
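
As a possible variation, the intermediate list can be skipped by feeding Counter straight from itertools.chain.from_iterable, since iterating each inner dictionary yields its keys. This is a sketch on shortened, made-up sample data rather than the full adict:

```python
from collections import Counter
from itertools import chain

# Hypothetical, shortened sample in the same shape as adict
sample = {
    0: {'Fantasy': 6, 'Family': 2},
    1: {'Fantasy': 1, 'Drama': 1},
    2: {'Fantasy': 8, 'Family': 4, 'Western': 1},
}

# chain.from_iterable walks the keys of every inner dict in turn
occurs = Counter(chain.from_iterable(sample.values()))

print(occurs.most_common(1))  # [('Fantasy', 3)]
print(occurs['Musical'])      # 0: a missing key counts as zero, no KeyError
print(dict(occurs))           # plain dict, if the Counter type is not wanted
```

Counter also makes follow-up questions cheap, such as most_common(n) for the genres present in the most groups.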