Python - intersect list of dictionaries

I have this list of dictionaries:

artist_and_tags = [{u'Yo La Tengo': ['indie', 'indie rock', 'seen live', 'alternative', 'indie pop', 'rock', 'post-rock', 'dream pop', 'shoegaze', 'noise pop', 'folk', 'experimental', 'alternative rock', 'american', 'lo-fi', 'pop', 'new jersey', 'yo la tengo', 'usa', 'noise rock', '90s', 'noise', '00s', 'ambient', 'post-punk', '80s', 'mellow', 'psychedelic', 'hoboken', 'experimental rock', 'singer-songwriter', 'post rock', 'electronic', 'female vocalists', 'alt-country', 'dreamy', 'matador', 'chillout', 'instrumental', 'favorites', 'punk', 'electronica', 'slowcore', 'folk rock', 'new wave', 'jazz', 'eclectic', 'new york', 'emo']}, {u'Radiohead': ['alternative', 'alternative rock', 'rock', 'indie', 'electronic', 'seen live', 'british', 'britpop', 'indie rock', 'experimental', 'radiohead', 'progressive rock', '90s', 'electronica', 'art rock', 'experimental rock', 'post-rock', 'psychedelic', 'uk', 'male vocalists', 'pop', '00s', 'ambient', 'chillout', 'progressive', 'favorites', 'melancholic', 'awesome', 'overrated', 'english', 'beautiful', 'classic rock', 'genius', 'melancholy', 'better than radiohead', 'trip-hop', 'idm', 'indie pop', 'emo']}, {u'Portishead': ['trip-hop', 'electronic', 'female vocalists', 'chillout', 'trip hop', 'alternative', 'electronica', 'seen live', 'downtempo', 'british', 'indie', 'portishead', 'experimental', 'ambient', 'female vocalist', 'alternative rock', '90s', 'lounge', 'mellow', 'bristol', 'jazz', 'psychedelic', 'chill', 'melancholic', 'triphop', 'uk', 'rock', 'bristol sound', 'acid jazz', 'lo-fi']}]

I'm using it to find how closely related the artists are.

To do that, I'm doing:

# dict.values() is a view in Python 3, so convert it to a list before indexing
tags0 = set(list(artist_and_tags[0].values())[0])
tags1 = set(list(artist_and_tags[1].values())[0])
tags2 = set(list(artist_and_tags[2].values())[0])

Then:

intersection1 = tags0 & tags1
intersection2 = tags0 & tags2
intersection3 = tags1 & tags2

So:

print(intersection1, len(intersection1), intersection2, len(intersection2), intersection3, len(intersection3))

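tells me that "Yo La Tengo" is closer to "Radiohead" than "Portishead" is, with 20 intersecting tags.

This code seems a bit redundant, though...

QUESTION:

Is there a way to use this logic in a for loop (or wrap it in a simple function), so that it works with dictionaries for n artists (keys)?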

You can use itertools.combinations.

import collections
import itertools

ArtistTags = collections.namedtuple('ArtistTags', ('name', 'tags'))

# One (name, set of tags) record per artist, across all the dictionaries.
artists = (ArtistTags(artist, set(tag_list))
           for artists_dict in artist_and_tags
           for artist, tag_list in artists_dict.items())
# Every unordered pair of artists.
artist_pairings = itertools.combinations(artists, 2)
# Number of shared tags for each pair.
intersections = ((len(a.tags & b.tags), a, b) for a, b in artist_pairings)
# Largest overlap first.
for n, a, b in sorted(intersections, reverse=True):
    print(n, a.name, b.name)

Output:

20 Yo La Tengo Radiohead
16 Yo La Tengo Portishead
16 Radiohead Portishead
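
Since the question also asks about wrapping this in a simple function, here is a minimal sketch of one way to do it. The function name rank_artist_pairs and the tiny sample data are made up for illustration and are not the tags from the question. It assumes Python 3.7+ (dictionaries keep insertion order) and that each artist name appears only once in the list.

```python
import itertools


def rank_artist_pairs(artist_and_tags):
    """Return (shared_tag_count, artist_a, artist_b) tuples, most similar first."""
    # Flatten the list of single-key dicts into one {artist: set_of_tags} mapping.
    tag_sets = {
        artist: set(tags)
        for artist_dict in artist_and_tags
        for artist, tags in artist_dict.items()
    }
    # Count the shared tags for every unordered pair of artists.
    pairs = [
        (len(tag_sets[a] & tag_sets[b]), a, b)
        for a, b in itertools.combinations(tag_sets, 2)
    ]
    # Sort on the count only; the stable sort keeps ties in their original order.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, much smaller than the list in the question.
sample = [
    {'A': ['rock', 'indie', 'pop']},
    {'B': ['rock', 'indie', 'jazz']},
    {'C': ['jazz', 'ambient']},
]
for count, a, b in rank_artist_pairs(sample):
    print(count, a, b)
```

Sorting with key= on the count alone means the artist names and tag sets never need to be compared with each other. Calling rank_artist_pairs(artist_and_tags) on the list from the question should give the same three pairs as above.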