Intersection / union for each set of words in any given two lists in Python, in a for loop

Question

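I'm trying to define the score as the intersection / union for each set of words in any given two lists. I know that union and intersection only work on set-type containers, and I've been struggling to set this up correctly without success. Can someone help?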

corpus = [
    ["i","did","not","like","the","service"],
    ["the","service","was","ok"],
    ["i","was","ignored","when","i","asked","for","service"]
]
tags = ["a","b","c"]
dct_keys = {
    "a":1,
    "b":2,
    "c":3
}
corpus_tags = dict(zip(tags,corpus))

from itertools import combinations
my_keys = list(combinations(tags, 2))

goal_dct = {}
for i in range(len(my_keys)):
    goal_dct[(my_keys[i])] = {"id_alpha":(dct_keys[my_keys[i][0]]),
                             "id_beta"  :(dct_keys[my_keys[i][1]]),
                             "score"    : (len(set1&set3))/(len(set1|set3))} # THIS IS WHAT I WAS TRYING TO ACHIEVE HERE
print(goal_dct)

This is what I'm trying to define as the score, using sets as an example:

set1 = {"i","did","not","like","the","service"}
set2 = {"the","service","was","ok"}
set3 = {"i","was","ignored","when","i","asked","for","service"}
(len(set1&set3))/(len(set1|set3))

Answer 1

Make sets from your lists.

set1 = set(some_list)
set2 = set(other_list)
common_items = set1.intersection(set2)
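The snippet above stops at the intersection. A minimal sketch of the remaining steps, assuming the first and third sentences from the question stand in for `some_list` and `other_list`; the `all_items` and `ratio` names are illustrative and not part of the original answer:

```python
# Illustrative inputs: the first and third sentences of the question's corpus
some_list = ["i", "did", "not", "like", "the", "service"]
other_list = ["i", "was", "ignored", "when", "i", "asked", "for", "service"]

set1 = set(some_list)    # duplicates such as the second "i" collapse here
set2 = set(other_list)

common_items = set1.intersection(set2)  # words present in both lists
all_items = set1.union(set2)            # words present in either list

# Intersection size over union size
ratio = len(common_items) / len(all_items)
print(sorted(common_items), ratio)
```

`set1.intersection(set2)` and `set1.union(set2)` are the method forms of `set1 & set2` and `set1 | set2`.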

Answer 2

This doesn't do what you think it does:

(len(set1)&len(set3))/(len(set1)|len(set3))

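len returns an int. You can use the & and | operators on integers, but they perform bitwise operations, which is not what you're looking for. Instead, you want to apply these operators to the sets and then take the len of the resulting sets: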

len(set1 & set3)/len(set1 | set3)
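To make the difference concrete, here is a small sketch that evaluates both expressions on `set1` and `set3` from the question; the names `bitwise_result` and `set_result` are only illustrative:

```python
set1 = {"i", "did", "not", "like", "the", "service"}
set3 = {"i", "was", "ignored", "when", "asked", "for", "service"}

# Bitwise operations on the two lengths: 6 & 7 == 6 and 6 | 7 == 7
bitwise_result = (len(set1) & len(set3)) / (len(set1) | len(set3))

# Set operations first, then len: 2 shared words out of 11 distinct words
set_result = len(set1 & set3) / len(set1 | set3)

print(bitwise_result, set_result)
```

The first expression never looks at the words at all, only at how many there are, so it cannot measure overlap.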

So a function that generates a score for any two lists of strings (sentences) looks like this:

def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)

You can use it to build the scores for all combinations in corpus:

from itertools import combinations
from string import ascii_lowercase

corpus = [
    ["i","did","not","like","the","service"],
    ["the","service","was","ok"],
    ["i","was","ignored","when","i","asked","for","service"]
]
tagged_corpus = dict(zip(ascii_lowercase, corpus))

def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)

goal = {
    (a, b): score(tagged_corpus[a], tagged_corpus[b])
    for a, b in combinations(tagged_corpus, 2)
}

print(goal)
# {('a', 'b'): 0.25,
#  ('a', 'c'): 0.18181818181818182,
#  ('b', 'c'): 0.2222222222222222}
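If the nested shape from the question is wanted (`id_alpha`, `id_beta`, and a score per pair), the same function can fill it. This is a sketch under two assumptions that are not in the answer above: the key is spelled `score`, and two empty lists score 0.0 rather than raising `ZeroDivisionError`.

```python
from itertools import combinations

corpus = [
    ["i", "did", "not", "like", "the", "service"],
    ["the", "service", "was", "ok"],
    ["i", "was", "ignored", "when", "i", "asked", "for", "service"],
]
tags = ["a", "b", "c"]
dct_keys = {"a": 1, "b": 2, "c": 3}
corpus_tags = dict(zip(tags, corpus))

def score(s1, s2):
    set1, set2 = set(s1), set(s2)
    union = set1 | set2
    # Assumption: an empty union (both lists empty) scores 0.0
    return len(set1 & set2) / len(union) if union else 0.0

# One entry per unordered pair of tags, in the shape used in the question
goal_dct = {
    (a, b): {
        "id_alpha": dct_keys[a],
        "id_beta": dct_keys[b],
        "score": score(corpus_tags[a], corpus_tags[b]),
    }
    for a, b in combinations(tags, 2)
}
print(goal_dct)
```

Iterating over the pairs directly avoids the `range(len(my_keys))` indexing from the question.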

