Intersection / union for each set of words in any given two lists in Python, in a for loop
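I am trying to define a score as the intersection / union of the sets of words in any given two lists. I know that union and intersection only work on set-type containers, and I have been struggling to set this up correctly without success. Can somebody help?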
corpus = [
    ["i","did","not","like","the","service"],
    ["the","service","was","ok"],
    ["i","was","ignored","when","i","asked","for","service"]
]
tags = ["a","b","c"]
dct_keys = {
    "a":1,
    "b":2,
    "c":3
}
corpus_tags = dict(zip(tags,corpus))
from itertools import combinations
my_keys = list(combinations(tags, 2))
goal_dct = {}
for i in range(len(my_keys)):
    goal_dct[(my_keys[i])] = {"id_alpha":(dct_keys[my_keys[i][0]]),
                              "id_beta" :(dct_keys[my_keys[i][1]]),
                              "socore" : (len(set1&set3))/(len(set1|set3))} # THIS IS WHAT I WAS TRYING TO ACHIEVE HERE
print(goal_dct)
This is what I am trying to define as the score, using sets as an example:
set1 = {"i","did","not","like","the","service"}
set2 = {"the","service","was","ok"}
set3 = {"i","was","ignored","when","i","asked","for","service"}
(len(set1&set3))/(len(set1|set3))
Make sets from your lists.
set1 = set(some_list)
set2 = set(other_list)
common_items = set1.intersection(set2)
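This does not do what you think it does:
(len(set1)&len(set3))/(len(set1)|len(set3))
len returns an int. You can use the & and | operators on integers, but they perform bitwise operations there, which is not what you are looking for. Instead, you want to use these operators on the sets and then take the len of the resulting sets: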
len(set1 & set3)/len(set1 | set3)
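To make the difference concrete, here is a small sketch that evaluates both expressions on the example sets from the question. The names `bitwise` and `ratio` are mine and only for illustration; the intersection-over-union ratio is what is commonly called the Jaccard index.

```python
set1 = {"i", "did", "not", "like", "the", "service"}
set3 = {"i", "was", "ignored", "when", "i", "asked", "for", "service"}

# len() returns ints (6 and 7 here, since the duplicate "i" collapses),
# so & and | act bitwise on those two numbers: 6 & 7 == 6, 6 | 7 == 7.
bitwise = (len(set1) & len(set3)) / (len(set1) | len(set3))

# Apply the set operators first, then take len():
# 2 shared words ("i", "service") out of 11 distinct words.
ratio = len(set1 & set3) / len(set1 | set3)

print(bitwise)
print(ratio)
```

The first expression still runs without an error, which is probably why the mistake is easy to miss: it just produces a number that has nothing to do with word overlap.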
So a function that produces the score for any two lists of strings (sentences) looks like this:
def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)
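One case the function above does not cover: if both lists are empty, the union is empty too and the division raises ZeroDivisionError. A possible guard is sketched below. Returning 0.0 for that case is my assumption about what you would want, and `safe_score` is a hypothetical name, not something from the question.

```python
def safe_score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    union = set1 | set2
    # Assumption: two empty sentences get a score of 0.0 instead of an error.
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)


print(safe_score([], []))
print(safe_score(["the", "service"], ["the"]))
```

Whether 0.0, 1.0 or an exception is the right answer for two empty inputs depends on how the score is used downstream, so treat the default here as a placeholder.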
You can use it to build the scores for all combinations in corpus:
from itertools import combinations
from string import ascii_lowercase
corpus = [
    ["i","did","not","like","the","service"],
    ["the","service","was","ok"],
    ["i","was","ignored","when","i","asked","for","service"]
]
tagged_corpus = dict(zip(ascii_lowercase, corpus))
def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)
goal = {
    (a, b): score(tagged_corpus[a], tagged_corpus[b])
    for a, b in combinations(tagged_corpus, 2)
}
print(goal)
# {('a', 'b'): 0.25,
#  ('a', 'c'): 0.18181818181818182,
#  ('b', 'c'): 0.2222222222222222}
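If you want the exact shape from the question (one dict per pair, holding both ids and the score), something along these lines should work. It reuses `corpus`, `tags` and `dct_keys` as given in the question. I spelled the key "score" rather than "socore", on the assumption that the original spelling was a typo.

```python
from itertools import combinations

corpus = [
    ["i", "did", "not", "like", "the", "service"],
    ["the", "service", "was", "ok"],
    ["i", "was", "ignored", "when", "i", "asked", "for", "service"],
]
tags = ["a", "b", "c"]
dct_keys = {"a": 1, "b": 2, "c": 3}
corpus_tags = dict(zip(tags, corpus))


def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)


# One entry per unordered pair of tags, keyed by the pair itself.
goal_dct = {
    (a, b): {
        "id_alpha": dct_keys[a],
        "id_beta": dct_keys[b],
        "score": score(corpus_tags[a], corpus_tags[b]),
    }
    for a, b in combinations(tags, 2)
}
print(goal_dct)
```

Iterating over the pairs directly with `for a, b in combinations(tags, 2)` avoids the `range(len(my_keys))` indexing from the original loop and makes it obvious which two sentences each score belongs to.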