Problems computing the score of a pairwise list in an iterative way?
Suppose I have the following lists (in reality they contain many more sublists):
list_1 = [['Hi my name is anon'],
['Hi I like #hokey']]
list_2 = [['Hi my name is anon_2'],
['Hi I like #Basketball']]
I would like to compute the distance for every possible pair, without repeats (combinations without replacement, or a product?). For example:
distance between: ['Hi my name is anon'] and ['Hi my name is anon_2']
distance between: ['Hi my name is anon'] and ['Hi I like #Basketball']
distance between: ['Hi I like #hokey'] and ['Hi my name is anon_2']
distance between: ['Hi I like #hokey'] and ['Hi I like #Basketball']
Then put the scores into a list like this:
[distance_1,distance_2,distance_3,distance_4]
To do this, I was thinking of using itertools product or combinations. This is what I tried:
strings_1 = [i[0] for i in list_1]
strings_2 = [i[0] for i in list_2]
import itertools
scores_list = [dis.jaccard(i,j) for i,j in zip(itertools.combinations(strings_1, strings_2))]
The problem is that I get this traceback:
scores_list = [dis.jaccard(i,j) for i,j in zip(itertools.combinations(strings_1, strings_2))]
TypeError: an integer is required
How can I do this efficiently, and how can I compute this product-like combination operation?
You need to use itertools.product to get the Cartesian product, like this
[dis.jaccard(string1, string2) for string1, string2 in product(list_1, list_2)]
product will group the items like this
>>> from itertools import product
>>> from pprint import pprint
>>> pprint(list(product(list_1, list_2)))
[(['Hi my name is anon'], ['Hi my name is anon_2']),
(['Hi my name is anon'], ['Hi I like #Basketball']),
(['Hi I like #hokey'], ['Hi my name is anon_2']),
(['Hi I like #hokey'], ['Hi I like #Basketball'])]
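For context on the traceback in the question: as far as I can tell, it comes from itertools.combinations expecting an integer length as its second argument, not a second list. The sketch below reuses the asker's strings_1 and strings_2 names with the sample data. It only illustrates the difference between the two functions and makes no claim about the exact error message on a given Python version.

```python
from itertools import combinations, product

strings_1 = ["Hi my name is anon", "Hi I like #hokey"]
strings_2 = ["Hi my name is anon_2", "Hi I like #Basketball"]

# combinations(iterable, r) takes an integer r as its second argument,
# so handing it a second list raises TypeError.
try:
    combinations(strings_1, strings_2)
    raised = False
except TypeError:
    raised = True

# combinations builds pairs from within ONE iterable ...
within_one_list = list(combinations(strings_1, 2))

# ... while product pairs every item of one iterable with every item of another.
across_lists = list(product(strings_1, strings_2))
```

So combinations answers "which pairs can I draw from this one list", while product answers "which pairs can I form across these two lists", which is what the question asks for.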
If you want to apply the jaccard function only to the strings inside the lists, you may need to preprocess the lists like this
>>> list_11 = [item for items in list_1 for item in items]
>>> list_21 = [item for items in list_2 for item in items]
>>> pprint([str1 + " " + str2 for str1, str2 in product(list_11, list_21)])
['Hi my name is anon Hi my name is anon_2',
'Hi my name is anon Hi I like #Basketball',
'Hi I like #hokey Hi my name is anon_2',
'Hi I like #hokey Hi I like #Basketball']
>>> pprint([dis.jaccard(str1, str2) for str1, str2 in product(list_11, list_21)])
...
...
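Since the dis module is not shown in the question, here is a minimal end-to-end sketch with a stand-in distance. jaccard_distance is a hypothetical helper written for this example: it computes a word-level Jaccard distance, which may differ from what dis.jaccard actually does. It assumes Python 3 division.

```python
from itertools import product

def jaccard_distance(a, b):
    """Hypothetical stand-in for dis.jaccard: 1 - |A & B| / |A | B| over word sets."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    if not union:
        # Two empty strings: treat them as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

list_1 = [['Hi my name is anon'], ['Hi I like #hokey']]
list_2 = [['Hi my name is anon_2'], ['Hi I like #Basketball']]

# Flatten the one-element sublists into plain lists of strings.
list_11 = [item for items in list_1 for item in items]
list_21 = [item for items in list_2 for item in items]

# One score per (string from list_1, string from list_2) pair, in product order.
scores_list = [jaccard_distance(s1, s2) for s1, s2 in product(list_11, list_21)]
```

The resulting scores_list has one entry per pair, in the same order as the pairs printed above.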
As Ashwini suggested in the comments, in your case you can use itertools.starmap directly, like this
>>> from itertools import product, starmap
>>> list(starmap(dis.jaccard, product(list_11, list_21)))
For example,
>>> list_1 = ["a1", "a2", "a3"]
>>> list_2 = ["b1", "b2", "b3"]
>>> from itertools import product, starmap
>>> list(starmap(lambda x, y: x + " " + y, product(list_1, list_2)))
['a1 b1', 'a1 b2', 'a1 b3', 'a2 b1', 'a2 b2', 'a2 b3', 'a3 b1', 'a3 b2', 'a3 b3']
product works, but since you only have a single pair of lists, this works as well:
[dis.jaccard(string1, string2) for string1 in list_1 for string2 in list_2]
That said, the starmap + product combination certainly wins.
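As a quick check that the two approaches are interchangeable, the sketch below compares a nested comprehension with starmap + product. join_pair, list_a and list_b are illustrative names that stand in for the real distance function and data.

```python
from itertools import product, starmap

def join_pair(x, y):
    # Illustrative stand-in for a distance function such as dis.jaccard.
    return x + " " + y

list_a = ["a1", "a2"]
list_b = ["b1", "b2"]

# Nested comprehension: outer loop over list_a, inner loop over list_b.
nested = [join_pair(x, y) for x in list_a for y in list_b]

# starmap unpacks each (x, y) tuple from product into join_pair(x, y).
via_starmap = list(starmap(join_pair, product(list_a, list_b)))
```

Both produce the same values in the same order, so choosing between them is mostly a matter of readability.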