Make a list of 2-value tuples of all possible combinations of another list
I have a list like this:
['Jack Matthews', 'Mick LaSalle', 'Claudia Puig', 'Lisa Rose', 'Toby', 'Gene Seymour']
How can I create a list that stores all possible combinations of the items in the list above, like this:
[('Jack Matthews', 'Toby'), ('Jack Matthews', 'Claudia Puig'), ('Jack Matthews', 'Lisa Rose')] # and so on
I need these tuples for this function:
from math import sqrt

def euclidean_distance(preferences_dict, person_1, person_2):
    # Collect the items both people have rated.
    shared_items = {}
    for item in preferences_dict[person_1]:
        if item in preferences_dict[person_2]:
            shared_items[item] = 1
    # Nothing in common: no distance can be computed, so return None.
    if not len(shared_items):
        return
    sum_of_squares = sqrt(sum([pow(preferences_dict[person_1][item] - preferences_dict[person_2][item], 2) for item in preferences_dict[person_1] if item in preferences_dict[person_2]]))
    return 1/(1+sum_of_squares)
and this dataset:
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.5,
'The Night Listener': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
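I want to compute the Euclidean distance between two critics based on their ratings for each movie.
What is the best way to compute this value for every pair of critics, without duplicates?
I considered this: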
# list(...) gives each critic their own mutable copy of the names (needed on Python 3)
names = dict([(critic, list(critics.keys())) for critic in critics.keys()])
for critic in names.keys():
    if critic in names[critic]:
        names[critic].remove(critic)
actual_distance = []
for base_critic in names.keys():
    for critic in names[base_critic]:
        actual_distance.append(euclidean_distance(critics, base_critic, critic))
The problem with this code is that it produces duplicate values, because names['Jack Matthews'] contains 'Toby' and vice versa.
>>> import itertools
>>> names = ['Jack Matthews', 'Mick LaSalle', 'Claudia Puig', 'Lisa Rose', 'Toby', 'Gene Seymour']
>>> combos = itertools.combinations(names, 2)
>>> for name1, name2 in combos:
...     print((name1, name2))
...
('Jack Matthews', 'Mick LaSalle')
('Jack Matthews', 'Claudia Puig')
('Jack Matthews', 'Lisa Rose')
('Jack Matthews', 'Toby')
('Jack Matthews', 'Gene Seymour')
('Mick LaSalle', 'Claudia Puig')
('Mick LaSalle', 'Lisa Rose')
('Mick LaSalle', 'Toby')
('Mick LaSalle', 'Gene Seymour')
('Claudia Puig', 'Lisa Rose')
('Claudia Puig', 'Toby')
('Claudia Puig', 'Gene Seymour')
('Lisa Rose', 'Toby')
('Lisa Rose', 'Gene Seymour')
('Toby', 'Gene Seymour')
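As a minimal sketch of how these pairs could be fed into the distance function from the question: the `euclidean_distance` below is a condensed rewrite of the question's version (same formula, still returning None when two critics share no movies), and `sample` is only a three-critic excerpt of the dataset chosen for illustration.

```python
from itertools import combinations
from math import sqrt

# Excerpt of the question's data, used here only for illustration.
sample = {
    'Lisa Rose': {'Snakes on a Plane': 3.5, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5},
    'Jack Matthews': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
                      'You, Me and Dupree': 1.0},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
             'You, Me and Dupree': 1.0},
}

def euclidean_distance(preferences_dict, person_1, person_2):
    # Movies rated by both people.
    shared = [item for item in preferences_dict[person_1]
              if item in preferences_dict[person_2]]
    if not shared:
        return None
    dist = sqrt(sum((preferences_dict[person_1][item]
                     - preferences_dict[person_2][item]) ** 2
                    for item in shared))
    # Map the distance to a similarity score in (0, 1].
    return 1 / (1 + dist)

# Iterating a dict yields its keys, so each unordered pair appears exactly once.
scores = {
    (a, b): euclidean_distance(sample, a, b)
    for a, b in combinations(sample, 2)
}

for pair, score in scores.items():
    print(pair, score)
```

Because `combinations` never yields both `(a, b)` and `(b, a)`, the duplicate problem from the nested loops does not arise.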
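Update: now that you have updated the question, things look a bit different. Here is a quick snippet put together with pandas and numpy (for simplicity, missing ratings are replaced with zero):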
import numpy as np
import pandas as pd
from itertools import combinations

# Rows are critics, columns are movies; missing ratings become 0.
df = pd.DataFrame(critics).T.fillna(0)

distances = []
for critic1, critic2 in combinations(df.index, 2):
    ratings1 = df.loc[critic1].values
    ratings2 = df.loc[critic2].values
    # Square each difference first, then sum, then take the root.
    dist = np.sqrt(np.sum((ratings1 - ratings2) ** 2))  # Euclidean distance
    distances.append((dist, critic1, critic2))

# Show the five most distant pairs.
pd.DataFrame(distances, columns=['distance', 'critic1', 'critic2']).sort_values('distance', ascending=False).head(5)
There you go. Gene Seymour and Toby are among the pairs that disagree most strongly in their ratings.
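The zero-fill idea does not strictly need pandas. As a rough standard-library sketch of the same ranking: `ratings` is a three-critic, three-movie excerpt of the dataset, and `zero_filled_distance` is a name made up for this example, not something from the question or the snippet above.

```python
from itertools import combinations
from math import sqrt

# Excerpt of the question's data, used here only for illustration.
ratings = {
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Superman Returns': 5.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Superman Returns': 4.0},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0},
}

# Every movie that at least one critic has rated.
movies = sorted({movie for rated in ratings.values() for movie in rated})

def zero_filled_distance(a, b):
    # A missing rating counts as 0, mirroring fillna(0).
    return sqrt(sum((ratings[a].get(movie, 0) - ratings[b].get(movie, 0)) ** 2
                    for movie in movies))

# One entry per unordered pair, largest distance first.
distances = sorted(
    ((zero_filled_distance(a, b), a, b) for a, b in combinations(ratings, 2)),
    reverse=True,
)

for dist, a, b in distances:
    print(f'{dist:.2f}  {a} / {b}')
```

Note that treating a missing rating as 0 inflates the distance for critics who rated few movies, which is a simplification rather than a recommendation.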