Calculate unrated items with cosine similarity
I want to fill in unrated items using cosine similarity, as follows.
import numpy as np; import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
dff = pd.DataFrame(np.random.randint(0, 10, (5, 3)))
temp = dff.copy()
dff
0 1 2
0 8 0 4
1 6 9 4
2 5 0 5
3 5 9 4
4 9 4 8
cossim = cosine_similarity(dff) # calculate scores.
cossim
array([[1. , 0.62, 0.95, 0.57, 0.92],
[0.62, 1. , 0.61, 1. , 0.83],
[0.95, 0.61, 1. , 0.58, 0.95],
[0.57, 1. , 0.58, 1. , 0.81],
[0.92, 0.83, 0.95, 0.81, 1. ]])
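For reference, `cosine_similarity` on a single 2-D input compares every pair of rows: each row is scaled to unit length, and entry (i, j) of the result is the dot product of rows i and j. A minimal NumPy sketch of that computation, reusing the `dff` values printed above as a literal array and assuming no row is all zeros (this sketch would divide by zero on such a row):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same values as the dff printed above, written out so the example is reproducible
R = np.array([[8, 0, 4],
              [6, 9, 4],
              [5, 0, 5],
              [5, 9, 4],
              [9, 4, 8]], dtype=float)

# Scale each row to unit L2 norm (assumes no row is all zeros)
unit = R / np.linalg.norm(R, axis=1, keepdims=True)

# Entry (i, j) is the cosine of the angle between rows i and j
manual = unit @ unit.T

print(np.round(manual, 2))
print(np.allclose(manual, cosine_similarity(R)))  # True
```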
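I want to fill in the 0 values using the cosine-similarity scores:
for x in range(dff.shape[1]):                          # each column (item)
    indexes = dff.index[dff.loc[:, x] == 0].tolist()   # rows with no rating in column x
    for y in indexes:
        # similarity-weighted sum of the original column x (temp keeps the unfilled values)
        dff.loc[y, x] = (cossim[y] * temp.loc[:, x].to_numpy()).sum()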
dff
0 1 2
0 8 21.307945 4
1 6 9.000000 4
2 5 34.528532 5
3 5 9.000000 4
4 9 4.000000 8
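Written out, for each cell with a zero rating the loop assigns the similarity-weighted sum of column x over all rows. Here R stands for the original ratings (`temp`) and S for `cossim`; these symbols are introduced only for illustration:

```latex
\hat{r}_{yx} \;=\; \sum_{j} S_{yj}\, R_{jx} \;=\; (S R)_{yx},
\qquad \text{for every } (y, x) \text{ with } R_{yx} = 0 .
```

The sum runs over every row j, including row y itself, but that term adds nothing because R_{yx} = 0. Each filled value is therefore a single entry of the matrix product SR.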
I computed it with two nested for loops.
Is there a more Pythonic way to compute it?
Here is test data (real values):
import numpy as np; import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
data = [['aa',1, 10], ['aa',2,6], ['aa',3, 5],['aa',4],
['bb',1], ['bb',2,5], ['bb',3,8],['bb',4,6],
['cc',7], ['cc',2], ['cc',3,4],['cc',4,9]]
df = pd.DataFrame(data, columns = ['full_name','user_id', 'rating'])
repo_matrix = df.pivot_table(index='full_name', columns='user_id', values='rating')
repo_matrix.replace(np.nan, 0, inplace=True)
repo_matrix
cossim = cosine_similarity(repo_matrix)
display(cossim)
temp = repo_matrix.copy()
repo_matrix = temp.mask(temp==0, cossim@temp)
repo_matrix
All the zeros are converted to NaN. Why?
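The NaN values appear to come from label alignment rather than from the arithmetic. When the right-hand operand of `@` is a DataFrame, pandas computes the product itself (through `DataFrame.__rmatmul__`) and returns a DataFrame whose row labels are `0, 1, 2, ...` instead of `repo_matrix`'s `'aa', 'bb', 'cc'`. `mask` then aligns that DataFrame on labels, finds no matching rows, and fills every replaced cell with NaN. The first example does not show this because `dff` already has a 0-based integer index. A small sketch, using a hypothetical hand-written matrix with string row labels in place of the pivot table:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings with string row labels, standing in for repo_matrix (0 = unrated)
ratings = pd.DataFrame(
    [[10, 6, 5, 0],
     [0, 5, 8, 6],
     [0, 0, 4, 9]],
    index=["aa", "bb", "cc"],
    columns=[1, 2, 3, 4],
    dtype=float,
)
sim = cosine_similarity(ratings)

# ndarray @ DataFrame is computed by pandas and the string row labels are lost
product_df = sim @ ratings
print(product_df.index.tolist())  # row labels no longer match ratings.index

# mask() aligns a DataFrame `other` on labels, so every replaced cell becomes NaN
bad = ratings.mask(ratings == 0, product_df)

# A plain ndarray is used positionally, so no alignment happens
good = ratings.mask(ratings == 0, sim @ ratings.to_numpy())

print(bad)
print(good)
```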
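Your operation is just a matrix multiplication, so you can do this:
# pass the NumPy array (dff.values) rather than the DataFrame,
# so mask() uses it positionally instead of aligning on labels;
# also, you don't need to copy to temp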
dff = dff.mask(dff==0, cossim @ dff.values)
Output:
0 1 2
0 8 14.35119 4
1 6 9.00000 4
2 5 14.49324 5
3 5 9.00000 4
4 9 4.00000 8
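To check that this one-liner matches the question's double loop, both can be run on the same matrix and compared. A sketch reusing the `dff` values printed in the question as a literal array, so the run is reproducible; the names `original`, `looped` and `vectorized` are only illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# The dff values shown in the question, as a fixed int64 DataFrame
original = pd.DataFrame([[8, 0, 4],
                         [6, 9, 4],
                         [5, 0, 5],
                         [5, 9, 4],
                         [9, 4, 8]])
cossim = cosine_similarity(original)

# Double loop from the question; cast to float first so .loc can store non-integers
looped = original.astype(float)
for x in range(original.shape[1]):
    for y in original.index[(original.loc[:, x] == 0).to_numpy()]:
        looped.loc[y, x] = (cossim[y] * original.loc[:, x].to_numpy()).sum()

# Vectorized version from the answer (ndarray operand, so no label alignment)
vectorized = original.mask(original == 0, cossim @ original.to_numpy())

print(np.allclose(looped.to_numpy(dtype=float), vectorized.to_numpy(dtype=float)))
print(vectorized)
```

The loop version casts to float up front because, in pandas 2.1 and later, assigning a non-integer into an int64 column with `.loc` emits a FutureWarning about an incompatible dtype; `mask` upcasts the result on its own.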