How to interpret cosine similarity output in Python

Question

Python beginner here. I have a pandas DataFrame df with the columns userID, weight, SEI, and name.

#libraries
   import numpy as np
   import pandas as pd
   from sklearn.metrics.pairwise import cosine_similarity

#dataframe
   userID    weight     SEI        name
   3         125.0      0.562140   263
   4         254.0      0.377294   869
   5         451.0      0.872896   196
   1429      451.0      0.872896   196
   5         129.0      0.569432   582
   ...       ...        ...        ...

#output
   cosine_similarity(df)

   array([[1.        , 0.98731894, 0.75370844, ..., 0.33814175, 0.33700687, 0.24443919],
   [0.98731894, 1.        , 0.63987877, ..., 0.35037059, 0.34963404, 0.23870279],
   [0.75370844, 0.63987877, 1.        , ..., 0.16648431, 0.16403693, 0.17438159], 
   ...,

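The person with userID 3 has a weight of 125.0 and an SEI of 0.562140. The person with name 263 also has a weight of 125.0 and an SEI of 0.562140. (I had to run a label encoder on the name column, because I could not call the cosine similarity function without changing that column's data type. Hopefully this doesn't affect the end goal?)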

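The goal is to use cosine similarity across all rows to match the values in the userID column with the values in the name column. To do that, I just need some guidance on interpreting the output. All I know is that the larger the cosine value, the greater the similarity.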

Any help is appreciated!
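For reference, here is a minimal sketch of how the matrix above can be read. It assumes a four-row stand-in for df built from the first rows shown in the question; the names `sim` and `sim_features` are illustrative only. Entry `[i, j]` is the cosine similarity between row i and row j, with every column (including userID and the encoded name) treated as a vector component.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the question's df: the first four rows shown above.
df = pd.DataFrame({
    "userID": [3, 4, 5, 1429],
    "weight": [125.0, 254.0, 451.0, 451.0],
    "SEI": [0.562140, 0.377294, 0.872896, 0.872896],
    "name": [263, 869, 196, 196],
})

# One row and one column per DataFrame row: sim[i, j] compares row i with row j.
sim = cosine_similarity(df)
print(sim.shape)                        # (4, 4)
print(np.allclose(sim, sim.T))          # True: sim[i, j] equals sim[j, i]
print(np.allclose(np.diag(sim), 1.0))   # True: every row is identical to itself
print(round(sim[0, 1], 4))              # 0.9873, row 0 (userID 3) vs. row 1 (userID 4)

# Rows 2 and 3 share weight, SEI and name, but userID (5 vs. 1429) is also
# part of the vector, so their similarity comes out far below 1.
print(sim[2, 3])

# Restricting the input to the feature columns makes those two rows identical.
sim_features = cosine_similarity(df[["weight", "SEI"]])
print(sim_features[2, 3])               # 1.0 up to floating-point rounding
```

Because identifier-like columns such as userID and the label-encoded name enter the calculation as if they were measurements, it is probably worth leaving them out when the intent is to compare rows by weight and SEI alone.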

Answer 1

Make it easier on yourself and group by two columns:

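# 'userID_x' and 'artist' are this answer's column names (the question's df has 'userID' and 'name').
result1 = df.sort_values('weight')
# One cosine similarity per (userID_x, SEI) group: the group's weights vs. its artist codes.
result2 = (result1.groupby(['userID_x', 'SEI']).apply(lambda g:
           cosine_similarity(g['weight'].values.reshape(1, -1),
           g['artist'].values.reshape(1, -1))[0][0])).rename('CosSim').reset_index()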
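As an alternative sketch (not the approach in the snippet above), one reading of the question's goal is: for each row, find the most similar other row and report that row's name next to the userID. The code below assumes the question's column names and a four-row stand-in for df; `features`, `best`, and `matches` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the question's df: the first four rows shown in the question.
df = pd.DataFrame({
    "userID": [3, 4, 5, 1429],
    "weight": [125.0, 254.0, 451.0, 451.0],
    "SEI": [0.562140, 0.377294, 0.872896, 0.872896],
    "name": [263, 869, 196, 196],
})

# Compare rows on the measurement columns only, not on the identifiers.
features = df[["weight", "SEI"]].to_numpy()
sim = cosine_similarity(features)

# A row always matches itself perfectly, so mask the diagonal before searching.
np.fill_diagonal(sim, -np.inf)

# Position of the most similar other row, for every row.
best = sim.argmax(axis=1)

matches = pd.DataFrame({
    "userID": df["userID"].to_numpy(),
    "matched_name": df["name"].to_numpy()[best],
    "CosSim": sim[np.arange(len(df)), best],
})
print(matches)
```

Cosine similarity compares direction rather than magnitude, and here weight is several hundred times larger than SEI, so every pair of rows scores very close to 1. If the scores need to discriminate better, standardizing the columns first may be worth trying.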

python

dataframe

pandas

cosine-similarity