scikit-learn TypeError: only integer arrays with one element can be converted to an index

Question

I get the following error when calling cosine_similarity:

numerator = sum(a*b for a,b in zip(x,y))
TypeError: only integer arrays with one element can be converted to an index

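I am trying to derive a keyword-by-keyword co-occurrence matrix from the document-keyword matrix returned by CountVectorizer.

I suspect cosine_similarity doesn't like the data type I'm passing, but I'm not sure what exactly the problem is. Here, n is of type scipy.sparse.csc.csc_matrix and y is of type scipy.sparse.csr.csr_matrix.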

documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun"
)

countvectorizer = CountVectorizer()
y = countvectorizer.fit_transform(documents)
n = y.T.dot(y)
x = n.tocsr()
x = x.toarray()
numpy.fill_diagonal(x, 0)

# This call raises the TypeError shown above.
result = cosine_similarity(x, "None")

Answer 1

This snippet uses sklearn's cosine_similarity; it runs and returns a reasonable answer.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun"
)

# Document-term count matrix (sparse), one row per document.
countvectorizer = CountVectorizer()
y = countvectorizer.fit_transform(documents)

# Term-by-term co-occurrence counts, converted to a dense array.
n = y.T.dot(y)
x = n.toarray()
np.fill_diagonal(x, 0)  # ignore each term's co-occurrence with itself

# Pairwise cosine similarity between the rows of x.
result = cosine_similarity(x, x)
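
The traceback line in the question (numerator = sum(a*b for a,b in zip(x,y))) looks like it comes from a hand-written cosine function for two 1-D vectors, although the question does not show that function. As a rough illustration only, here is a minimal pure-Python sketch of what a row-by-row pairwise computation does. The names cosine_similarity_1d and pairwise_cosine and the small matrix are made up for this example and are not part of sklearn.

```python
import math


def cosine_similarity_1d(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    numerator = sum(a * b for a, b in zip(u, v))
    denominator = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumed convention: an all-zero vector yields 0.0 instead of an error.
    return numerator / denominator if denominator else 0.0


def pairwise_cosine(rows):
    """Similarity of every row against every row (including itself)."""
    return [[cosine_similarity_1d(r, s) for s in rows] for r in rows]


# Hypothetical 3-term co-occurrence matrix with a zeroed diagonal.
cooccurrence = [
    [0, 2, 1],
    [2, 0, 1],
    [1, 1, 0],
]
similarity = pairwise_cosine(cooccurrence)
```

With a function of this shape, both arguments have to be numeric sequences of the same length. Passing a string such as "None" as the second argument would make zip pair each numeric row with a single character, which is a plausible source of the TypeError in the question.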

python

cosine-similarity

scikit-learn