scikit-learn TypeError: only integer arrays with one element can be converted to an index
I get the following error when calling cosine_similarity:
numerator = sum(a*b for a,b in zip(x,y))
TypeError: only integer arrays with one element can be converted to an index
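I am trying to build a keyword-keyword co-occurrence matrix from the document-keyword matrix returned by CountVectorizer.
I suspect cosine_similarity does not like the data type I am passing, but I am not sure what exactly the problem is. Here, n is of type scipy.sparse.csc.csc_matrix and y is of type scipy.sparse.csr.csr_matrix.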
documents = (
"The sky is blue",
"The sun is bright",
"The sun in the sky is bright",
"We can see the shining sun, the bright sun"
)
countvectorizer = CountVectorizer()
y = countvectorizer.fit_transform(documents)
n = y.T.dot(y)
x = n.tocsr()
x = x.toarray()
numpy.fill_diagonal(x, 0)
result = cosine_similarity(x, "None")
Use sklearn's cosine_similarity. This snippet runs and returns a reasonable answer.
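import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
# distance_metrics()['cosine'] maps to cosine_distances (1 - similarity),
# so import cosine_similarity directly instead.
from sklearn.metrics.pairwise import cosine_similarity
documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun"
)
countvectorizer = CountVectorizer()
# Sparse document-term count matrix, shape (n_documents, n_terms)
y = countvectorizer.fit_transform(documents)
# Term-term co-occurrence counts
n = y.T.dot(y)
x = n.tocsr()
x = x.toarray()
# Drop each term's co-occurrence with itself
np.fill_diagonal(x, 0)
# Pairwise cosine similarity between the rows of x
result = cosine_similarity(x, x)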
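As a rough cross-check of the approach above, the sketch below calls sklearn's cosine_similarity with the second argument left at its default (None, not the string "None" used in the question) and compares the result against the textbook definition computed with NumPy. The names doc_term, cooc, and manual are illustrative and do not come from the original post.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun",
)

vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(documents)  # sparse, (n_docs, n_terms)

# Term-term co-occurrence counts as a dense array, diagonal zeroed.
cooc = doc_term.T.dot(doc_term).toarray()
np.fill_diagonal(cooc, 0)

# Y is omitted, so similarities are computed between the rows of cooc.
similarity = cosine_similarity(cooc)

# Definition: dot product divided by the product of the row norms.
# Assumes no all-zero rows, which holds for this small corpus.
norms = np.linalg.norm(cooc, axis=1, keepdims=True)
manual = cooc.dot(cooc.T) / (norms * norms.T)

print(np.allclose(similarity, manual))
```

If the two matrices agree, the sklearn call is computing what the hand-written numerator in the traceback was presumably aiming for, without iterating over array rows in Python.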