ValueError: setting an array element with a sequence while training KD Tree on TF-IDF
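I am trying to train a KD tree on the TF-IDF matrix of a document corpus, but it raises
ValueError: setting an array element with a sequence.
The code and the error are shown below. Can someone help me solve this?
Code: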
t0 = time.time()
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
t1 = time.time()
total = t1-t0
print "TF-IDF built:", total
#######################------------------------############################
t0 = time.time()
#nbrs = NearestNeighbors(n_neighbors=20, algorithm='kd_tree', metric='euclidean')
#nbrs.fit(X_train_tfidf)#,Y)
nbrs = KDTree(np.array(X_train_tfidf), leaf_size=100)
t1 = time.time()
total = t1-t0
print "KNN Trained:", total
#######################------------------------############################
Here is the error:
TF-IDF built: 0.108999967575
Traceback (most recent call last):
File ".\tfidf_knn.py", line 48, in <module>
nbrs = KDTree(np.array(X_train_tfidf), leaf_size=100)
File "sklearn/neighbors/binary_tree.pxi", line 1055, in sklearn.neighbors.kd_tree.BinaryTree.__init__ (sklearn\neighbors\kd_tree.c:8298)
File "C:\Anaconda2\lib\site-packages\numpy\core\numeric.py", line 474, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
X_train_tfidf is a sparse matrix (scipy.sparse), and np.array() does not convert it to a dense NumPy array; you need to call .toarray() on it instead. This example runs for me:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import time
from sklearn.neighbors import KDTree
from scipy.sparse import csr_matrix # sparse format compatible with sklearn models
from sklearn.neighbors import NearestNeighbors
import numpy as np
X=[ 'I Love dogs' ,
'you love cats',
' He loves Birds',
' she loves lizards',
' None loves me'
]
t0 = time.time()
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
t1 = time.time()
total = t1-t0
print "TF-IDF built:", total
#######################------------------------############################
t0 = time.time()
nbrs = KDTree(X_train_tfidf.toarray(), leaf_size=100)
################## KDTree needs dense input; for sparse input use algorithm='brute' instead #################
#nbrs = NearestNeighbors(n_neighbors=20, algorithm='brute')
#nbrs.fit(csr_matrix(X_train_tfidf))#,Y)
t1 = time.time()
total = t1-t0
print "KNN Trained:", total
Output:
TF-IDF built: 0.00499987602234
KNN Trained: 0.029000043869
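Note that .toarray() builds a dense matrix, which can use a lot of memory on a large vocabulary. If that becomes a problem, the brute-force option mentioned in the comment above can be sketched as follows. This is only a minimal illustration, not a tuned setup: TfidfVectorizer is swapped in for the CountVectorizer + TfidfTransformer pair, and the names docs, wrapped and n_neighbors=2 are chosen just for this toy corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy corpus (illustrative only).
docs = ['I Love dogs', 'you love cats', 'He loves Birds',
        'she loves lizards', 'None loves me']

# TfidfVectorizer does the counting and the TF-IDF weighting in one step
# and returns a scipy.sparse matrix.
X_tfidf = TfidfVectorizer().fit_transform(docs)

# np.array() does not densify a sparse matrix; it typically wraps the whole
# matrix in a 0-d object array, which is why KDTree cannot convert it to floats.
wrapped = np.array(X_tfidf)
print(wrapped.dtype)
print(wrapped.shape)

# Brute-force search accepts the sparse matrix directly, so no .toarray() call
# is needed.
nbrs = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='euclidean')
nbrs.fit(X_tfidf)
distances, indices = nbrs.kneighbors(X_tfidf)
print(indices)
```

Each row of indices lists the nearest documents to the corresponding query row; since the training documents themselves are queried here, the first column is each document's own index.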