ValueError: need at least one array to concatenate with sklearn cross_val_predict method
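I am trying to model a binary classification problem with an SVM classifier using custom cross-validation folds, but cross_val_predict gives me the error **need at least one array to concatenate**. The code works fine with cv=3 in cross_val_predict, but when I use custom_cv it raises this error.
Here is the code: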
from sklearn.model_selection import LeavePOut
import numpy as np
from sklearn.svm import SVC
from time import *
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict,cross_val_score
clf = SVC(kernel='linear',C=25)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8],[9,10]])
y = np.array([0,1,1,0,0])
lpo = LeavePOut(2)
print(lpo.get_n_splits(X))
LeavePOut(p=2)
test_index_list=[]
train_index_list=[]
for train_index, test_index in lpo.split(X,y):
    if(y[test_index[0]]==y[test_index[1]]):
        pass
    else:
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        train_index_list.append(train_index)
        test_index_list.append(test_index)
custom_cv = zip(train_index_list, test_index_list)
scores = cross_val_score(clf, X, y, cv=custom_cv)
print(scores)
print('accuracy:',scores.mean())
predicted=cross_val_predict(clf,X,y,cv=custom_cv) # error with this line
print('Confusion matrix:',confusion_matrix(labels, predicted))
Here is the full error traceback:
ValueError Traceback (most recent call last)
<ipython-input-11-d78feac932b2> in <module>()
31 print(scores)
32 print('accuracy:',scores.mean())
---> 33 predicted=cross_val_predict(clf,X,y,cv=custom_cv)
34
35 print('Confusion matrix:',confusion_matrix(labels, predicted))
/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
758 predictions = [pred_block_i for pred_block_i, _ in prediction_blocks]
759 test_indices = np.concatenate([indices_i
--> 760 for _, indices_i in prediction_blocks])
761
762 if not _check_is_permutation(test_indices, _num_samples(X)):
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate
Any suggestions on how to resolve this error?
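There are 2 errors here:
- If you want to reuse the zip object, create a list from it. A zip object is an iterator, so it is exhausted after one pass: cross_val_score consumes it, and cross_val_predict then receives no folds at all. You can fix it like this:
custom_cv = [*zip(train_index_list, test_index_list)]
- The cross-validation list passed to cross_val_predict should be a partition of the samples (each sample should belong to exactly one test set). In your case it is not. If you think about it, your 6 folds of 2 test samples each would stack into 12 predictions, whereas the original y has length 5. You can implement a custom cross-val predict as follows: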
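from sklearn.metrics import confusion_matrix

def custom_cross_val_predict(clf, X, y, cv):
    # Collect true labels and predictions fold by fold; samples may repeat across folds.
    y_pred, y_true = [], []
    for tr_idx, vl_idx in cv:
        X_tr, y_tr = X[tr_idx], y[tr_idx]
        X_vl, y_vl = X[vl_idx], y[vl_idx]
        clf.fit(X_tr, y_tr)
        y_true.extend(y_vl)
        y_pred.extend(clf.predict(X_vl))
    return y_true, y_pred

# custom_cv must be the list built above, not the bare zip object
labels, predicted = custom_cross_val_predict(clf,X,y,cv=custom_cv)
print('Confusion matrix:',confusion_matrix(labels, predicted))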
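Both points can be checked without scikit-learn. The following is only a minimal sketch in plain Python: `leave_two_out_mixed_folds` and `is_partition` are hypothetical helper names introduced here for illustration, and the first one merely imitates the question's LeavePOut(2) loop with `itertools.combinations` rather than calling the library.

```python
from itertools import combinations


def leave_two_out_mixed_folds(y):
    """Hypothetical stand-in for the question's loop: keep every
    leave-2-out fold whose two test samples have different labels."""
    n = len(y)
    folds = []
    for test in combinations(range(n), 2):
        if y[test[0]] != y[test[1]]:
            train = [i for i in range(n) if i not in test]
            folds.append((train, list(test)))
    return folds


def is_partition(folds, n_samples):
    """Return True when every sample index appears in exactly one test set."""
    test_indices = sorted(i for _, test in folds for i in test)
    return test_indices == list(range(n_samples))


y = [0, 1, 1, 0, 0]
folds = leave_two_out_mixed_folds(y)
train_lists = [train for train, _ in folds]
test_lists = [test for _, test in folds]

# Problem 1: a bare zip object can only be iterated once.
lazy_cv = zip(train_lists, test_lists)
first_pass = len(list(lazy_cv))   # consumes the iterator
second_pass = len(list(lazy_cv))  # nothing is left for a second consumer

# Problem 2: even as a list, the folds do not partition the samples.
list_cv = list(zip(train_lists, test_lists))
n_test_predictions = sum(len(test) for _, test in list_cv)

print("folds seen on first pass:", first_pass)
print("folds seen on second pass:", second_pass)
print("stacked test predictions:", n_test_predictions, "for", len(y), "samples")
print("folds form a partition:", is_partition(list_cv, len(y)))
```

The second pass over the zip object yields no folds, which matches the empty concatenation in the traceback, and the stacked test sets are longer than y, which is why a hand-written loop is needed instead of cross_val_predict.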