ValueError: need at least one array to concatenate with sklearn cross_val_predict method

Question

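I am trying to model a binary classification problem with an SVM classifier using custom cross-validation folds, but cross_val_predict raises the error **need at least one array to concatenate**. The code works fine with cv=3 in cross_val_predict, but when I pass custom_cv it fails with this error.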

The code is below:


from sklearn.model_selection import LeavePOut
import numpy as np
from sklearn.svm import SVC
from time import *
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict,cross_val_score
clf = SVC(kernel='linear',C=25)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8],[9,10]])
y = np.array([0,1,1,0,0])
lpo = LeavePOut(2)
print(lpo.get_n_splits(X))
LeavePOut(p=2)
test_index_list=[]
train_index_list=[]
for train_index, test_index in lpo.split(X,y):
  
  if(y[test_index[0]]==y[test_index[1]]):
    pass
  else:
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    train_index_list.append(train_index)
    test_index_list.append(test_index)
custom_cv = zip(train_index_list, test_index_list)
scores = cross_val_score(clf, X, y, cv=custom_cv)

print(scores)
print('accuracy:',scores.mean())
predicted=cross_val_predict(clf,X,y,cv=custom_cv) # error with this line
print('Confusion matrix:',confusion_matrix(labels, predicted))

Here is the full error traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-11-d78feac932b2> in <module>()
     31 print(scores)
     32 print('accuracy:',scores.mean())
---> 33 predicted=cross_val_predict(clf,X,y,cv=custom_cv)
     34 
     35 print('Confusion matrix:',confusion_matrix(labels, predicted))

/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
    758     predictions = [pred_block_i for pred_block_i, _ in prediction_blocks]
    759     test_indices = np.concatenate([indices_i
--> 760                                    for _, indices_i in prediction_blocks])
    761 
    762     if not _check_is_permutation(test_indices, _num_samples(X)):

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate

Any suggestions on how to resolve this error?

Answer 1

There are two errors here:

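If you want to reuse a zip object, create a list from it. A zip object is an iterator, so it is exhausted after one use. Here, cross_val_score consumes it, and cross_val_predict then receives an empty iterator, which is why there is nothing to concatenate. You can fix it like this: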

custom_cv = [*zip(train_index_list, test_index_list)]
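
To see the exhaustion in isolation, here is a minimal sketch with no scikit-learn involved. The two short index lists are made-up stand-ins for train_index_list and test_index_list, and the names first_pass, second_pass, and reusable_cv are only illustrative:

```python
# Toy stand-ins for train_index_list / test_index_list (illustrative values).
train_index_list = [[2, 3, 4], [1, 3, 4]]
test_index_list = [[0, 1], [0, 2]]

# A zip object is a one-shot iterator.
custom_cv = zip(train_index_list, test_index_list)
first_pass = list(custom_cv)   # consumes every pair, as the first cross-validation call would
second_pass = list(custom_cv)  # the iterator is now empty

print(len(first_pass), len(second_pass))

# Materializing the pairs into a list makes them reusable.
reusable_cv = [*zip(train_index_list, test_index_list)]
print(len(list(reusable_cv)), len(list(reusable_cv)))
```

The second pass over the zip object yields no splits at all, while the list can be iterated as many times as needed.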

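The cross-validation splits passed to cross_val_predict must form a partition of the actual array (each sample should belong to exactly one test set). In your case they do not. If you think about it, stacking the test sets of your cross-validation list gives 12 test indices (6 splits with 2 test samples each), whereas the original y has length 5. You can implement a custom cross-val predict as follows: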

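from sklearn.metrics import confusion_matrix

def custom_cross_val_predict(clf, X, y, cv):
    # Pool true labels and predictions over all splits.
    # A sample may appear in several test sets, so the output can be longer than y.
    y_pred, y_true = [], []
    for tr_idx, vl_idx in cv:
        X_tr, y_tr = X[tr_idx], y[tr_idx]
        X_vl, y_vl = X[vl_idx], y[vl_idx]
        clf.fit(X_tr, y_tr)  # refit on this split's training samples
        y_true.extend(y_vl)
        y_pred.extend(clf.predict(X_vl))

    return y_true, y_pred

# custom_cv must be the list built above, not the exhausted zip object
labels, predicted = custom_cross_val_predict(clf, X, y, cv=custom_cv)
print('Confusion matrix:', confusion_matrix(labels, predicted))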
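
To check the partition requirement for yourself, here is a rough sketch in plain Python. It assumes that enumerating all index pairs with itertools.combinations is a fair stand-in for LeavePOut(2), and is_partition is a hypothetical helper written for this illustration, not a scikit-learn function:

```python
from itertools import combinations

y = [0, 1, 1, 0, 0]

# Assumption: every pair of indices is a candidate test set (stand-in for LeavePOut(2)).
# As in the question, keep only the pairs whose two labels differ.
test_sets = [pair for pair in combinations(range(len(y)), 2)
             if y[pair[0]] != y[pair[1]]]

# Stack all test indices, as cross_val_predict would before validating them.
stacked = [i for pair in test_sets for i in pair]

def is_partition(indices, n_samples):
    """Hypothetical helper: True if indices cover 0..n_samples-1 exactly once each."""
    return sorted(indices) == list(range(n_samples))

print("splits:", len(test_sets))
print("stacked test indices:", len(stacked), "samples:", len(y))
print("partition:", is_partition(stacked, len(y)))
```

Because samples recur across test sets, the stacked indices cannot be a permutation of the sample indices, which is why the predictions are pooled manually in the function above.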

python

numpy

scikit-learn

cross-validation