KeyError(f"None of [{key}] are in the [{axis_name}]") when calculating average metrics after cross-validation

Question

I am trying to compute some average metrics after running cross-validation. The function that does this is:

from sklearn.model_selection import KFold
from numpy import mean
from numpy import std
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression

# Returns the average confusion matrix, the average accuracy
# and the standard deviation of the accuracy over all cross-validation runs
def get_average_metrics(model, cv, X_fss, y):
    conf_matrix_list_of_arrays = []
    scores = []
    for train_index, test_index in cv.split(X_fss):
        X_train, X_test = X_fss[train_index], X_fss[test_index]
        y_train, y_test = y[train_index], y[test_index]
        score = model.fit(X_train, y_train).score(X_test, y_test)
        conf_matrix = confusion_matrix(y_test, model.predict(X_test))
        scores.append(score)
        conf_matrix_list_of_arrays.append(conf_matrix)
    # Average confusion matrix
    mean_of_conf_matrix_arrays = mean(conf_matrix_list_of_arrays, axis=0)
    # Average accuracy
    avg_score = mean(scores)
    # Standard deviation of the accuracy across folds
    std_score = std(scores)
    return avg_score, std_score, mean_of_conf_matrix_arrays

However, I get this error on the line X_train, X_test = X_fss[train_index], X_fss[test_index]:

KeyError: "None of [Int64Index([ 1, 2, 4, 5, 6, 7, 9, 10, 11, 12,\n ...\n 1620, 1621, 1622, 1623, 1624, 1625, 1626, 1627, 1629, 1630],\n dtype='int64', length=1467)] are in the [columns]"

The function receives these arguments:

model -> logistic = LogisticRegression()
cv -> cv = KFold(n_splits=10, shuffle=True, random_state=1)
X_fss -> a DataFrame of shape (1631, 4)
y -> a Series of shape (1631,)

[Figure: sample of X_fss]

[Figure: sample of y]

Answer 1

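I solved this by converting the X_fss DataFrame to a NumPy array. On a DataFrame, X_fss[train_index] looks the integers up as column labels, which is why the error mentions [columns]; on a NumPy array the same expression selects rows by position, so the original indexing works after this conversion: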

X_fss = X_fss.to_numpy()
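
If you would rather keep X_fss as a DataFrame (for example, to preserve column names), another option is to select rows by position with .iloc, since KFold.split yields positional indices. The following is only a minimal sketch of that idea, not the original poster's code: the demo data (X_demo, y_demo) is synthetic and made up for illustration, and passing labels= to confusion_matrix is an added assumption so that every fold produces a matrix of the same shape.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold


def get_average_metrics(model, cv, X_fss, y):
    """Return mean accuracy, std of accuracy, and the mean confusion matrix over all folds."""
    scores = []
    conf_matrices = []
    # Fix the label order up front so all per-fold matrices share one shape
    labels = np.unique(y)
    for train_index, test_index in cv.split(X_fss):
        # KFold yields positional indices, so select rows with .iloc
        X_train, X_test = X_fss.iloc[train_index], X_fss.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
        conf_matrices.append(
            confusion_matrix(y_test, model.predict(X_test), labels=labels)
        )
    return np.mean(scores), np.std(scores), np.mean(conf_matrices, axis=0)


# Synthetic stand-in data (hypothetical; the real X_fss and y are not shown in the post)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
y_demo = pd.Series((X_demo["a"] + X_demo["b"] > 0).astype(int))

cv = KFold(n_splits=10, shuffle=True, random_state=1)
avg_score, std_score, mean_conf_matrix = get_average_metrics(
    LogisticRegression(), cv, X_demo, y_demo
)
print(avg_score, std_score)
print(mean_conf_matrix)
```

Using .iloc for y as well avoids relying on the Series having a default 0..n-1 index; y[train_index] only happens to work when the index labels coincide with the positions.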

python

pandas

scikit-learn

sklearn-pandas