KeyError(f"None of [{key}] are in the [{axis_name}]") when calculating average metrics after cross-validation
I am trying to calculate some average metrics after running cross-validation.
The function that does this is:
from sklearn.model_selection import KFold
from numpy import mean
from numpy import std
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression

# Returns average confusion matrix, average accuracy
# and standard deviation of the accuracy after all the cross-validation runs
def get_average_metrics(model, cv, X_fss, y):
    conf_matrix_list_of_arrays = []
    scores = []
    for train_index, test_index in cv.split(X_fss):
        X_train, X_test = X_fss[train_index], X_fss[test_index]
        y_train, y_test = y[train_index], y[test_index]
        score = model.fit(X_train, y_train).score(X_test, y_test)
        conf_matrix = confusion_matrix(y_test, model.predict(X_test))
        scores.append(score)
        conf_matrix_list_of_arrays.append(conf_matrix)
    # Average confusion matrix
    mean_of_conf_matrix_arrays = mean(conf_matrix_list_of_arrays, axis=0)
    # Average accuracy
    avg_score = mean(scores)
    # Standard deviation of the accuracy
    std_score = std(scores)
    return avg_score, std_score, mean_of_conf_matrix_arrays
However, I get this error on the line X_train, X_test = X_fss[train_index], X_fss[test_index]:
KeyError: "None of [Int64Index([ 1, 2, 4, 5, 6, 7,
9, 10, 11, 12,\n ...\n 1620, 1621, 1622,
1623, 1624, 1625, 1626, 1627, 1629, 1630],\n dtype='int64',
length=1467)] are in the [columns]"
The function receives these arguments:
- model ->
logistic = LogisticRegression()
- cv ->
cv = KFold(n_splits=10,shuffle=True, random_state=1)
- X_fss -> a DataFrame of shape (1631, 4)
- y -> a Series of shape (1631,)
Sample of X_fss:
Sample of y:
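For context, here is a minimal sketch of what seems to trigger the error, assuming X_fss is a pandas DataFrame with named columns. The frame X_demo, its column names, and the rows array are hypothetical stand-ins, not the real data. Indexing a DataFrame with [] and a list-like of integers looks the values up as column labels, so positional row indices like those from cv.split are not found:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_fss: 6 rows, 4 named columns
X_demo = pd.DataFrame(np.arange(24).reshape(6, 4), columns=["a", "b", "c", "d"])
rows = np.array([0, 2, 5])  # positional row indices, like those yielded by cv.split

try:
    X_demo[rows]  # [] with a list-like is treated as a column-label lookup
    raised = False
except KeyError:
    raised = True  # none of 0, 2, 5 is a column label

by_position = X_demo.iloc[rows]      # rows 0, 2, 5 selected by position
as_array = X_demo.to_numpy()[rows]   # the same rows taken from the ndarray
```

Both .iloc and a plain numpy array index by position, which is what the indices from KFold.split are meant for.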
I solved it by converting the X_fss DataFrame to a numpy array:
X_fss = X_fss.to_numpy()
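An alternative that keeps the pandas objects would be positional indexing with .iloc. The following is only a sketch under assumptions: the function name get_average_metrics_iloc, the synthetic X_demo and y_demo data, and the explicit labels argument to confusion_matrix are not from the original code. The labels argument is added so that every fold yields a matrix of the same shape even if a class is missing from a test fold.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold


def get_average_metrics_iloc(model, cv, X, y):
    """Average accuracy, its standard deviation, and the mean confusion matrix."""
    scores, matrices = [], []
    labels = np.unique(y)  # assumption: fix the label order across folds
    for train_index, test_index in cv.split(X):
        # .iloc selects rows by position, matching what cv.split returns
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
        matrices.append(
            confusion_matrix(y_test, model.predict(X_test), labels=labels)
        )
    return np.mean(scores), np.std(scores), np.mean(matrices, axis=0)


# Synthetic data for illustration only (not the real X_fss / y)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
y_demo = pd.Series((X_demo["a"] + X_demo["b"] > 0).astype(int))

cv = KFold(n_splits=10, shuffle=True, random_state=1)
avg_score, std_score, avg_matrix = get_average_metrics_iloc(
    LogisticRegression(), cv, X_demo, y_demo
)
```

Using y.iloc as well should avoid relying on the Series having a default 0..n-1 index, which y[train_index] implicitly depends on.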