ValueError in kNN metrics

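I have a project that involves running the kNN algorithm on a CSV file and displaying selected metrics. But when I try to display some of the metrics, it throws errors.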

When I try to use sensitivity, F1 score, and precision:

  1. Sensitivity - print(metrics.recall_score(y_test, y_pred_class))
  2. F1 score - print(metrics.f1_score(y_test, y_pred_class))
  3. Precision - print(metrics.precision_score(y_test, y_pred_class))

PyCharm throws the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting

The error when trying to print the ROC curve is slightly different:

ValueError: multiclass format is not supported


Dataset

Link to the dataset: https://www.dropbox.com/s/yt3n1eqxlsb816n/Testfile%20-%20kNN.csv?dl=0

Program

import matplotlib
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
from matplotlib.dviread import Text

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Testing tools
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

def main():
    dataset = pd.read_csv('filetestKNN.csv')

    X = dataset.drop(columns=['Label'])
    y = dataset['Label'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.34)

    Classifier = KNeighborsClassifier(n_neighbors=2, p=2, metric='euclidean')
    Classifier.fit(X_train, y_train)

    y_pred_class = Classifier.predict(X_test)
    y_pred_prob = Classifier.predict_proba(X_test)[:, 1]

    accuracy = Classifier.score(X_test, y_test)

    confusion = metrics.confusion_matrix(y_test, y_pred_class)

    print()
    print("Accuracy")
    print(metrics.accuracy_score(y_test, y_pred_class))
    print()
    print("Classification Error")
    print(1 - metrics.accuracy_score(y_test, y_pred_class))
    print()
    print("Confusion matrix")
    print(metrics.confusion_matrix(y_test, y_pred_class))
    #error
    print(metrics.recall_score(y_test, y_pred_class))
    #error
    print(metrics.roc_curve(y_test, y_pred_class))
    #error
    print(metrics.f1_score(y_test, y_pred_class))
    #error
    print(metrics.precision_score(y_test, y_pred_class))

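I just want to display the algorithm's metrics on screen.

You need to set the average keyword argument for these sklearn.metrics functions. For example, see the documentation of f1_score. This is the part that corresponds to the average keyword argument: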

average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

'binary':
  Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
'micro':
  Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro':
  Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted':
  Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
'samples':
  Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).

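Here we can see that this describes how results are aggregated across the different labels in a multiclass task. I am not sure which one you want to use, but micro looks reasonable. Here is what your call to f1_score would look like with that choice: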

print(metrics.f1_score(y_test, y_pred_class, average='micro'))
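To get a feel for how the averaging choices differ, here is a minimal sketch on made-up labels (the y_true and y_pred lists are hypothetical and not taken from your dataset). One thing worth knowing before settling on micro: for single-label multiclass targets, the micro-averaged F1 is the same number as plain accuracy, so macro or weighted may tell you more if your classes are imbalanced.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels for a three-class problem (not from the question's CSV).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

# average=None returns one F1 score per class instead of a single number.
per_class = f1_score(y_true, y_pred, average=None)

# The three aggregate options discussed in the documentation excerpt.
micro = f1_score(y_true, y_pred, average='micro')
macro = f1_score(y_true, y_pred, average='macro')
weighted = f1_score(y_true, y_pred, average='weighted')

accuracy = accuracy_score(y_true, y_pred)

print("per class:", per_class)
print("micro:    ", micro)
print("macro:    ", macro)
print("weighted: ", weighted)
print("accuracy: ", accuracy)
```

Here macro is just the unweighted mean of the per-class scores, while weighted gives the larger class 0 more influence.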

You can adjust the other metrics similarly. Hope this helps.
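For completeness, here is a rough sketch of the same adjustment for recall and precision, plus one possible way around the ROC error. roc_curve only handles binary targets, so the sketch binarizes the labels and computes one curve per class (one-vs-rest); it also passes the full predict_proba matrix rather than only column 1. I could not run it against your CSV, so it uses the iris dataset bundled with scikit-learn as a stand-in, and the choices of n_neighbors=5, average='macro', and stratify=y are my assumptions, not something taken from your code.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import precision_score, recall_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

# Stand-in three-class dataset (assumption: replace with your own X and y).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.34, stratify=y  # stratify keeps all classes in the test set
)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred_class = clf.predict(X_test)
y_pred_prob = clf.predict_proba(X_test)  # shape (n_samples, n_classes), keep all columns

# Same fix as for f1_score: pick an averaging mode explicitly.
recall = recall_score(y_test, y_pred_class, average='macro')
precision = precision_score(y_test, y_pred_class, average='macro')

# A single multiclass AUC, averaged over one-vs-rest problems.
auc_ovr = roc_auc_score(y_test, y_pred_prob, multi_class='ovr', labels=clf.classes_)

# One ROC curve per class: binarize the true labels, then treat each column as a binary task.
y_test_bin = label_binarize(y_test, classes=clf.classes_)
curves = {}
for i, label in enumerate(clf.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_pred_prob[:, i])
    curves[label] = (fpr, tpr)

print("recall (macro):   ", recall)
print("precision (macro):", precision)
print("ROC AUC (ovr):    ", auc_ovr)
```

Each entry in curves can then be plotted with plt.plot(fpr, tpr) if you want the curves on screen.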