ValueError in kNN metrics
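I have a project that involves running the kNN algorithm on a csv file and displaying selected metrics. But when I try to display some of the metrics, it throws errors.
When trying to use sensitivity, f1_score and precision:
- Sensitivity - print(metrics.recall_score(y_test, y_pred_class))
- F1_score - print(metrics.f1_score(y_test, y_pred_class))
- Precision - print(metrics.precision_score(y_test, y_pred_class))
PyCharm throws the following error: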
ValueError: Target is multiclass but average='binary'. Please choose another average setting
The error when trying to print the ROC curve is slightly different:
ValueError: multiclass format is not supported
Dataset
Link to the dataset: https://www.dropbox.com/s/yt3n1eqxlsb816n/Testfile%20-%20kNN.csv?dl=0
Program
import matplotlib
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
from matplotlib.dviread import Text
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
# Tools for testing
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
def main():
    dataset = pd.read_csv('filetestKNN.csv')
    X = dataset.drop(columns=['Label'])
    y = dataset['Label'].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.34)
    Classifier = KNeighborsClassifier(n_neighbors=2, p=2, metric='euclidean')
    Classifier.fit(X_train, y_train)
    y_pred_class = Classifier.predict(X_test)
    y_pred_prob = Classifier.predict_proba(X_test)[:, 1]
    accuracy = Classifier.score(X_test, y_test)
    confusion = metrics.confusion_matrix(y_test, y_pred_class)
    print()
    print("Accuracy")
    print(metrics.accuracy_score(y_test, y_pred_class))
    print()
    print("Classification Error")
    print(1 - metrics.accuracy_score(y_test, y_pred_class))
    print()
    print("Confusion matrix")
    print(metrics.confusion_matrix(y_test, y_pred_class))
    #error
    print(metrics.recall_score(y_test, y_pred_class))
    #error
    print(metrics.roc_curve(y_test, y_pred_class))
    #error
    print(metrics.f1_score(y_test, y_pred_class))
    #error
    print(metrics.precision_score(y_test, y_pred_class))
I just want to display the algorithm's metrics on screen.
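You need to set the average keyword argument for these sklearn.metrics functions. For example, take a look at the documentation of f1_score. This is the part that corresponds to the average keyword arg: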
average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']
    This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
    'binary':
        Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
    'micro':
        Calculate metrics globally by counting the total true positives, false negatives and false positives.
    'macro':
        Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    'weighted':
        Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
    'samples':
        Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
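To make the difference between these settings concrete, here is a small sketch on hand-made labels. The y_true and y_pred lists are hypothetical values invented for illustration, not taken from your dataset. It also reproduces the error from the question when average is left at its default.

```python
from sklearn import metrics

# Hypothetical 3-class labels, made up purely for illustration.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

# The default average='binary' is rejected for multiclass targets.
try:
    metrics.f1_score(y_true, y_pred)
except ValueError as exc:
    print("default call fails:", exc)

# average=None returns one F1 score per class instead of a single number.
per_class = metrics.f1_score(y_true, y_pred, average=None)

# The aggregated variants each return a single float.
micro = metrics.f1_score(y_true, y_pred, average='micro')
macro = metrics.f1_score(y_true, y_pred, average='macro')
weighted = metrics.f1_score(y_true, y_pred, average='weighted')
accuracy = metrics.accuracy_score(y_true, y_pred)

print("per class:", per_class)
print("micro:", micro, "macro:", macro, "weighted:", weighted)
print("accuracy:", accuracy)
```

Note that when every sample has exactly one label, as here, the micro-averaged F1 is equal to plain accuracy, while macro is simply the mean of the per-class scores.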
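Here we can see that this describes how the results are aggregated over the different labels in a multiclass task. I'm not sure which one you want to use, but micro seems fine. Here is what your call to f1_score would look like with this choice: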
print(metrics.f1_score(y_test, y_pred_class, average='micro'))
You can adjust the other metrics similarly. Hope this helps.
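For completeness, here is a minimal sketch of what "similarly" could look like for recall and precision, plus one possible way around the ROC error. Everything about the data is an assumption: the CSV from the question is not reproduced here, so the sketch uses a synthetic 3-class dataset from make_classification, and n_neighbors=5 and average='macro' are arbitrary choices. The ROC part goes beyond what is described above: roc_curve only handles binary targets, so the sketch treats each class one-vs-rest and passes predicted probabilities (not predicted classes) as scores. The multi_class='ovr' option of roc_auc_score needs a reasonably recent scikit-learn version.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the CSV in the question (an assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, random_state=0, stratify=y)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
y_pred_class = clf.predict(X_test)
y_pred_prob = clf.predict_proba(X_test)  # one column per class, in clf.classes_ order

# Same fix as for f1_score: pick an explicit averaging mode.
recall = metrics.recall_score(y_test, y_pred_class, average='macro')
precision = metrics.precision_score(y_test, y_pred_class, average='macro')
f1 = metrics.f1_score(y_test, y_pred_class, average='macro')
print("recall:", recall, "precision:", precision, "f1:", f1)

# One-vs-rest ROC: one binary curve per class, scored by that class's probability.
curves = {}
for column, label in enumerate(clf.classes_):
    is_label = (y_test == label).astype(int)
    fpr, tpr, thresholds = metrics.roc_curve(is_label, y_pred_prob[:, column])
    curves[label] = (fpr, tpr)
    print("class", label, "AUC:", metrics.auc(fpr, tpr))

# Single summary number averaged over the one-vs-rest problems.
ovr_auc = metrics.roc_auc_score(y_test, y_pred_prob, multi_class='ovr')
print("macro one-vs-rest AUC:", ovr_auc)
```

Each (fpr, tpr) pair in curves can be passed to plt.plot if you want to draw the curves rather than print them.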