使用 IsolationForest 进行异常检测时,在混淆矩阵中附加了额外的零,使其成为 3x3 而不是 2x2
Extra zeros appended in confusion matrix making it 3x3 instead of 2x2 using IsolationForest for Anomaly detection
我正在使用以下代码来预测异常检测。这是一个二元分类,所以混淆矩阵应该是 2x2 而不是 3x3。 T 形中附加了额外的零。几周前使用 OneClassSVM 也发生了类似的事情,但我认为我做错了什么。你能帮我解决这个问题吗?
import numpy as np
import pandas as pd
import os
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn import metrics
from sklearn.metrics import roc_auc_score
data = pd.read_csv('opensky_train.csv')
#to make sure that normal data contains no anomaly
sortedData = data.sort_values(by=['class'])
target = pd.DataFrame(sortedData['class'])
Y = target.replace(['surveill', 'other'], [1,0])
X = sortedData.drop(['class'], axis = 1)
x_normal = X.iloc[:200,:]
y_normal = Y.iloc[:200,:]
x_anomaly = X.iloc[200:,:]
y_anomaly = Y.iloc[200:,:]
已编辑:
column_values = y_anomaly.values.ravel()
unique_values = pd.unique(column_values)
print(unique_values)
输出:[0 1]
clf = IsolationForest(random_state=0).fit(x_normal)
pred = clf.predict(x_anomaly)
print(pred)
输出:[ 1 1 1 1 1 1 -1 1 -1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 -1 1 1 1 1 1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 -1 1
-1 1 1 -1 -1 1 -1 -1 1 1 1 1 -1 1 1 -1 -1 1 1 1 1 1 1 1
-1 1 1 1 1 1 1 1 1 1 -1]
#printing the results
print(confusion_matrix(y_anomaly, pred))
print (classification_report(y_anomaly, pred))
结果:
Confusion Matrix :
[[ 0 0 0]
[ 7 0 60]
[12 0 28]]
precision recall f1-score support
-1 0.00 0.00 0.00 0
0 0.00 0.00 0.00 67
1 0.32 0.70 0.44 40
accuracy 0.26 107
macro avg 0.11 0.23 0.15 107
weighted avg 0.12 0.26 0.16 107
Inliers are labeled 1, while outliers are labeled -1
来源:scikit-learn Anomaly and Outlier detection.
您的示例已将 类 转换为 0,1 - 因此三个可能的选项是 -1,0,1
您需要从
Y = target.replace(['surveill', 'other'], [1,0])
到
Y = target.replace(['surveill', 'other'], [1,-1])
我正在使用以下代码来预测异常检测。这是一个二元分类,所以混淆矩阵应该是 2x2 而不是 3x3。 T 形中附加了额外的零。几周前使用 OneClassSVM 也发生了类似的事情,但我认为我做错了什么。你能帮我解决这个问题吗?
import numpy as np
import pandas as pd
import os
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn import metrics
from sklearn.metrics import roc_auc_score
data = pd.read_csv('opensky_train.csv')
#to make sure that normal data contains no anomaly
sortedData = data.sort_values(by=['class'])
target = pd.DataFrame(sortedData['class'])
Y = target.replace(['surveill', 'other'], [1,0])
X = sortedData.drop(['class'], axis = 1)
x_normal = X.iloc[:200,:]
y_normal = Y.iloc[:200,:]
x_anomaly = X.iloc[200:,:]
y_anomaly = Y.iloc[200:,:]
已编辑:
column_values = y_anomaly.values.ravel()
unique_values = pd.unique(column_values)
print(unique_values)
输出:[0 1]
clf = IsolationForest(random_state=0).fit(x_normal)
pred = clf.predict(x_anomaly)
print(pred)
输出:[ 1 1 1 1 1 1 -1 1 -1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 -1 1 -1 1 1 -1 -1 1 -1 -1 1 1 1 1 -1 1 1 -1 -1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 -1]
#printing the results
print(confusion_matrix(y_anomaly, pred))
print (classification_report(y_anomaly, pred))
结果:
Confusion Matrix :
[[ 0 0 0]
[ 7 0 60]
[12 0 28]]
precision recall f1-score support
-1 0.00 0.00 0.00 0
0 0.00 0.00 0.00 67
1 0.32 0.70 0.44 40
accuracy 0.26 107
macro avg 0.11 0.23 0.15 107
weighted avg 0.12 0.26 0.16 107
Inliers are labeled 1, while outliers are labeled -1
来源:scikit-learn Anomaly and Outlier detection.
您的示例已将 类 转换为 0,1 - 因此三个可能的选项是 -1,0,1
您需要从
Y = target.replace(['surveill', 'other'], [1,0])
到
Y = target.replace(['surveill', 'other'], [1,-1])