How can I use an SVM classifier to detect outliers in percentage changes?

Question

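I have a pandas DataFrame in the following format:

It contains the daily percentage change in stock price for 3 companies: MSFT, F and BAC.

I want to use the OneClassSVM estimator to detect whether the data contain outliers. I tried the following code, which I believe detects the rows that contain outliers.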

# Import libraries
import numpy as np
from sklearn.svm import OneClassSVM
import matplotlib.pyplot as plt

# Create the one-class SVM
svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
# Fit the model and predict on the same data
svm.fit(delta)
pred = svm.predict(delta)

# predict returns -1 for outliers and +1 for inliers
outliers = np.where(pred == -1)
# Print the positions of the outlier rows
print(outliers)

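This gives the following output:

I then want to add a new column to my DataFrame that indicates whether each row is an outlier. I tried the following code, but I get an error because the lists have different lengths, as shown below.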

condition = (delta.index.isin(outliers))

assigned_value = "outlier"

df['isoutlier'] = np.select(condition, 
assigned_value)

Could you tell me how I can add this column, given that the list of rows containing outliers is much shorter than the DataFrame?

Answer 1

It is not clear from your code how delta and df are related. I assume they are the same DataFrame.
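If delta and df turn out to be different objects, for example if delta was derived from a price frame with pct_change() followed by dropna(), the two no longer have the same number of rows, and assigning a plain NumPy array to df would fail with a length mismatch. Below is a minimal sketch of aligning on the index instead. The price data and the way delta is built are assumptions made for illustration; the question does not show either.

```python
import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM

# Hypothetical price data; the question does not show how delta was built.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    100 + rng.normal(0, 1, (100, 3)).cumsum(axis=0),
    columns=["MSFT", "F", "BAC"],
)

# Assumed derivation: daily percentage change, dropping the first (NaN) row.
delta = df.pct_change().dropna()

svm = OneClassSVM(kernel="rbf", gamma=0.001, nu=0.03)
pred = svm.fit_predict(delta)

# Wrap the labels in a Series that carries delta's index, so pandas aligns
# on row labels; rows of df that are missing from delta become NaN.
labels = pd.Series(np.where(pred == -1, "outlier", ""), index=delta.index)
df["isoutlier"] = labels
```

Because the assignment aligns on the index rather than on position, the first row of df, which has no percentage change, simply ends up with a missing value in the new column.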

You can use the result of svm.predict directly. Here, rows that are not outliers are left as an empty string '':

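import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM

# Example data: 100 rows of uniform random values in 3 columns
df = pd.DataFrame(np.random.uniform(0, 1, (100, 3)), columns=['A', 'B', 'C'])

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
svm.fit(df)
pred = svm.predict(df)

# pred holds one label per row (-1 = outlier), so it maps straight onto a column
df['isoutlier'] = np.where(pred == -1, 'outlier', '')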

           A         B         C isoutlier
0   0.869475  0.752420  0.388898          
1   0.177420  0.694438  0.129073          
2   0.011222  0.245425  0.417329          
3   0.791647  0.265672  0.401144          
4   0.538580  0.252193  0.142094          
..       ...       ...       ...       ...
95  0.742192  0.079426  0.676820   outlier
96  0.619767  0.702513  0.734390          
97  0.872848  0.251184  0.887500   outlier
98  0.950669  0.444553  0.088101          
99  0.209207  0.882629  0.184912
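
As for why the original attempt failed: np.where(pred == -1) called with a single argument returns a tuple holding one array of positional row numbers, not index labels, and np.select expects a list of conditions with a matching list of choices. If you would rather start from those positions, the sketch below marks the rows by position with iloc. It reuses random example data as an assumption, and the column name is_outlier_flag is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 1, (100, 3)), columns=["A", "B", "C"])
features = df[["A", "B", "C"]]

svm = OneClassSVM(kernel="rbf", gamma=0.001, nu=0.03)
pred = svm.fit(features).predict(features)

# With one argument, np.where returns a tuple of arrays; element 0 holds
# the positional row numbers where the condition is true.
outlier_pos = np.where(pred == -1)[0]

# Start with every row unmarked, then flag the outlier positions.
df["isoutlier"] = ""
df.iloc[outlier_pos, df.columns.get_loc("isoutlier")] = "outlier"

# Equivalent boolean column, often easier to filter on later.
df["is_outlier_flag"] = pred == -1
```

The boolean column allows filtering with df[df["is_outlier_flag"]] without comparing strings.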

Tags: python, data-mining, svm, outliers, dataframe