Confused about sklearn's implementation of OSVM

I recently started experimenting with OneClassSVM (using sklearn) for unsupervised learning, and I followed this example.

Apologies for the silly questions, but I'm a bit confused about two things:

  1. Should I train my SVM on both the regular example cases and the outliers, or only on the regular examples?

  2. Which label predicted by the OSVM represents an outlier: 1 or -1?

Again, I apologize for these questions, but for some reason I just can't find this anywhere in the documentation.

Since the example you refer to is about novelty detection, the docs say:

novelty detection:

The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.

Meaning: you should train on the regular samples only.
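
A minimal sketch of that setup (the data, nu and gamma values here are purely illustrative, not prescribed by the docs): fit the OneClassSVM on regular observations only, and let anomalies show up at prediction time.

import numpy as np
from sklearn.svm import OneClassSVM

# Training set: regular samples only, no outliers mixed in
rng = np.random.RandomState(42)
X_train = 0.3 * rng.randn(100, 2)

# nu and gamma values are illustrative
clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)

# Anomalies only appear in new observations at prediction time
X_new = np.array([[0.1, -0.2], [4.0, 4.0]])
print(clf.predict(X_new))  # +1 for inliers, -1 for outliers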

The method is based on:

Schölkopf, Bernhard, et al. "Estimating the support of a high-dimensional distribution." Neural computation 13.7 (2001): 1443-1471.

Excerpt:

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1.

We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement.
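In sklearn terms, the role of that estimated f is played by decision_function: positive scores mean the point falls inside the estimated region S, negative scores mean it falls in the complement, and predict is essentially the sign of that score. A small sketch with made-up data:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(100, 2)  # regular samples only
clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1).fit(X_train)

X_new = np.array([[0.0, 0.0], [3.0, 3.0]])
scores = clf.decision_function(X_new)  # > 0 inside the estimated region, < 0 outside
labels = clf.predict(X_new)            # +1 where the score is positive, -1 where it is negative
print(scores, labels)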

The docs above also say:

Inliers are labeled 1, while outliers are labeled -1.

You can also see this in your example code; excerpt:

# Generate some regular novel observations
X = 0.3 * np.random.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
...
# all regular observations = inliers (as defined above)
y_pred_test = clf.predict(X_test)
...
# a -1 prediction marks an outlier, i.e. an error here, since X_test contains only inliers
n_error_test = y_pred_test[y_pred_test == -1].size
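
For completeness, here is a self-contained sketch that fills in the parts elided by the ... above, along the lines of the linked example (treat the exact nu/gamma values as illustrative):

import numpy as np
from sklearn.svm import OneClassSVM

# Train on regular observations only (two Gaussian blobs)
X = 0.3 * np.random.randn(100, 2)
X_train = np.r_[X + 2, X - 2]

# Generate some regular novel observations
X = 0.3 * np.random.randn(20, 2)
X_test = np.r_[X + 2, X - 2]

# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)

y_pred_test = clf.predict(X_test)          # regular -> should be mostly +1
y_pred_outliers = clf.predict(X_outliers)  # abnormal -> should be mostly -1

# -1 on the regular test set is an error, since those points are inliers
n_error_test = y_pred_test[y_pred_test == -1].size
# +1 on the outlier set is an error, since those points are outliers
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size
print(n_error_test, n_error_outliers)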