Why does this clean data provide strange SVM classification results?

Question

My problem and question are in bold below.

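I have successfully used Accord.NET's support vector machines by following examples on documentation pages such as this one. However, when I use a KernelSupportVectorMachine and train it with a OneclassSupportVectorLearning, the training process produces large error values and incorrect classifications.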

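The small example below illustrates what I mean. It generates a dense cluster of training points, then trains an SVM to classify points as inliers or outliers of the cluster. The training cluster is just a 0.6 x 0.6 square centered at the origin, with training points spaced 0.1 apart: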

static void Main(string[] args)
{
    // Model and training parameters
    double kernelSigma = 0.1;
    double teacherNu = 0.5;
    double teacherTolerance = 0.01;


    // Generate input point cloud, a 0.6 x 0.6 square centered at 0,0.
    double[][] trainingInputs = new double[49][];
    int inputIdx = 0;
    for (double x = -0.3; x <= 0.31; x += 0.1) {
        for (double y = -0.3; y <= 0.31; y += 0.1) {
            trainingInputs[inputIdx] = new double[] { x, y };
            inputIdx++;
        }
    }


    // Generate inlier and outlier test points.
    double[][] outliers =
    {
        new double[] { 1E6, 1E6 },  // Very far outlier
        new double[] { 0, 1E6 },    // Very far outlier
        new double[] { 100, -100 }, // Far outlier
        new double[] { 0, -100 },   // Far outlier
        new double[] { -10, -10 },  // Still far outlier
        new double[] { 0, -10 },    // Still far outlier
    };
    double[][] inliers =
    {
        new double[] { 0, 0 },      // Middle of cluster
        new double[] { .15, .15 },  // Halfway to corner of cluster
        new double[] { -0.1, 0 },   // Comfortably inside cluster
        new double[] { 0.25, 0 }    // Near inside edge of cluster
    };


    // Construct the kernel, model, and trainer, then train.
    Console.WriteLine($"Training model with parameters:");
    Console.WriteLine($"  kernelSigma = {kernelSigma.ToString("#.##")}");
    Console.WriteLine($"  teacherNu={teacherNu.ToString("#.##")}");
    Console.WriteLine($"  teacherTolerance={teacherTolerance}");
    Console.WriteLine();

    var kernel = new Gaussian(kernelSigma);
    var svm = new KernelSupportVectorMachine(kernel, inputs: 1);
    var teacher = new OneclassSupportVectorLearning(svm, trainingInputs)
    {
        Nu = teacherNu,
        Tolerance = teacherTolerance
    };
    double error = teacher.Run();

    Console.WriteLine($"Training complete - error is {error.ToString("#.##")}");
    Console.WriteLine();


    // Test trained classifier.
    Console.WriteLine("Testing outliers:");
    foreach (double[] outlier in outliers) {
        WriteResultDetail(svm, outlier);
    }
    Console.WriteLine();
    Console.WriteLine("Testing inliers:");
    foreach (double[] inlier in inliers) {
        WriteResultDetail(svm, inlier);
    }
}

private static void WriteResultDetail(KernelSupportVectorMachine svm, double[] coordinate)
{
    string prettyCoord = $"{{ {string.Join(", ", coordinate)} }}".PadRight(20);
    Console.Write($"Classifying: {prettyCoord} Result: ");

    // Classify coordinate, print results.
    double result = svm.Compute(coordinate);
    if (Math.Sign(result) == 1) {
        Console.Write("Inlier");
    }
    else {
        Console.Write("Outlier");
    }
    Console.Write($" ({result.ToString("#.##")})\n");
}

Here is the output for a reasonable set of parameters:

Training model with parameters:
  kernelSigma = .1
  teacherNu=.5
  teacherTolerance=0.01

Training complete - error is 222.4

Testing outliers:
Classifying: { 1000000, 1000000 } Result: Inlier (2.28)
Classifying: { 0, 1000000 }       Result: Inlier (2.28)
Classifying: { 100, -100 }        Result: Inlier (2.28)
Classifying: { 0, -100 }          Result: Inlier (2.28)
Classifying: { -10, -10 }         Result: Inlier (2.28)
Classifying: { 0, -10 }           Result: Inlier (2.28)

Testing inliers:
Classifying: { 0, 0 }             Result: Inlier (4.58)
Classifying: { 0.15, 0.15 }       Result: Inlier (4.51)
Classifying: { -0.1, 0 }          Result: Inlier (4.55)
Classifying: { 0.25, 0 }          Result: Inlier (4.64)

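The number in parentheses is the score the SVM gives that coordinate. With SVMs from Accord.NET (usually), a negative score means one class and a positive score means the other. Here, everything has a positive score. The inliers are classified correctly, but the outliers (even the very far ones) are also classified as inliers.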
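For reference, the sign convention can be written out. This is a sketch under two assumptions that the code above does not confirm: that the machine follows the standard nu one-class formulation, and that Gaussian(sigma) denotes the kernel exp(-||x - y||^2 / (2 sigma^2)). The symbols alpha_i (support vector weights) and rho (threshold) are introduced here for illustration only.

```latex
f(x) = \sum_{i} \alpha_i \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right) - \rho,
\qquad \alpha_i \ge 0,\quad \rho > 0

\lVert x - x_i \rVert \gg \sigma \ \text{for all } i
\;\Longrightarrow\;
f(x) \approx -\rho < 0
```

Under that reading, every point far from the cluster should receive the same negative score. For example, with sigma = 0.1 the test point { 0, -10 } is at least 9.7 away from every training point, so each exponent is about -4700 and every kernel term is zero in double precision. The output above does show one identical score for all six far points, but it is positive (2.28), which looks more like a problem with the constant offset than with the choice of sigma or Nu. This is an interpretation of the numbers, not a confirmed diagnosis.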

Note that every other time I have trained a model with Accord.NET, the training error was very close to zero, but here it is over 200.

Here is the output for another parameter set:

Training model with parameters:
  kernelSigma = .3
  teacherNu=.8
  teacherTolerance=0.01

Training complete - error is 1945.67

Testing outliers:
Classifying: { 1000000, 1000000 } Result: Inlier (20.96)
Classifying: { 0, 1000000 }       Result: Inlier (20.96)
Classifying: { 100, -100 }        Result: Inlier (20.96)
Classifying: { 0, -100 }          Result: Inlier (20.96)
Classifying: { -10, -10 }         Result: Inlier (20.96)
Classifying: { 0, -10 }           Result: Inlier (20.96)

Testing inliers:
Classifying: { 0, 0 }             Result: Inlier (44.52)
Classifying: { 0.15, 0.15 }       Result: Inlier (41.62)
Classifying: { -0.1, 0 }          Result: Inlier (43.85)
Classifying: { 0.25, 0 }          Result: Inlier (40.53)

Again, a very high training error and all positive scores.

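The model is definitely learning something from training: the scores differ between inliers and outliers. But why doesn't this simple scenario give results with different signs, as it should?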

P.S. Here is a similar program that tests many combinations of training and model parameters, and here is its output. Again, every combination results in positive classification scores, high error values, and outliers incorrectly classified as inliers.

Answer 1

The issue raised in the question has been addressed in version 3.7.0 of Accord.NET. A unit test with an example similar to yours has also been added in commit be81aab.
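To check the behavior after upgrading, a minimal sketch along the following lines may help. It assumes Accord.NET 3.7.0 or later and its generic learning API (OneclassSupportVectorLearning<TKernel> with Learn, and Score / Decide on the resulting machine). Those names come from the library's later documentation, not from the question, so treat them as assumptions and verify them against the installed version. The class name OneClassSketch and the two probe points are illustrative only.

```csharp
using System;
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Kernels;

static class OneClassSketch
{
    static void Main()
    {
        // Same 7 x 7 training grid as in the question, built from integer
        // indices so floating-point accumulation cannot change the point count.
        var inputs = new double[49][];
        int k = 0;
        for (int i = -3; i <= 3; i++)
            for (int j = -3; j <= 3; j++)
                inputs[k++] = new double[] { i * 0.1, j * 0.1 };

        // Assumed API (3.7.0+): the teacher creates the machine in Learn(),
        // so the number of inputs is inferred from the data.
        var teacher = new OneclassSupportVectorLearning<Gaussian>()
        {
            Kernel = new Gaussian(0.1),
            Nu = 0.5,
            Tolerance = 0.01
        };
        var svm = teacher.Learn(inputs);

        double[][] probes =
        {
            new double[] { 0, 0 },    // middle of the cluster
            new double[] { 0, -10 },  // far outside the cluster
        };

        foreach (double[] p in probes)
        {
            double score = svm.Score(p);   // signed distance-like score
            bool inlier = svm.Decide(p);   // class decision derived from the score
            Console.WriteLine(
                $"{{ {string.Join(", ", p)} }} score={score:0.##} inlier={inlier}");
        }
    }
}
```

No expected output is given here because it depends on the installed library version. The point of the check is simply whether the two probe points now receive scores with different signs.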

Tags: c#, machine-learning, svm, accord.net