How should I modify the test data for the SVM method to be able to use the `precomputed` kernel function without error?

Question

I am using sklearn.svm.SVR for a regression task, and I want to use my own customized kernel. Here is a sample of the dataset and the code:

 index   density     speed        label
 0         14      58.844020    77.179139
 1         29      67.624946    78.367394
 2         44      77.679100    79.143744
 3         59      79.361877    70.048869
 4         74      72.529289    74.499239
 .... and so on

from sklearn import svm
import pandas as pd
import numpy as np

density = np.random.randint(0,100, size=(3000, 1))
speed   = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))
label   = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))

d    = np.hstack((density, speed, label))
data = pd.DataFrame(d, columns=['density', 'speed', 'label'])
data.density = data.density.astype(dtype=np.int32)

def my_kernel(X,Y):
    return np.dot(X,X.T)

svr = svm.SVR(kernel=my_kernel)
x = data[['density', 'speed']].iloc[:2000]
y = data['label'].iloc[:2000]
x_t = data[['density', 'speed']].iloc[2000:3000]
y_t = data['label'].iloc[2000:3000]

svr.fit(x,y)
y_preds = svr.predict(x_t)

The problem occurs on the last line, svr.predict, which reports:

X.shape[1] = 1000 should be equal to 2000, the number of samples at training time

I searched the web for a way to deal with this, but many similar questions (such as {1}, {2}, {3}) have been left unanswered.

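In fact, I have previously used SVM with the rbf, sigmoid, ... kernels and the code worked fine, but this is my first time using a customized kernel, and I suspect that it is the cause of this error.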

After some research and reading the documentation, I found that when using a precomputed kernel, the matrix passed to SVR.predict() must have shape [n_samples_test, n_samples_train].

How should I modify x_test so that prediction works without any problem, just as it does when no customized kernel is used?

If possible, please also explain why the input to the predict function for a precomputed kernel differs from the input for the other kernels.

I would really appreciate it if the unanswered questions related to this issue could be answered as well.

Answer 1

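"Shape does not match" means that the shapes of the test data and the training data are not equal. Always think in terms of NumPy matrices or arrays: if you are doing any arithmetic operation, you always need matching shapes. That is why we check array.shape. As for [n_samples_test, n_samples_train], you can modify the shape, but it is not the best idea.

array.shape, reshape, and resize are used for that.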
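For reference, the [n_samples_test, n_samples_train] shape follows from how an SVM predicts: the decision function is a weighted sum of kernel values between the new sample and the training samples, so every test row needs one kernel value per training sample. Below is a minimal sketch of the `precomputed` route, assuming a plain linear kernel and small random data (the names `gram_train` and `gram_test` are made up for illustration and are not part of the question's code):

```python
import numpy as np
from sklearn import svm

# Small synthetic data, only for illustration
rng = np.random.default_rng(0)
X_train = rng.random((200, 2))
y_train = rng.random(200)
X_test = rng.random((100, 2))

# Gram matrix between training samples: (n_samples_train, n_samples_train)
gram_train = X_train @ X_train.T
# Kernel between test and training samples: (n_samples_test, n_samples_train)
gram_test = X_test @ X_train.T

svr = svm.SVR(kernel="precomputed")
svr.fit(gram_train, y_train)       # fit on the square training Gram matrix
y_pred = svr.predict(gram_test)    # predict on the test-vs-train matrix

print(gram_train.shape, gram_test.shape, y_pred.shape)
```

With `kernel="precomputed"` the raw feature rows are never passed to the estimator, which is why the test input has to be rebuilt as a kernel matrix against the training set rather than reshaped.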

Answer 2

The problem is in your kernel function: it does not do its job.

As the documentation https://scikit-learn.org/stable/modules/svm.html#using-python-functions-as-kernels says, "Your kernel must take as arguments two matrices of shape (n_samples_1, n_features), (n_samples_2, n_features) and return a kernel matrix of shape (n_samples_1, n_samples_2)." The example kernel on the same page satisfies this condition:

def my_kernel(X, Y):
    return np.dot(X, Y.T)

In your function, the second argument of dot is X.T, so the output has shape (n_samples_1, n_samples_1), which is not what is expected.
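The shape difference can be checked directly. The sketch below uses small random data rather than the question's DataFrame, and the function names `broken_kernel` and `linear_kernel` are only illustrative. It relies on the assumption that, at prediction time, the kernel is evaluated with the test samples as the first argument and the training samples as the second:

```python
import numpy as np
from sklearn import svm

# Small synthetic data, only for illustration
rng = np.random.default_rng(0)
X = rng.random((300, 2))
y = rng.random(300)
X_train, X_test = X[:200], X[200:]
y_train = y[:200]

def broken_kernel(X, Y):
    # Ignores Y, so the result is always square in the first argument
    return np.dot(X, X.T)

def linear_kernel(X, Y):
    # Returns (n_samples_1, n_samples_2), as the documentation requires
    return np.dot(X, Y.T)

# Shapes for a (test, train) call
broken_shape = broken_kernel(X_test, X_train).shape
fixed_shape = linear_kernel(X_test, X_train).shape
print(broken_shape, fixed_shape)

# With the corrected kernel, fit and predict take the raw features as usual
svr = svm.SVR(kernel=linear_kernel)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
print(y_pred.shape)
```

The broken kernel yields a square matrix with one column per test sample, which matches the mismatch reported in the error message; the corrected kernel yields one column per training sample, so x_t needs no modification.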


python

regression

numpy

svm

scikit-learn