Ambiguity using a custom kernel for an `sklearn.svm` regressor
I want to use a custom kernel function with the Epsilon-Support Vector Regression module of `sklearn.svm` (SVR). I found this code in the scikit-learn documentation as an example of a customized kernel for SVC:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
Y = iris.target

def my_kernel(X, Y):
    """
    We create a custom kernel:

                 (2  0)
    k(X, Y) = X  (    ) Y.T
                 (0  1)
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)

h = .02  # step size in the mesh

# we create an instance of SVM and fit our data.
clf = svm.SVC(kernel=my_kernel)
clf.fit(X, Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired, edgecolors='k')
plt.title('3-Class classification using Support Vector Machine with custom'
          ' kernel')
plt.axis('tight')
plt.show()
I want to define some functions like:
import random

def my_new_kernel(X):
    a, b, c = (random.randint(0, 100) for _ in range(3))
    # imagine f1, f2, f3 are functions like sin(x), cos(x), ...
    ans = a * f1(X) + b * f2(X) + c * f3(X)
    return ans
My understanding of kernel methods was that the kernel is a function that takes the feature matrix (X) as input and returns a matrix of shape (n, 1); the SVM then appends the returned matrix to the feature columns and uses it to classify the labels Y.
In the code above, the kernel is used in the svm.fit function, and I can't figure out what the X and Y inputs of the kernel are, or what their shapes are. If X and Y (the inputs of the my_kernel method) are the features and labels of the dataset, then how does the kernel work for test data, where we have no labels?
Actually, I want to use the SVM on a dataset of shape (10000, 6) (5 columns = features, 1 column = label); if I then want to use the my_new_kernel method, what would its inputs and outputs be, and what would their shapes be?
Your exact issue is not clear; here are some remarks that may hopefully help.
I can't figure out what the X and Y inputs of the kernel are, or what their shapes are. If X and Y (the inputs of the my_kernel method) are the features and labels of the dataset,
Indeed they are; from the documentation of fit:
Parameters:

X : {array-like, sparse matrix}, shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features. For kernel="precomputed", the expected shape of X is (n_samples, n_samples).

y : array-like, shape (n_samples,)
    Target values (class labels in classification, real numbers in regression)
Exactly as with the default available kernels.
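To make the shapes concrete, here is a minimal sketch (the toy data and the shape_logging_kernel name are invented for illustration) that wraps a plain linear kernel for an SVR and prints what the kernel receives. Note that a callable kernel takes two feature matrices and must return a Gram matrix of shape (n_samples_A, n_samples_B), so a one-argument function like my_new_kernel would need to be adapted accordingly:

import numpy as np
from sklearn.svm import SVR

# Toy stand-in for the (10000, 6) dataset from the question:
# a feature matrix of 5 columns plus a separate label vector
# (kept small here so the demo runs quickly).
rng = np.random.RandomState(0)
X_train = rng.rand(100, 5)
y_train = rng.rand(100)

def shape_logging_kernel(A, B):
    # Both arguments are *feature* matrices:
    # A has shape (n_samples_A, n_features), B has shape (n_samples_B, n_features).
    # The labels never enter the kernel.
    print("kernel called with", A.shape, "and", B.shape)
    # A valid kernel returns the Gram matrix of shape (n_samples_A, n_samples_B);
    # here we simply use the linear kernel A @ B.T.
    return A @ B.T

reg = SVR(kernel=shape_logging_kernel)
reg.fit(X_train, y_train)  # prints: kernel called with (100, 5) and (100, 5)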
so then how does the kernel work for test data where we have no labels?
A closer look at the code you have provided reveals that the labels Y are indeed used only during training (fit); they are certainly not used during prediction (clf.predict() in the code above - not to be confused with yy, which has nothing to do with Y).
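Continuing the sketch above, calling predict invokes the same kernel with the test samples on one side and the stored training samples on the other - no labels are involved at any point:

X_test = rng.rand(20, 5)
pred = reg.predict(X_test)  # prints: kernel called with (20, 5) and (100, 5)
# The kernel only compares the test samples against the stored training
# samples; pred has shape (20,), one prediction per test sample.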