sklearn SVM custom kernel raise ValueError: X.shape[0] should be equal to X.shape[1]
I am trying to implement a custom kernel, the exponential chi-squared kernel to be precise, to pass as an argument to sklearn's SVM, but when I run it the following error is raised:
ValueError: X.shape[0] should be equal to X.shape[1]
I have read about the broadcasting operations NumPy performs to speed up the computation, but I cannot work out what causes the error.
The code is:
import numpy as np
from sklearn import svm, datasets
# import the iris dataset (http://en.wikipedia.org/wiki/Iris_flower_data_set)
iris = datasets.load_iris()
train_features = iris.data[:, :2] # Here we only use the first two features.
train_labels = iris.target
def my_kernel(x, y):
    gamma = 1
    return np.exp(-gamma * np.divide((x - y) ** 2, x + y))
classifier = svm.SVC(kernel=my_kernel)
classifier = classifier.fit(train_features, train_labels)
print("Train Accuracy : " + str(classifier.score(train_features, train_labels)))
Any help would be appreciated.
I believe the chi-squared kernel is already implemented for you (as from sklearn.metrics.pairwise import chi2_kernel).
Like this:
from functools import partial
from sklearn import svm, datasets
from sklearn.metrics.pairwise import chi2_kernel
# import the iris dataset (http://en.wikipedia.org/wiki/Iris_flower_data_set)
iris = datasets.load_iris()
train_features = iris.data[:, :2] # Here we only use the first two features.
train_labels = iris.target
my_chi2_kernel = partial(chi2_kernel, gamma=1)
classifier = svm.SVC(kernel=my_chi2_kernel)
classifier = classifier.fit(train_features, train_labels)
print("Train Accuracy : " + str(classifier.score(train_features, train_labels)))
====================
Edit:
It turns out the question is actually about how to implement the chi-squared kernel. My take on this:
def my_chi2_kernel(X, Y):
    # SVC calls a custom kernel with two arguments, kernel(X, Y),
    # so the function must accept both arrays.
    gamma = 1
    nom = np.power(X[:, np.newaxis] - Y, 2)
    denom = X[:, np.newaxis] + Y
    # NOTE: We need to fix some entries, since division by 0 is an issue here.
    # So we take all the indices where the denominator would be 0, and fix them.
    zero_denom_idx = denom == 0
    nom[zero_denom_idx] = 0
    denom[zero_denom_idx] = 1
    return np.exp(-gamma * np.sum(nom / denom, axis=2))
So essentially, the x - y and x + y in the OP's attempt are wrong, because they are not pairwise subtraction or addition.
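To see why the shapes go wrong, here is a small illustrative sketch (the variable names are mine, not from the original): when SVC calls the kernel, both arguments are (n_samples, n_features) arrays, so x - y is an elementwise operation that returns another (n_samples, n_features) array, whereas sklearn expects a square (n_samples, n_samples) Gram matrix; hence the "X.shape[0] should be equal to X.shape[1]" error. Inserting np.newaxis makes the subtraction pairwise:

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)          # 3 samples, 2 features

elementwise = (X - X) ** 2                # (3, 2): NOT a Gram matrix
print(elementwise.shape)                  # (3, 2) -> triggers sklearn's shape check

pairwise = (X[:, np.newaxis] - X) ** 2    # (3, 1, 2) - (3, 2) -> (3, 3, 2)
gram = pairwise.sum(axis=2)               # (3, 3): one entry per pair of samples
print(gram.shape)                         # (3, 3)
```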
Curiously, the custom version seems to be faster than sklearn's cythonised version (at least on small datasets?).
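As a quick sanity check, the broadcasting-based kernel can be compared numerically against sklearn's chi2_kernel on the same data. This is a self-contained sketch; the helper name pairwise_chi2 is mine, not from the original:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import chi2_kernel

def pairwise_chi2(X, Y, gamma=1.0):
    # Pairwise exponential chi-squared kernel via broadcasting.
    nom = (X[:, np.newaxis] - Y) ** 2     # (n, m, d)
    denom = X[:, np.newaxis] + Y          # (n, m, d)
    zero = denom == 0
    nom = np.where(zero, 0.0, nom)        # avoid 0/0: these terms contribute 0
    denom = np.where(zero, 1.0, denom)
    return np.exp(-gamma * np.sum(nom / denom, axis=2))

X = load_iris().data[:, :2]
print(np.allclose(pairwise_chi2(X, X), chi2_kernel(X, gamma=1)))  # True
```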