Inconsistent prediction results using LinearSVC from sklearn
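I am using SKLearn's LinearSVC (LibLinear) to perform a simple classification.
I cannot reproduce the predictions directly and get the same accuracy as "LinearSVC.predict".
What am I doing wrong? The code below is self-contained and highlights my problem.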
import scipy as sc
import numpy as np
from sklearn.svm import LinearSVC #liblinear
N=6000
m=500
D = sc.sparse.random(N,m, random_state = 1)
D.data *= 2
D.data -= 1
X = sc.sparse.csr_matrix(D)
y = (X.sum(axis = 1) > .0)*2-1.0
x_train = X[:5000,:]
y_train = y[:5000,:]
x_test = X[5000:,:]
y_test = y[5000:,:]
clf = LinearSVC(C=.1, fit_intercept = False, loss= 'hinge')
clf.fit(x_train,np.array(y_train))
print("Direct prediction accuracy:\t", 100-100*np.mean((np.sign(x_test*clf.coef_.T)!=y_test)+0.0), "%")
print("CLF prediction accuracy:\t", 100*clf.score(x_test,y_test), "%")
Output:
Direct prediction accuracy: 90.8 %
CLF prediction accuracy: 91.3 %
Thanks for your help!
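The difference is in how zeros are handled. With np.sign, the result contains zeros, and those samples are not assigned to either valid class (1 or -1, since you have a binary classifier). Classifier.predict, on the other hand, strictly outputs one of the two classes. A small change to the prediction rule, from np.sign(x_test*clf.coef_.T) to np.where(x_test * clf.coef_.T > 0, 1, -1), gives exactly the same accuracy as the built-in predict method: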
print("Direct prediction accuracy:\t", 100-100*np.mean((np.where(x_test * clf.coef_.T > 0, 1, -1) != y_test)+0.0), "%")
print("CLF prediction accuracy:\t", 100*clf.score(x_test, y_test), "%")
# Direct prediction accuracy: 92.7 %
# CLF prediction accuracy: 92.7 %
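To make the zero-handling point concrete, here is a minimal NumPy-only sketch. The weight vector w and the rows of X are made up for illustration and are not taken from the question's data; the labels follow the question's own rule (row sum > 0). A likely source of exact zeros in the original setup is an all-zero row in the sparse test matrix: with no intercept, its decision value is exactly 0, np.sign returns 0 for it, and a 0 can never equal a label of 1 or -1.

```python
import numpy as np

# Hypothetical weights and test rows, chosen only for illustration.
w = np.array([0.5, 1.0, 0.25])
X = np.array([
    [1.0,  0.0, 0.0],   # decision value  0.5
    [0.0, -1.0, 0.0],   # decision value -1.0
    [0.0,  0.0, 0.0],   # all-zero row: decision value is exactly 0
    [0.0,  0.0, 2.0],   # decision value  0.5
])

# Same labelling rule as in the question: +1 if the row sum is positive, else -1.
y_true = (X.sum(axis=1) > 0) * 2 - 1.0

scores = X @ w

# np.sign maps a score of 0 to 0, which is not a valid class label.
sign_pred = np.sign(scores)

# Thresholding at "> 0" always yields one of the two labels; a score of 0 goes to -1.
where_pred = np.where(scores > 0, 1.0, -1.0)

sign_acc = np.mean(sign_pred == y_true)
where_acc = np.mean(where_pred == y_true)

print("np.sign predictions: ", sign_pred, "accuracy:", sign_acc)
print("np.where predictions:", where_pred, "accuracy:", where_acc)
```

In this toy case the only disagreement is the all-zero row, which np.sign leaves unclassified and which therefore counts as an error regardless of its true label.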