How to deal with convergence warning when using LinearSVC in sklearn?
I get a convergence warning when I fit a linear support vector machine to the breast cancer dataset in scikit-learn.
Here is the code:
from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split  # missing in the original snippet

# Load features and labels, then hold out a test set
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer, random_state = 0)

# Fit on the raw (unscaled) features with a very large iteration budget
clf = LinearSVC(max_iter=700000).fit(X_train, y_train)
print('Breast cancer dataset')
print('Accuracy of Linear SVC classifier on training set: {:.2f}'
      .format(clf.score(X_train, y_train)))
print('Accuracy of Linear SVC classifier on test set: {:.2f}'
      .format(clf.score(X_test, y_test)))
Even with a very large number of iterations, I still get the convergence warning:
ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn("Liblinear failed to converge, increase "
Can anyone explain why it does not converge? And, in general, can I ignore convergence warnings, or do I need to tune the model further?
Thanks a lot!
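SVM methods are distance-based, and your columns are on very different scales, so it makes sense to scale the data before fitting the model. For more information, see posts such as this or this.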
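To see how far apart the column scales actually are, a quick check along the following lines can help. This is only a sketch: the variable names `feature_std` and `scale_ratio` are my own, and comparing per-column standard deviations is just one reasonable way to measure the spread.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the raw, unscaled feature matrix (569 samples, 30 features)
X, y = load_breast_cancer(return_X_y=True)

# Standard deviation of each column; a large spread between columns
# means some features dominate the optimization problem numerically
feature_std = X.std(axis=0)
scale_ratio = feature_std.max() / feature_std.min()

print("smallest column std:", feature_std.min())
print("largest column std: ", feature_std.max())
print("ratio largest/smallest:", scale_ratio)
```

A ratio spanning several orders of magnitude suggests the solver is working on a badly conditioned problem, which is a common reason for slow or failed convergence.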
So if we fit again after scaling:
from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)

# Standardize every column to zero mean and unit variance
X_cancer = StandardScaler().fit_transform(X_cancer)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer, random_state = 0)

# Default max_iter is enough once the features are scaled
clf = LinearSVC().fit(X_train, y_train)
You get quite good accuracy, with no convergence problem:
print('Accuracy of Linear SVC classifier on training set: {:.2f}'
.format(clf.score(X_train, y_train)))
print('Accuracy of Linear SVC classifier on test set: {:.2f}'
.format(clf.score(X_test, y_test)))
Accuracy of Linear SVC classifier on training set: 0.99
Accuracy of Linear SVC classifier on test set: 0.94
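One caveat about the snippet above: it fits the scaler on the full dataset before splitting, so the test rows influence the scaling statistics. A variant that avoids this, sketched below, wraps the scaler and the classifier in a pipeline so the scaler is fit on the training split only. The `max_iter=10000` value and the `converged` flag are my own choices for illustration, not part of the original answer, and the accuracies may differ slightly from the figures quoted above.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits StandardScaler on the training data only and then
# applies the same transformation when scoring the test data
pipe = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

# Record warnings instead of printing them, so we can check for
# ConvergenceWarning programmatically rather than silencing it
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X_train, y_train)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
train_acc = pipe.score(X_train, y_train)
test_acc = pipe.score(X_test, y_test)

print("converged without warning:", converged)
print("training accuracy: {:.2f}".format(train_acc))
print("test accuracy: {:.2f}".format(test_acc))
```

On the question of whether the warning can be ignored: it means the solver stopped before reaching its tolerance, so the coefficients may not be the optimum. Treating it as a prompt to check scaling, as here, is usually safer than suppressing it.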