How to deal with convergence warning when using LinearSVC in sklearn?

I get a convergence warning when I train a linear support vector machine in scikit-learn on the breast cancer dataset.

Here is the code:

from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer, random_state = 0)

clf = LinearSVC(max_iter=700000).fit(X_train, y_train)
print('Breast cancer dataset')
print('Accuracy of Linear SVC classifier on training set: {:.2f}'
     .format(clf.score(X_train, y_train)))
print('Accuracy of Linear SVC classifier on test set: {:.2f}'
     .format(clf.score(X_test, y_test)))

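Even with a very large number of iterations, I still get a convergence warning:

ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn("Liblinear failed to converge, increase "

Can anyone explain why it fails to converge? And in general, can I ignore a convergence warning, or do I need to tune the model further?

Thanks a lot!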

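SVMs are distance-based, and the columns of this dataset are on very different scales: some features take values in the thousands, while others stay well below 1. This makes the optimization problem poorly conditioned, so liblinear needs far more iterations to converge. It therefore makes sense to scale the data before fitting the model. See posts such as this or this for more details.

So if we redo the fit with scaling: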

from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_cancer = StandardScaler().fit_transform(X_cancer)

X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer, random_state = 0)

clf = LinearSVC().fit(X_train, y_train)

With the data scaled, the model fits without any convergence problem, and the accuracy is quite good:

print('Accuracy of Linear SVC classifier on training set: {:.2f}'
     .format(clf.score(X_train, y_train)))
print('Accuracy of Linear SVC classifier on test set: {:.2f}'
     .format(clf.score(X_test, y_test)))

Accuracy of Linear SVC classifier on training set: 0.99
Accuracy of Linear SVC classifier on test set: 0.94
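One caveat about the code above: StandardScaler is fitted on the full dataset before the split, so the test rows influence the scaling statistics. A sketch of a variant, assuming the usual scikit-learn Pipeline approach, splits first and lets `make_pipeline` fit the scaler on the training data only. `dual=True` and `random_state=0` are assumptions added to mirror the original setup and make the run repeatable; the printed numbers may differ slightly from the ones above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_cancer, y_cancer = load_breast_cancer(return_X_y=True)

# Split first, so the scaler never sees the test rows.
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer, random_state=0)

# The pipeline fits StandardScaler on X_train only, then reuses those
# means and standard deviations to transform X_test inside score().
clf = make_pipeline(StandardScaler(), LinearSVC(dual=True, random_state=0))
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print('Accuracy of Linear SVC classifier on training set: {:.2f}'.format(train_acc))
print('Accuracy of Linear SVC classifier on test set: {:.2f}'.format(test_acc))
```

The same pipeline object can be passed to cross_val_score or GridSearchCV, which then refits the scaler inside each fold.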