ConvergenceWarning when running cross validation with SVM model
I tried to train a LinearSVC model and evaluate it with cross_val_score on a linearly separable dataset that I created, but I got an error.
Here is a reproducible example:
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# creating the dataset
x1 = 2 * np.random.rand(100, 1)
y1 = 5 + 3 * x1 + np.random.randn(100, 1)
lable1 = np.zeros((100, 1))
x2 = 2 * np.random.rand(100, 1)
y2 = 15 + 3 * x2 + np.random.randn(100, 1)
lable2 = np.ones((100, 1))
x = np.concatenate((x1, x2))
y = np.concatenate((y1, y2))
lable = np.concatenate((lable1, lable2))
x = np.reshape(x, (len(x),))
y = np.reshape(y, (len(y),))
lable = np.reshape(lable, (len(lable),))
d = {'x':x, 'y':y, 'lable':lable}
df = pd.DataFrame(data=d)
df.plot(kind="scatter", x="x", y="y")
# preparing data and model
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
X = train_set.drop("lable", axis=1)
y = train_set["lable"].copy()
scaler = StandardScaler()
scaler.fit_transform(X)  # the returned scaled array is not assigned, so X itself stays unscaled
linear_svc = LinearSVC(C=5, loss="hinge", random_state=42)
linear_svc.fit(X, y)
# evaluation
scores = cross_val_score(linear_svc, X, y, scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)
print("Mean:", rmse_scores.mean())
Output:
Mean: 0.0
/usr/local/lib/python3.7/dist-packages/sklearn/svm/_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
"the number of iterations.", ConvergenceWarning)
This is not an error but a warning, and it already contains a suggestion:
increase the number of iterations
The default value of max_iter is 1000 (docs).
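A minimal sketch of checking this programmatically, assuming a recent scikit-learn: the helper `converged` (a hypothetical name, not part of scikit-learn) fits a model while recording warnings and reports whether a `ConvergenceWarning` was raised. The toy data is made up for illustration, and `dual=True` is passed explicitly only to avoid the FutureWarning some versions emit about that parameter's default changing.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC


def converged(model, X, y):
    """Hypothetical helper: fit `model` and return True if no ConvergenceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        model.fit(X, y)
    return not any(issubclass(w.category, ConvergenceWarning) for w in caught)


# Default iteration budget of LinearSVC
print(LinearSVC().get_params()["max_iter"])  # 1000

# Made-up toy data, only for illustration
rng = np.random.RandomState(0)
X = rng.randn(50, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Compare a tiny iteration budget with a large one
for max_iter in (10, 100000):
    model = LinearSVC(C=5, loss="hinge", dual=True, max_iter=max_iter, random_state=42)
    print(max_iter, converged(model, X, y))
```

Whether the small `max_iter` run actually fails to converge depends on the data, so no output is claimed for the loop.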
Also, LinearSVC
is a classifier, so using scoring="neg_mean_squared_error"
(a regression metric) in cross_val_score
makes no sense; see the documentation for a rough list of the relevant metrics for each type of problem.
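This also explains why the question printed `Mean: 0.0`: with 0/1 labels, each misclassified sample contributes a squared error of exactly 1 and each correct one contributes 0, so the MSE is just the error rate (1 - accuracy) and the RMSE is its square root. A small check with made-up labels, for illustration only:

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Made-up 0/1 labels and predictions from a binary classifier
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

mse = mean_squared_error(y_true, y_pred)  # each mistake adds (1 - 0)**2 = 1
acc = accuracy_score(y_true, y_pred)

print(mse)      # 0.25  (2 mistakes out of 8)
print(1 - acc)  # 0.25
```

So an RMSE of 0.0 in every fold simply means every fold was classified perfectly; `scoring="accuracy"` reports the same information directly.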
So, after changing these lines to:
linear_svc = LinearSVC(C=5, loss="hinge", random_state=42, max_iter=100000)
scores = cross_val_score(linear_svc, X, y, scoring="accuracy", cv=10)
your code runs fine, without any errors or warnings.
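A possible end-to-end version combining the two changes above; this is a sketch, not the answerer's code. Two things are assumptions added here: a fixed `np.random.seed(42)` for repeatability, and wrapping `StandardScaler` and `LinearSVC` in `make_pipeline`. The pipeline also addresses a separate issue in the question: `scaler.fit_transform(X)` returns the scaled array, but the result is never assigned, so the model was trained on unscaled features. Inside a pipeline, `cross_val_score` refits the scaler on each fold's training portion only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

np.random.seed(42)  # assumption: fixed seed so the run is repeatable

# Same two-cluster dataset as in the question, built with 1-D arrays directly
x1 = 2 * np.random.rand(100)
y1 = 5 + 3 * x1 + np.random.randn(100)
x2 = 2 * np.random.rand(100)
y2 = 15 + 3 * x2 + np.random.randn(100)
df = pd.DataFrame({
    "x": np.concatenate((x1, x2)),
    "y": np.concatenate((y1, y2)),
    "lable": np.concatenate((np.zeros(100), np.ones(100))),
})

train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
X = train_set.drop("lable", axis=1)
y = train_set["lable"]

# Scaling lives inside the pipeline, so each CV fold scales with its own training data
model = make_pipeline(
    StandardScaler(),
    LinearSVC(C=5, loss="hinge", dual=True, max_iter=100000, random_state=42),
)

# A classification metric instead of neg_mean_squared_error
scores = cross_val_score(model, X, y, scoring="accuracy", cv=10)
print("Mean accuracy:", scores.mean())
```

Feature scaling often helps liblinear converge in fewer iterations, but it is not guaranteed to remove the warning on its own, which is why `max_iter` is still raised here.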