Find optimal Lasso/L1 regularization strength using cross validation for logistic regression in scikit learn
For my logistic regression model, I want to evaluate the optimal L1 regularization strength using cross-validation (e.g., 5-fold) instead of a single train/test split, as in the code below:
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(X_scaled, y, stratify=y,
                                                    test_size=0.3, random_state=2)

# Evaluate L1 regularization strengths for reducing features in final model
C = [10, 1, .1, 0.05, .01, .001]  # As C decreases, more coefficients go to zero
for c in C:
    clf = LogisticRegression(penalty='l1', C=c, solver='liblinear',
                             class_weight="balanced")
    clf.fit(train_x, train_y)
    pred_y = clf.predict(test_x)
    print("Model performance with Inverse Regularization Parameter, C = 1/λ VALUE:", c)
    cr = metrics.classification_report(test_y, pred_y)
    print(cr)
    print('')
Can someone show me how to do this over 5 different train/test splits using cross-validation, i.e., without copying the above code 5 times with different random states?
As it happens, classification_report is not available as a scoring metric for sklearn.model_selection.cross_val_score, so I will use f1_micro in the following code:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Evaluate L1 regularization strengths for reducing features in final model
C = [10, 1, .1, 0.05, .01, .001]  # As C decreases, more coefficients go to zero
for c in C:
    clf = LogisticRegression(penalty='l1', C=c, solver='liblinear',
                             class_weight="balanced")
    # using data before splitting (X_scaled) and (y)
    scores = cross_val_score(clf, X_scaled, y, cv=5, scoring="f1_micro")  # <-- add this
    print(scores)  # <-- add this
The variable scores is now an array of five values: the classifier's f1_micro score on each of the five different splits of the original data.
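To actually pick the best regularization strength, you can average the five scores for each C and keep the C with the highest mean. A minimal sketch of that selection step follows; since X_scaled and y from the question are not available here, a synthetic dataset from make_classification stands in for them:

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data; replace with your own X_scaled and y
X, y = make_classification(n_samples=300, n_features=20, random_state=2)
X_scaled = StandardScaler().fit_transform(X)

C = [10, 1, .1, 0.05, .01, .001]
mean_scores = {}
for c in C:
    clf = LogisticRegression(penalty='l1', C=c, solver='liblinear',
                             class_weight="balanced")
    # Mean f1_micro across the 5 folds for this regularization strength
    mean_scores[c] = cross_val_score(clf, X_scaled, y, cv=5,
                                     scoring="f1_micro").mean()

best_c = max(mean_scores, key=mean_scores.get)
print("Best C:", best_c, "mean f1_micro:", round(mean_scores[best_c], 3))
```

This keeps the loop from the question but reduces each C to a single comparable number, which is what you need to choose the final model.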
If you want to use a different scoring metric with sklearn.model_selection.cross_val_score, you can list all available metrics with:
print(metrics.SCORERS.keys())
(In scikit-learn 1.2 and later, metrics.SCORERS has been removed; use metrics.get_scorer_names() instead.)
You can also use several scoring metrics at once; the following evaluates both f1_micro and f1_macro:
from sklearn.model_selection import cross_validate
cross_validate(clf, X_scaled, y, cv=5, scoring=["f1_micro", "f1_macro"])
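For this specific task, scikit-learn also ships LogisticRegressionCV, which builds the cross-validated search over C directly into the estimator. A sketch under the same assumptions as above (synthetic data standing in for X_scaled and y):

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

# Stand-in data; replace with your own X_scaled and y
X, y = make_classification(n_samples=300, n_features=20, random_state=2)
X_scaled = StandardScaler().fit_transform(X)

# Searches the given Cs with 5-fold CV and refits on the best one
clf = LogisticRegressionCV(Cs=[10, 1, .1, 0.05, .01, .001], cv=5,
                           penalty='l1', solver='liblinear',
                           class_weight="balanced", scoring="f1_micro")
clf.fit(X_scaled, y)
print("Selected C:", clf.C_)  # best C found per class
```

After fitting, clf.C_ holds the selected regularization strength and the model is already refit with it, so no manual loop is needed.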