GridSearchCV: Invalid parameter gamma for estimator LogisticRegression
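I need to perform a grid search over the parameters listed below for a logistic regression classifier, using recall for scoring and 3-fold cross-validation.
The data is in a CSV file (11.1 MB); the download link is: https://drive.google.com/file/d/1cQFp7HteaaL37CefsbMNuHqPzkINCVzs/view?usp=sharing
I have grid_values = {'gamma':[0.01, 0.1, 1, 10, 100]}
I need to apply both the L1 and L2 penalties in the logistic regression.
I cannot verify whether the scoring will run, because I get the following error:
Invalid parameter gamma for estimator LogisticRegression. Check the list of available parameters with estimator.get_params().keys().
Here is my code: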
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('fraud_data.csv')
X = df.iloc[:,:-1]
y = df.iloc[:,-1]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def LogisticR_penalty():
    from sklearn.model_selection import GridSearchCV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    grid_values = {'gamma':[0.01, 0.1, 1, 10, 100]}
    # train the model with many parameters for "C" and penalty='l1'
    lr_l1 = LogisticRegression(penalty='l1')
    grid_lr_l1 = GridSearchCV(lr_l1, param_grid = grid_values, cv=3, scoring = 'recall')
    grid_lr_l1.fit(X_train, y_train)
    y_decision_fn_scores_recall = grid_lr_l1.decision_function(X_test)
    lr_l2 = LogisticRegression(penalty='l2')
    grid_lr_l2 = GridSearchCV(lr_l2, param_grid = grid_values, cv=3 , scoring = 'recall')
    grid_lr_l2.fit(X_train, y_train)
    y_decision_fn_scores_recall = grid_lr_l2.decision_function(X_test)
    # The precision, recall, and accuracy scores for every combination
    # of the parameters in param_grid are stored in cv_results_
    results = pd.DataFrame()
    results['l1_results'] = pd.DataFrame(grid_lr_l1.cv_results_)
    results['l1_results'] = results['l2_results'].sort_values(by='mean_test_precision_score', ascending=False)
    results['l2_results'] = pd.DataFrame(grid_lr_l2.cv_results_)
    results['l2_results'] = results['l2_results'].sort_values(by='mean_test_precision_score', ascending=False)
    return results

LogisticR_penalty()
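I expected that .cv_results_ would give me the mean test score for each parameter combination under mean_test_precision_score, but I am not sure.
The output is: ValueError: Invalid parameter gamma for estimator LogisticRegression. Check the list of available parameters with estimator.get_params().keys().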
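According to scikit-learn's documentation, LogisticRegression has no parameter gamma, but it does have a parameter C that sets the inverse of the regularization strength.
If you change grid_values = {'gamma':[0.01, 0.1, 1, 10, 100]} to grid_values = {'C':[0.01, 0.1, 1, 10, 100]}, your code should work.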
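As a minimal sketch of that change, the snippet below runs the same kind of search over C with recall scoring and 3-fold cross-validation. The synthetic data from make_classification only stands in for fraud_data.csv, and max_iter=1000 is an assumption added to avoid convergence warnings; neither comes from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced binary data: a stand-in for fraud_data.csv (assumption).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 'C' is a valid LogisticRegression parameter, so GridSearchCV accepts the grid.
grid_values = {'C': [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid=grid_values, cv=3, scoring='recall')
grid.fit(X_train, y_train)

# One mean recall value per candidate value of C.
print(grid.best_params_)
print(grid.cv_results_['mean_test_score'])
```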
The error message contains the answer to your question. You can call estimator.get_params().keys() to see all the parameters your estimator accepts:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
print(lr.get_params().keys())
Output:
dict_keys(['C', 'class_weight', 'dual', 'fit_intercept', 'intercept_scaling', 'l1_ratio', 'max_iter', 'multi_class', 'n_jobs', 'penalty', 'random_state', 'solver', 'tol', 'verbose', 'warm_start'])
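The same check can be done up front, before running a search. The sketch below compares the keys of a parameter grid against get_params(); invalid_grid_keys is a hypothetical helper written for illustration, not part of scikit-learn.

```python
from sklearn.linear_model import LogisticRegression

def invalid_grid_keys(estimator, param_grid):
    """Return the param_grid keys that the estimator does not accept."""
    valid = estimator.get_params().keys()
    return sorted(key for key in param_grid if key not in valid)

lr = LogisticRegression()
print(invalid_grid_keys(lr, {'gamma': [0.01, 0.1, 1, 10, 100]}))  # ['gamma']
print(invalid_grid_keys(lr, {'C': [0.01, 0.1, 1, 10, 100]}))      # []
```

An empty list means every key in the grid is a parameter that GridSearchCV can set on the estimator.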
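My code contained several errors; the main one was an incorrect use of param_grid. I had to apply the L1 and L2 penalties with the values 0.01, 0.1, 1, 10, 100, which I had wrongly assigned to gamma instead of C. The correct way is: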
grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}
Then I had to correct how I trained the logistic regression, and how I retrieved the scores from cv_results_ and averaged them.
Here is my code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('fraud_data.csv')
X = df.iloc[:,:-1]
y = df.iloc[:,-1]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def LogisticR_penalty():
    from sklearn.model_selection import GridSearchCV
    from sklearn.linear_model import LogisticRegression
    grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}
    # Train the model over every combination of C and penalty.
    # solver='liblinear' supports both 'l1' and 'l2'; since scikit-learn 0.22
    # the default 'lbfgs' solver rejects 'l1'.
    lr = LogisticRegression(solver='liblinear')
    # We use GridSearchCV to find the combination that optimizes the chosen metric (recall).
    grid_lr_recall = GridSearchCV(lr, param_grid = grid_values, cv=3, scoring = 'recall')
    grid_lr_recall.fit(X_train, y_train)
    y_decision_fn_scores_recall = grid_lr_recall.decision_function(X_test)
    # The recall scores for every combination of the parameters
    # in param_grid are stored in cv_results_
    CVresults = pd.DataFrame(grid_lr_recall.cv_results_)
    # Stack the per-split test scores and average them
    split_test_scores = np.vstack((CVresults['split0_test_score'], CVresults['split1_test_score'], CVresults['split2_test_score']))
    # Rows follow the 5 values of C; columns follow the penalties 'l1' and 'l2'
    mean_scores = split_test_scores.mean(axis=0).reshape(5, 2)
    return mean_scores

LogisticR_penalty()
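For reference, cv_results_ already holds the per-combination average under mean_test_score when a single metric is passed to scoring, so the manual averaging can be replaced by a labeled table. The sketch below is one possible way to do that, not the code from the answer: it uses synthetic data as a stand-in for fraud_data.csv, and the column name mean_recall is chosen here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced binary data: a stand-in for fraud_data.csv (assumption).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

grid_values = {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10, 100]}
# liblinear is used because it supports both the 'l1' and 'l2' penalties.
grid = GridSearchCV(LogisticRegression(solver='liblinear'),
                    param_grid=grid_values, cv=3, scoring='recall')
grid.fit(X, y)

# One row per parameter combination, with its mean recall across the 3 folds.
params = pd.DataFrame(grid.cv_results_['params'])
params['mean_recall'] = grid.cv_results_['mean_test_score']

# Rows are the values of C, columns are the penalties.
table = params.pivot(index='C', columns='penalty', values='mean_recall')
print(table)
```

The table carries the C values and penalty names as labels, so it does not depend on remembering the order in which reshape(5, 2) lays out the combinations.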