How to implement different scoring functions in LogisticRegressionCV in scikit-learn?
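I'm trying to use the LogisticRegressionCV class from scikit-learn 0.16, but I'm having trouble getting it to work with different scoring functions. The documentation says to pass one of the scoring functions from sklearn.metrics, so I tried the following code: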
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import log_loss
...
model_regression = LogisticRegressionCV(scoring=log_loss)
model_regression.fit(data_combined, winners_losers)
But the call to fit fails with the following error:
File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1381, in fit
for label in iter_labels
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 659, in __call__
self.dispatch(function, args, kwargs)
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 406, in dispatch
job = ImmediateApply(func, args, kwargs)
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 140, in __init__
self.results = func(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 844, in _log_reg_scoring_path
scores.append(scoring(log_reg, X_test, y_test))
File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 1403, in log_loss
T = lb.fit_transform(y_true)
File "C:\Anaconda3\lib\site-packages\sklearn\base.py", line 433, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "C:\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 315, in fit
self.y_type_ = type_of_target(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 287, in type_of_target
'got %r' % y)
ValueError: Expected array-like (array or non-string sequence), got LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
What am I doing wrong here? Without the scoring=log_loss argument the fit works fine, so it must have something to do with how I'm passing the function?
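The traceback shows the cause: _log_reg_scoring_path calls scoring(log_reg, X_test, y_test), so log_loss receives the fitted estimator in its y_true position, hence "Expected array-like ..., got LogisticRegression(...)". A minimal pure-Python sketch of the two calling conventions follows; toy_accuracy, ThresholdModel and as_scorer are hypothetical names for illustration, and as_scorer is a simplified analogue of make_scorer, not scikit-learn's implementation.

```python
from typing import Callable, List, Sequence


def toy_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Metric convention: metric(y_true, y_pred) -> float."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier (predicts 1 at or above a threshold)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, X: Sequence[float]) -> List[int]:
        return [int(x >= self.threshold) for x in X]


def as_scorer(metric: Callable[[Sequence[int], Sequence[int]], float],
              greater_is_better: bool = True) -> Callable:
    """Hypothetical, simplified analogue of make_scorer:
    wrap metric(y_true, y_pred) as scorer(estimator, X, y)."""
    sign = 1 if greater_is_better else -1

    def scorer(estimator, X, y) -> float:
        # Predict first, then hand (true labels, predictions) to the metric.
        return sign * metric(y, estimator.predict(X))

    return scorer


model = ThresholdModel(0.5)
X_test = [0.1, 0.4, 0.6, 0.9]
y_test = [0, 1, 1, 1]

# Scorer convention, as used by the cross-validation loop: scoring(estimator, X, y).
scorer = as_scorer(toy_accuracy)
print(scorer(model, X_test, y_test))  # 0.75

# A bare metric used as a scorer receives the estimator as its y_true argument and fails.
try:
    toy_accuracy(model, X_test)
except TypeError as exc:
    print("bare metric fails:", exc)
```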
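It should be scoring="neg_log_loss", a string, not a function. (In releases before 0.18, including the 0.16 used here, the same scorer was named "log_loss".) If you want to pass a callable, it needs a different interface; see the docs. The callable should take three arguments: the fitted estimator, the data to score (X), and the known true targets (y).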
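A minimal sketch of both options, assuming a recent scikit-learn (0.18 or later, where the "neg_log_loss" name exists) and using synthetic data from make_classification as a stand-in for data_combined and winners_losers. neg_log_loss_scorer is a hypothetical name; it follows the (estimator, X, y) signature described above and negates the loss so that larger values are better, matching the built-in scorer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import log_loss

# Synthetic binary data standing in for data_combined / winners_losers (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Option 1: pass the built-in scorer by name.
model_by_name = LogisticRegressionCV(scoring="neg_log_loss", cv=3)
model_by_name.fit(X, y)


# Option 2: a hand-written callable with the scorer signature (estimator, X, y).
def neg_log_loss_scorer(estimator, X, y):
    """Return the negated log loss so that larger values mean better models."""
    proba = estimator.predict_proba(X)
    # labels= keeps the predict_proba columns aligned with estimator.classes_.
    return -log_loss(y, proba, labels=estimator.classes_)


model_by_callable = LogisticRegressionCV(scoring=neg_log_loss_scorer, cv=3)
model_by_callable.fit(X, y)

print("C chosen by name:", model_by_name.C_[0])
print("C chosen by callable:", model_by_callable.C_[0])
```

Both models should pick the same C here, since the callable computes the same quantity as the built-in "neg_log_loss" scorer.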
To supply a metric function instead, you need the make_scorer wrapper:
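import sklearn.metrics
from sklearn.linear_model import LogisticRegressionCV
scorefunc = sklearn.metrics.accuracy_score  # replace with your own metric(y_true, y_pred)
myscorer = sklearn.metrics.make_scorer(
    scorefunc,
    greater_is_better=True,  # higher accuracy is better
    needs_threshold=False,   # score the labels returned by predict()
)
model = LogisticRegressionCV(scoring=myscorer)  # add any other arguments you need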
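The same wrapper also works for losses. A minimal sketch, assuming a recent scikit-learn and synthetic data from make_classification: zero_one_loss is lower-is-better, so greater_is_better=False makes the scorer return the negated loss, and LogisticRegressionCV keeps the C with the highest mean score. For a probability-based metric such as log_loss, the scorer must additionally be told to use predict_proba (needs_proba=True in older releases, response_method="predict_proba" in newer ones); check the make_scorer documentation for your version.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import make_scorer, zero_one_loss

# Synthetic binary data; replace with your own features and labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# zero_one_loss is a loss (lower is better), so ask make_scorer to negate it.
zero_one_scorer = make_scorer(zero_one_loss, greater_is_better=False)

model = LogisticRegressionCV(scoring=zero_one_scorer, cv=3)
model.fit(X, y)

# The scorer returns the negated loss: values are <= 0 and larger is better.
score = zero_one_scorer(model, X, y)
print("chosen C:", model.C_[0], "negated zero-one loss:", score)
```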
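As a side note, it would be nice if sklearn's LogisticRegression were primarily a regression model and a new LogisticClassification class wrapped it. As far as I know, there is currently no way to report a regression error or to supply real-valued targets.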