How do I manually `predict_proba` from logistic regression model in scikit-learn?

I am trying to reproduce a logistic regression model's predictions manually, using the coefficient and intercept outputs from a scikit-learn model. However, I can't get my probability predictions to match the classifier's `predict_proba` method.

I have tried:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from scipy.special import expit
import numpy as np

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
# use sklearn's predict_proba function
sk_probas = clf.predict_proba(X[:1, :])
# and attempting manually (using scipy's inverse logit)
manual_probas = expit(np.dot(X[:1], clf.coef_.T) + clf.intercept_)
# with a completely manual inverse logit
full_manual_probas = 1/(1 + np.exp(-(np.dot(X[:1], clf.coef_.T) + clf.intercept_)))

Output:

>>> sk_probas
array([[9.81815067e-01, 1.81849190e-02, 1.44120963e-08]])
>>> manual_probas
array([[9.99352591e-01, 9.66205386e-01, 2.26583306e-05]])
>>> full_manual_probas
array([[9.99352591e-01, 9.66205386e-01, 2.26583306e-05]])

The predicted classes do seem to match (using `np.argmax`), but the probabilities are different. What am I missing?

I've looked at this and this, but haven't managed to figure it out.

The documentation states:

For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class
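This is the key point: applying `expit` row-wise gives three *independent* one-vs-rest probabilities that need not sum to 1, while the multinomial model jointly normalizes the scores with softmax. A minimal illustration, using made-up decision scores for one sample over three classes:

```python
import numpy as np
from scipy.special import expit

# hypothetical decision scores (X @ coef_.T + intercept_) for one sample
decision = np.array([7.34, 3.90, -10.7])

# per-class sigmoids: each is a valid probability, but they don't sum to 1
sigmoid = expit(decision)
# softmax: jointly normalized, so the row always sums to exactly 1
softmax = np.exp(decision) / np.exp(decision).sum()

print(sigmoid.sum())  # roughly 1.98 -- not a probability distribution
print(softmax.sum())  # 1.0
```

That is why the element-wise `expit` values in the question disagree with `predict_proba` even though `argmax` still picks the same class: a monotone per-element transform preserves the ranking but not the normalization.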

That is, to get the same values as sklearn you have to normalize with softmax, like this:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)
decision = np.dot(X[:1], clf.coef_.T)+clf.intercept_
print(clf.predict_proba(X[:1]))
print(np.exp(decision) / np.exp(decision).sum())
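As a sanity check, the same softmax normalization can be vectorized over every sample (summing along the class axis with `keepdims=True`), and it should agree with `predict_proba` for the whole dataset, not just the first row:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)

# raw decision scores for every sample: one row per sample, one column per class
decision = np.dot(X, clf.coef_.T) + clf.intercept_
# softmax across the class axis of each row
manual = np.exp(decision) / np.exp(decision).sum(axis=1, keepdims=True)

print(np.allclose(manual, clf.predict_proba(X)))  # True
```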

To use the sigmoid instead, you can do it like this:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy as np

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0, max_iter=1000, multi_class='ovr').fit(X, y) # Notice the extra argument
full_manual_probas = 1/(1+np.exp(-(np.dot(X[:1], clf.coef_.T)+clf.intercept_)))
print(clf.predict_proba(X[:1]))
print(full_manual_probas / full_manual_probas.sum())
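The one-vs-rest version can be checked over the full dataset the same way: sigmoid each column independently, then divide each row by its sum so it forms a probability distribution (this is the single-sample calculation above, vectorized with `expit` and row-wise normalization):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from scipy.special import expit
import numpy as np

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0, max_iter=1000,
                         multi_class='ovr').fit(X, y)

# independent per-class sigmoid probabilities, one row per sample
sigmoid = expit(np.dot(X, clf.coef_.T) + clf.intercept_)
# normalize each row so the class probabilities sum to 1
manual = sigmoid / sigmoid.sum(axis=1, keepdims=True)

print(np.allclose(manual, clf.predict_proba(X)))  # True
```

Note that recent scikit-learn versions emit a deprecation warning for the `multi_class` parameter, but the computation is unchanged.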