How can the coefficients of a multiclass logistic regression model be used to predict the probabilities of class membership of observations?

I am trying to solve a problem similar to the Fisher iris classification. The catch is that while I can train the model on my own machine, the trained model has to predict class membership on a machine where Python and scikit-learn cannot be installed. I want to understand how, once I have the coefficients of a logistic regression model, I can predict class probabilities without calling the model's predict methods. Using the Fisher iris problem as an example, I do the following.
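
For reference, a multinomial logistic regression model computes a score for each class k, z_k = w_k · x + b_k, and converts the scores into probabilities with the softmax function:

p(class = k | x) = exp(z_k) / Σ_j exp(z_j)

Reproducing predict_proba by hand therefore amounts to applying this formula with the fitted coefficients, plus any preprocessing the pipeline performs.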

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, f1_score

# data preparation
iris = load_iris()
data = pd.DataFrame(data=np.hstack([iris.data, iris.target[:, np.newaxis]]), 
                    columns=iris.feature_names + ['target'])
names = data.columns

# split data
X_train, X_test, y_train, y_test = train_test_split(data[names[:-1]], data[names[-1]], random_state=42)

# train model
cls = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=2, random_state=42)
)
cls = cls.fit(X_train, y_train)
preds_train = cls.predict(X_train)

# prediction
preds_test = cls.predict(X_test)

# scores
train_score = accuracy_score(y_train, preds_train), f1_score(y_train, preds_train, average='macro') # on train data
# train_score = (0.9642857142857143, 0.9653621232568601)
test_score = accuracy_score(y_test, preds_test), f1_score(y_test, preds_test, average='macro') # on test data
# test_score = (1.0, 1.0)

# model coefficients
cls[1].coef_, cls[1].intercept_

>>> (array([[-1.13948079,  1.30623841, -2.21496793, -2.05617771],
            [ 0.66515676, -0.2541143 , -0.55819748, -0.86441227],
            [ 0.47432404, -1.05212411,  2.77316541,  2.92058998]]),
      array([-0.35860337,  2.43929019, -2.08068682]))

Now I have the model's coefficients and I want to use them to make predictions. First, I use the predict_proba method to get the predicted class probabilities for the first five observations of the test sample.

probs_test = cls.predict_proba(X_test)
probs_test[0:5]

>>>array([[5.66019001e-03, 9.18455687e-01, 7.58841233e-02],
          [9.75854479e-01, 2.41455095e-02, 1.10881450e-08],
          [1.18780156e-09, 6.53295166e-04, 9.99346704e-01],
          [6.71574900e-03, 8.14174200e-01, 1.79110051e-01],
          [6.98756622e-04, 8.09096425e-01, 1.90204818e-01]])

Then I compute the predicted class probabilities manually, using the model's coefficients.

# define two functions for making predictions
def logit(x, w):
    return np.dot(x, w)

# from here: 
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s) # subtract the row max for numerical stability
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

n, k = X_test.shape
X_ = np.hstack((np.ones((n, 1)), X_test)) # add column with 1 for intercept
weights = np.hstack((cls[1].intercept_[:, np.newaxis], cls[1].coef_)) # create weights matrix
results = softmax(logit(X_, weights.T)) # calculate probabilities 
results[0:5]

>>>array([[3.67343725e-14, 4.63938438e-06, 9.99995361e-01],
          [2.81976786e-05, 8.63083152e-01, 1.36888650e-01],
          [1.24572182e-22, 5.47800683e-11, 1.00000000e+00],
          [3.32990060e-14, 3.08352323e-06, 9.99996916e-01],
          [2.66415118e-15, 1.78252465e-06, 9.99998217e-01]])

If you compare the two results (probs_test[0:5] and results[0:5]), you can see that they do not match at all. Please explain what I am doing wrong, and how to use the model's coefficients to compute the predicted probabilities without calling the predict methods.

I forgot to apply the scaler: the pipeline standardizes the features before the logistic regression, so the raw test data must first be transformed with a StandardScaler fitted on the training data. With that small change to the code, the results are the same.

# fit the scaler on the training data (as the pipeline did) and transform the test data
scaler = StandardScaler()
scaler.fit(X_train)
X_test_transf = scaler.transform(X_test)

def logit(x, w):
    return np.dot(x, w)

def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

n, k = X_test_transf.shape
X_ = np.hstack((np.ones((n, 1)), X_test_transf))
weights = np.hstack((cls[1].intercept_[:, np.newaxis], cls[1].coef_))
results = softmax(logit(X_, weights.T))
np.allclose(probs_test, results)

>>>True
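
Since the original goal was to run the prediction on a machine without Python or scikit-learn, one further step may help (a minimal sketch under my assumption about the deployment setup, not part of the answer above): the fitted StandardScaler can be folded into the coefficients once, so that only the raw features, a matrix product, and a softmax are needed at prediction time. With standardization x' = (x - mean) / scale, the logits satisfy w · x' + b = (w / scale) · x + (b - w · (mean / scale)).

import numpy as np

# fold the fitted StandardScaler into the weights so that the exported
# coefficients apply directly to raw (unscaled) features
mean = scaler.mean_    # per-feature means learned from X_train
scale = scaler.scale_  # per-feature standard deviations

W = cls[1].coef_ / scale                                # rescaled weights, shape (3, 4)
b = cls[1].intercept_ - cls[1].coef_ @ (mean / scale)   # adjusted intercepts, shape (3,)

# raw test features can now be used without scaling them first
raw_results = softmax(np.dot(X_test.to_numpy(), W.T) + b)
np.allclose(probs_test, raw_results)  # expected: True

After this, only W, b, and the softmax formula need to be exported to the target machine.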

A note on the binary case: there predict_proba returns two values per observation. The first is the probability that the event does not occur, the second the probability that it does occur; predict_proba(X)[:, 1] gives the probability of the event occurring. In the multiclass example above there is one column per class instead.
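
To make that concrete, here is a minimal sketch for the binary case (a hypothetical example on a different dataset, not part of the question above): with a single coefficient row w and intercept b, the probability of the event is the sigmoid 1 / (1 + exp(-(x · w + b))), which matches predict_proba(X)[:, 1].

import numpy as np
from scipy.special import expit  # numerically stable sigmoid
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# hypothetical binary example: reproduce predict_proba by hand
X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X, y)

z = X @ clf.coef_.ravel() + clf.intercept_  # log-odds of the positive class
manual = expit(z)                           # sigmoid -> probability the event occurs

np.allclose(manual, clf.predict_proba(X)[:, 1])  # expected: True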