Logistic Regression not working because of "unknown label type 'continuous'"?

Question

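I am trying to implement logistic regression with scikit-learn. I currently have a DataFrame made up of 12 input variables and 1 output variable.

The output DataFrame holds binary values, while the other 12 variables are not necessarily binary.

The structure of some example input data: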

#PseudoCode (Y and X are pandas dataframes)
Y = 0, 1, 0, 1, 1, 1  # Output data
X =  A1: 1, 1, 2, 1, 2, 2 #Input Data
     B2: 45, 23, 12, 56, 23, 86
     ...
     L12: 4.2, 3.2, 1.2, 2.3, 2.3, 9.9

I then do the following:

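from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = X.astype(int)  # to make sure that the data is actually in int format.
Y = Y.astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=.10, random_state=42)

xscaler = StandardScaler()
yscaler = StandardScaler()

pipe = Pipeline([('scaler', xscaler), ('logit', LogisticRegression())])
model = TransformedTargetRegressor(regressor=pipe, transformer=yscaler)
model.fit(X_train, y_train)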

However, this throws the following:

ValueError: Unknown label type: 'continuous'

Why does this happen even though the Y data is clearly binary?

Answer 1

The problem here is that you are scaling the labels y with StandardScaler().
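To see why scaled labels trigger the error, here is a small sketch that inspects a toy label vector before and after standardization. The array y reuses the six example values from the question. Using type_of_target for the check is an assumption about how best to illustrate the problem, not something stated in the original post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import type_of_target

# Toy labels taken from the example in the question.
y = np.array([0, 1, 0, 1, 1, 1])

# Standardizing subtracts the mean and divides by the standard deviation,
# so the 0/1 labels become non-integer floats.
y_scaled = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

before = type_of_target(y)         # label type of the raw labels
after = type_of_target(y_scaled)   # label type after scaling

print(before, after)
```

Once the labels are standardized they are no longer the integers 0 and 1, so a classifier such as LogisticRegression sees a continuous target and refuses to fit.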

y is a categorical variable that indicates whether a sample belongs to class 1 or class 0, so it must not be scaled.
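A minimal sketch of one possible fix, assuming the goal is simply to scale the inputs and classify: drop TransformedTargetRegressor and the label scaler, and fit the pipeline directly. The data below is synthetic and the column names (f0 to f11) are hypothetical stand-ins for the 12 input variables in the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 12 numeric inputs, 1 binary output.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(100, 12)), columns=[f"f{i}" for i in range(12)])
y = pd.Series((X["f0"] + X["f1"] > 0).astype(int), name="target")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)

# Only the features are scaled; the labels go to the classifier untouched.
pipe = Pipeline([("scaler", StandardScaler()), ("logit", LogisticRegression())])
pipe.fit(X_train, y_train)

preds = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(preds, accuracy)
```

If Y is a single-column DataFrame, as in the question, passing it as a 1-D object (for example Y.squeeze() or Y.values.ravel()) should avoid a shape warning from scikit-learn.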
