Logistic Regression test input format help in python

Question

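I have the following dataset.

I have built a logistic regression model from it, checked the accuracy, and it works fine. Now I have a new data point with Age 30 and EstimatedSalary 50000, and I want to predict whether Purchased is 0 or 1. How do I pass the new values 30 and 50000 in my Python code?

Below is the Python code I used.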

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
%matplotlib inline

dataset = pd.read_csv(r"suv_data.csv")

X=dataset.iloc[:,[0,1]].values
y=dataset.iloc[:,2].values

X_train,X_test,y_train,y_test=train_test_split(X, y, test_size=0.2, random_state=1)

sc=StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.transform(X_test)

classifier=LogisticRegression(random_state=0)
classifier.fit(X_train,y_train)

y_pred=classifier.predict(X_test)

accuracy_score(y_test,y_pred)*100

Regards,

Bharath Vikas

Answer 1

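In general, to evaluate a trained model (i.e. call .predict in sklearn), you need to pass in samples with the same shape as the samples the model was trained on, meaning the same feature columns in the same order.

In your case, I suppose (see my comment on your question) you want to use Age and EstimatedSalary as the features, with Purchased as the label in the training set.

Then, to test on a single sample, just try the following: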

single_test_sample = pd.DataFrame({'Age':[30], 'EstimatedSalary':[50000]}).iloc[:,[0,1]].values
single_test_sample = sc.transform(single_test_sample)
single_test_prediction = classifier.predict(single_test_sample)

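Note that you can also add more values to the Age and EstimatedSalary columns of the test data frame; for now I only added the sample you are interested in. If you add more, the model will output one prediction for each row of the test data frame.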
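To illustrate predicting several rows at once, here is a minimal sketch. The training data is synthetic and made up purely so the snippet runs on its own (it stands in for suv_data.csv, which is not shown), and the extra sample values are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (made-up values, illustration only).
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=200)
salary = rng.integers(15000, 150000, size=200)
purchased = ((age > 40) | (salary > 90000)).astype(int)

# Fit the scaler and the classifier on the training features.
X_train = np.column_stack([age, salary])
sc = StandardScaler()
classifier = LogisticRegression(random_state=0)
classifier.fit(sc.fit_transform(X_train), purchased)

# Several new samples: one row per sample, same column order as in training.
new_samples = pd.DataFrame({
    'Age': [30, 45, 52],
    'EstimatedSalary': [50000, 120000, 30000],
}).values

# Reuse the already-fitted scaler (transform, not fit_transform), then predict.
predictions = classifier.predict(sc.transform(new_samples))
print(predictions)  # one 0/1 label per input row
```

The important detail is that the new rows go through the same fitted scaler as the training data; refitting the scaler on the new rows would change the scaling the model was trained with.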

Also note that both your code and mine would work without the .values at the end of the train/test sets, since sklearn already accepts pandas data frames directly.
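As a rough sketch of the same workflow without .values, the snippet below passes data frames straight to sklearn. The ten rows are invented for illustration and are not taken from the asker's file.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up rows standing in for the real dataset.
dataset = pd.DataFrame({
    'Age': [19, 35, 26, 27, 47, 45, 46, 48, 32, 58],
    'EstimatedSalary': [19000, 20000, 43000, 57000, 25000,
                        22000, 79000, 141000, 150000, 144000],
    'Purchased': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

X = dataset[['Age', 'EstimatedSalary']]  # DataFrame, no .values
y = dataset['Purchased']                 # Series, no .values

sc = StandardScaler()
classifier = LogisticRegression(random_state=0)
classifier.fit(sc.fit_transform(X), y)

# The new sample is also a DataFrame with the same column names and order.
new_sample = pd.DataFrame({'Age': [30], 'EstimatedSalary': [50000]})
prediction = classifier.predict(sc.transform(new_sample))
print(prediction)
```

It is probably worth staying consistent: if the scaler was fitted on a data frame, pass data frames with the same column names at prediction time, since recent sklearn versions warn when feature names are present at fit time but missing later.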

Answer 2

Your question is not clear, but I understand that you need to use the fitted model to predict a new sample.

After fitting your model, use this:

new_sample = np.array([[30,50000]]) # 2D numpy array

new_sample_sc = sc.transform(new_sample)

y_pred_new = classifier.predict(new_sample_sc)
print(y_pred_new)
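The "2D numpy array" comment above matters: sklearn expects input of shape (n_samples, n_features), so a single sample must still be a row inside a 2D array. A small sketch of the difference, using a scaler fitted on three made-up rows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Scaler fitted on made-up training rows (illustration only).
sc = StandardScaler().fit(np.array([[20, 30000], [40, 60000], [60, 90000]]))

flat = np.array([30, 50000])  # shape (2,): a 1D array, rejected by sklearn
try:
    sc.transform(flat)
    raised = False
except ValueError:
    raised = True  # sklearn asks for a 2D array instead

row = flat.reshape(1, -1)     # shape (1, 2): one sample, two features
scaled = sc.transform(row)
print(raised, row.shape, scaled.shape)
```

Writing the sample with double brackets, as in np.array([[30, 50000]]), produces the (1, 2) shape directly, so no reshape is needed.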

python

machine-learning

pandas

scikit-learn

logistic-regression