Why is the visualization of targets and predictions not showing accurately for a linear regression model?

Question

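I am using a multiple linear regression model to estimate medical charges for smokers. I used the 'age', 'bmi', and 'children' features to estimate 'charges'. Here is my code: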

import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error as rmse

Read the data from the GitHub repository

smoker_df = pd.read_csv('https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv')

Create the inputs and targets

inputs  = smoker_df[['age', 'bmi', 'children']]
targets = smoker_df['charges']

Create and train the model

model6 = LinearRegression().fit(inputs, targets)

Generate predictions

predictions = model6.predict(inputs)

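Compute the loss to evaluate the model

# mean_squared_error (imported above as rmse) returns the MSE, so take the square root to get the RMSE
loss = np.sqrt(rmse(targets, predictions))
print('Loss:', loss)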
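As a side note on the loss, scikit-learn's mean_squared_error returns the mean squared error, not its square root, so an alias such as rmse can mislead. The following is a minimal sketch on synthetic stand-in data (the arrays X and y are assumptions for illustration, not the insurance dataset) that reports the MSE, the RMSE, and R^2 side by side:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data (an assumption, not the insurance dataset)
rng = np.random.default_rng(0)
X = rng.uniform(18, 64, size=(200, 1))            # an "age"-like feature
y = 250.0 * X[:, 0] + rng.normal(0, 3000, 200)    # noisy linear target

model = LinearRegression().fit(X, y)
pred = model.predict(X)

mse = mean_squared_error(y, pred)   # mean of squared errors (squared units)
rmse_value = np.sqrt(mse)           # root of the MSE (same units as y)
r2 = r2_score(y, pred)              # fraction of variance explained

print(f"MSE:  {mse:.1f}")
print(f"RMSE: {rmse_value:.1f}")
print(f"R^2:  {r2:.3f}")
```

The RMSE is in the same units as the target, which usually makes it easier to judge than the raw MSE.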

Visualization of the predictions and targets:

fig, ax = plt.subplots(figsize=(7, 3.5))

ax.plot(predictions, targets, color='k', label='Regression model')
ax.set_ylabel('predictions', fontsize=14)
ax.set_xlabel('targets', fontsize=14)
ax.legend(facecolor='white', fontsize=11)

This is not a very good visualization. How can I improve it to get some insight, and how can I visualize three or more input features against a single target?

Data Source

Answer 1

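ax.plot joins the points with line segments in row order, which is why your figure looks like a tangle of lines. Use a scatter plot instead to compare your predictions with the observed values: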

fig, ax = plt.subplots(figsize=(7, 3.5))

# One marker per observation; the label gives the legend something to show
ax.scatter(predictions, targets, label='observations')
ax.set_xlabel('prediction', fontsize=14)
ax.set_ylabel('charges', fontsize=14)
ax.legend(facecolor='white', fontsize=11)
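A common refinement of this kind of plot is to add the diagonal y = x, since a perfect model would place every point on it. Here is a minimal sketch, assuming synthetic stand-ins for targets and predictions (the random arrays are hypothetical, not the insurance data):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the question's `targets` and `predictions`
rng = np.random.default_rng(1)
targets = rng.uniform(1000, 50000, 300)
predictions = targets + rng.normal(0, 6000, 300)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(predictions, targets, s=10, alpha=0.5, label='observations')

# Diagonal y = x: a perfect model would put every point on this line
lo = min(predictions.min(), targets.min())
hi = max(predictions.max(), targets.max())
ax.plot([lo, hi], [lo, hi], color='k', linestyle='--', label='perfect prediction')

ax.set_xlabel('predicted charges', fontsize=14)
ax.set_ylabel('observed charges', fontsize=14)
ax.legend(facecolor='white', fontsize=11)
fig.tight_layout()
```

Points far from the dashed line are the observations the model gets most wrong, and clusters of such points often hint at a missing variable.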

You can see that some of your predictions are far off. That is because you left out other variables, such as 'smoker':

import seaborn as sns

# Color the points by smoker status to expose the two groups
sns.scatterplot(data=smoker_df, x="age", y="charges", hue="smoker")
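To illustrate why a left-out variable hurts, the sketch below fits a linear model with and without a 0/1 smoker column and compares the training RMSE. The data frame is synthetic and the relationship (a fixed surcharge for smokers) is made up for illustration; the names smoker_code and fit_rmse are hypothetical helpers, not part of the original code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the insurance data (an assumption for illustration)
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    'age': rng.integers(18, 65, n),
    'bmi': rng.normal(30, 6, n),
    'smoker': rng.choice(['yes', 'no'], n, p=[0.2, 0.8]),
})
# Made-up relationship: smokers pay a large fixed surcharge
df['charges'] = (250 * df['age'] + 300 * df['bmi']
                 + 24000 * (df['smoker'] == 'yes')
                 + rng.normal(0, 3000, n))

# Encode the categorical column as 0/1 so the model can use it
df['smoker_code'] = (df['smoker'] == 'yes').astype(int)

def fit_rmse(columns):
    """Fit a linear model on the given columns and return the training RMSE."""
    model = LinearRegression().fit(df[columns], df['charges'])
    return np.sqrt(mean_squared_error(df['charges'], model.predict(df[columns])))

rmse_without = fit_rmse(['age', 'bmi'])
rmse_with = fit_rmse(['age', 'bmi', 'smoker_code'])
print(f"RMSE without smoker: {rmse_without:.0f}")
print(f"RMSE with smoker:    {rmse_with:.0f}")
```

On the real dataset the same idea would apply to smoker_df, though the size of the improvement there is not something this sketch can tell you.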

You can also check how each of your input features relates to the target:

# One panel per input feature, each plotted against the target
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, col in zip(axes, inputs.columns):
    ax.scatter(inputs[col], targets, label=col)  # pass a 1-D Series, not a one-column DataFrame
    ax.set_xlabel(col, fontsize=14)
    ax.set_ylabel('charges', fontsize=14)
    ax.legend(facecolor='white', fontsize=11)

plt.tight_layout()
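For the second part of the question (showing three or more inputs against one target), one option is to put several features into a single scatter plot by mapping them to color and marker size. This is only a sketch on synthetic stand-in arrays (age, bmi, children, and charges here are randomly generated, not read from the dataset):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Synthetic stand-in columns (an assumption, not the insurance data)
rng = np.random.default_rng(7)
n = 300
age = rng.integers(18, 65, n)
bmi = rng.normal(30, 6, n)
children = rng.integers(0, 6, n)
charges = 250 * age + 300 * bmi + 500 * children + rng.normal(0, 3000, n)

fig, ax = plt.subplots(figsize=(7, 4))
# x = age, y = charges, color = bmi, marker size = number of children
sc = ax.scatter(age, charges, c=bmi, s=15 + 20 * children,
                cmap='viridis', alpha=0.7)
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label('bmi')

ax.set_xlabel('age', fontsize=14)
ax.set_ylabel('charges', fontsize=14)
ax.set_title('marker size grows with number of children')
fig.tight_layout()
```

This kind of plot gets crowded quickly, so beyond three or four variables the per-feature panels above tend to be easier to read.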
