Linear regression prediction based on groups of data in the test set
I have a simple dataset that looks like this:
v1 v2 v3 hour_day sales
3 4 24 12 133
5 5 13 12 243
4 9 3 3 93
5 12 5 3 101
4 9 3 6 93
5 12 5 6 101
I built a simple linear regression model to train on and predict the target variable "sales". I evaluate the model with MAE:
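import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Define the input and target features
X = df.iloc[:, [0, 1, 2, 3]]
y = df.iloc[:, 4]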
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Train and fit the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Make prediction
y_pred = regressor.predict(X_test)
# Evaluate the model
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
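My code runs fine, but what I want is to predict sales in X_test and report the error grouped by hour of the day.
In the sample dataset above there are three hour slots: 12, 3 and 6. So the output should look like this: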
MAE for hour 12: 18.29
MAE for hour 3: 11.67
MAE for hour 6: 14.43
I think I need a for loop to iterate over the test set. Something like this:
# Save the hour vector
hour_vec = X_test['hour_day'].copy()
for i in range(len(X_test)):
    # Predict one row at a time; iloc[[i]] keeps the row two-dimensional
    y_pred = regressor.predict(X_test.iloc[[i]])
Any idea how to do this?
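# Distinct hours that actually occur in the test set
hours = sorted(set(X_test['hour_day']))
results = pd.DataFrame(index=['MAE'], columns=hours, dtype=float)
for hour in hours:
    # Boolean mask selecting the test rows for this hour
    idx = X_test['hour_day'] == hour
    y_pred_h = regressor.predict(X_test[idx])
    mae = metrics.mean_absolute_error(y_test[idx], y_pred_h)
    results.loc['MAE', hour] = mae
# Unweighted mean of the per-hour MAE values
results.loc['MAE', 'mean'] = results.mean(axis=1).iloc[0]
print(results)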
This prints:
3 6 mean
MAE 71.405775 71.405775 71.405775
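The same per-hour breakdown can also be sketched without an explicit loop, by computing the absolute error for every test row and averaging it per hour with groupby. This is only a minimal sketch: the DataFrame is rebuilt from the six sample rows in the question, and the names scored, abs_error and mae_by_hour are made up for illustration. With so few rows the numbers will not match the illustrative values in the question, and only hours that end up in the test split are reported.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Sample rows copied from the question
df = pd.DataFrame({
    "v1": [3, 5, 4, 5, 4, 5],
    "v2": [4, 5, 9, 12, 9, 12],
    "v3": [24, 13, 3, 5, 3, 5],
    "hour_day": [12, 12, 3, 3, 6, 6],
    "sales": [133, 243, 93, 101, 93, 101],
})

X = df[["v1", "v2", "v3", "hour_day"]]
y = df["sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression().fit(X_train, y_train)

# One row per test sample: its hour and its absolute prediction error
# ("scored" and "abs_error" are illustrative names)
scored = pd.DataFrame({
    "hour_day": X_test["hour_day"],
    "abs_error": (y_test - regressor.predict(X_test)).abs(),
})

# The mean absolute error within each hour is the MAE for that hour
mae_by_hour = scored.groupby("hour_day")["abs_error"].mean()

for hour, mae in mae_by_hour.items():
    print(f"MAE for hour {hour}: {mae:.2f}")
```

Because every test row contributes exactly one absolute error, averaging the abs_error column over all rows gives the same overall MAE as metrics.mean_absolute_error on the whole test set.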