Having trouble calculating mean squared error in sklearn (Python)
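I am trying to fit a decision tree regressor to a dataset. The fit runs, but when I test it by calculating the mean squared error, I get the error shown below: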
msee = mse(x_test, y_test)
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17480/3348210221.py in <module>
----> 1 msee = mse(x_test, y_test)
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
436 0.825...
437 """
--> 438 y_type, y_true, y_pred, multioutput = _check_reg_targets(
439 y_true, y_pred, multioutput
440 )
~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
103
104 if y_true.shape[1] != y_pred.shape[1]:
--> 105 raise ValueError(
106 "y_true and y_pred have different number of output ({0}!={1})".format(
107 y_true.shape[1], y_pred.shape[1]
ValueError: y_true and y_pred have different number of output (4!=1)
Here is the model code and the head of the df I am training the model on:
x = np.array(bat[["TB_x"]])
y = np.array(bat[["TB_y"]])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= .2, random_state= 1)
dt = DecisionTreeRegressor(max_depth= 10, random_state= 1, min_samples_leaf=.1)
dt.fit(x_train.reshape(-1,1), y_train.reshape(-1,1))
y_pred = dt.predict(x_test)
index Year Age_x AgeDif_x Tm_x Lg_x Lev_x G_x PA_x AB_x ... BA_y OBP_y SLG_y OPS_y TB_y GDP_y HBP_y SH_y SF_y IBB_y
0 19 2019 22.0 1.5 UCLA P12 NCAA 38.0 72.0 58.0 ... 0.179 0.364 0.194 0.558 13.0 0.0 1.0 2.0 1.0 0.0
2 24 2020 23.0 1.7 St. Leo SSC NCAA 20.0 86.0 69.0 ... 0.156 0.309 0.219 0.527 14.0 0.0 2.0 0.0 2.0 0.0
6 45 2020 20.0 -0.8 Illinois BTen NCAA 13.0 58.0 47.0 ... 0.200 0.343 0.288 0.631 23.0 1.0 1.0 0.0 1.0 0.0
7 46 2020 20.0 -0.8 Illinois BTen NCAA 13.0 58.0 47.0 ... 0.156 0.309 0.219 0.527 14.0 0.0 2.0 0.0 2.0 0.0
8 49 2020 21.0 0.3 Miami (FL) ACC NCAA 16.0 69.0 54.0 ... 0.200 0.343 0.288 0.631 23.0 1.0 1.0 0.0 1.0 0.0
From the documentation:
Parameters
y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
    Ground truth (correct) target values.
y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
    Estimated target values.
So instead of passing x_test and y_test, you need to pass the true and the predicted y-values:
y_pred = dt.predict(x_test)
mse(y_test, y_pred)
or
mse(y_test, dt.predict(x_test))
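For reference, here is a minimal end-to-end sketch of the corrected call. It makes some assumptions that are not in the question: `mse` is taken to be an alias for `sklearn.metrics.mean_squared_error`, and the data is synthetic (a single feature column standing in for `bat[["TB_x"]]` and a 1-D target standing in for `TB_y`), since the real `bat` DataFrame is not available here.

```python
import numpy as np
from sklearn.metrics import mean_squared_error as mse  # assumed alias, not shown in the question
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical values, not taken from the question).
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=(200, 1))            # one feature column, shape (n_samples, 1)
y = 0.8 * x[:, 0] + rng.normal(0, 5, size=200)    # 1-D target, shape (n_samples,)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1
)

dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=0.1)
dt.fit(x_train, y_train)

# Compare true targets with predicted targets, never features with targets.
y_pred = dt.predict(x_test)
msee = mse(y_test, y_pred)
print(msee)
```

Both arguments to `mse` hold one value per test sample, so their shapes agree and the shape check in the traceback no longer fails.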