Having trouble calculating mean squared error in sklearn python

Question

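I'm trying to fit a decision tree regressor to a dataset. The fit runs, but when I test the model by calculating the mean squared error, I get the error shown below: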

msee = mse(x_test, y_test)

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17480/3348210221.py in <module>
----> 1 msee = mse(x_test, y_test)

    ~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in mean_squared_error(y_true, y_pred, sample_weight, multioutput, squared)
        436     0.825...
        437     """
    --> 438     y_type, y_true, y_pred, multioutput = _check_reg_targets(
        439         y_true, y_pred, multioutput
        440     )
    
    ~\anaconda3\lib\site-packages\sklearn\metrics\_regression.py in _check_reg_targets(y_true, y_pred, multioutput, dtype)
        103 
        104     if y_true.shape[1] != y_pred.shape[1]:
    --> 105         raise ValueError(
        106             "y_true and y_pred have different number of output ({0}!={1})".format(
        107                 y_true.shape[1], y_pred.shape[1]
    
    ValueError: y_true and y_pred have different number of output (4!=1)

Here is the model code and the head of the df I'm training the model on:

x = np.array(bat[["TB_x"]])
y = np.array(bat[["TB_y"]])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=1)
dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=.1)
dt.fit(x_train.reshape(-1, 1), y_train.reshape(-1, 1))
y_pred = dt.predict(x_test)


    index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   G_x PA_x    AB_x    ... BA_y    OBP_y   SLG_y   OPS_y   TB_y    GDP_y   HBP_y   SH_y    SF_y    IBB_y
0   19  2019    22.0    1.5 UCLA    P12 NCAA    38.0    72.0    58.0    ... 0.179   0.364   0.194   0.558   13.0    0.0 1.0 2.0 1.0 0.0
2   24  2020    23.0    1.7 St. Leo SSC NCAA    20.0    86.0    69.0    ... 0.156   0.309   0.219   0.527   14.0    0.0 2.0 0.0 2.0 0.0
6   45  2020    20.0    -0.8    Illinois    BTen    NCAA    13.0    58.0    47.0    ... 0.200   0.343   0.288   0.631   23.0    1.0 1.0 0.0 1.0 0.0
7   46  2020    20.0    -0.8    Illinois    BTen    NCAA    13.0    58.0    47.0    ... 0.156   0.309   0.219   0.527   14.0    0.0 2.0 0.0 2.0 0.0
8   49  2020    21.0    0.3 Miami (FL)  ACC NCAA    16.0    69.0    54.0    ... 0.200   0.343   0.288   0.631   23.0    1.0 1.0 0.0 1.0 0.0

Answer 1

From the documentation:

Parameters

y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
    Ground truth (correct) target values.

y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
    Estimated target values.
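To make the shape requirement concrete, here is a minimal sketch using small synthetic arrays (hypothetical values, not the asker's data). It assumes `mean_squared_error` is imported from `sklearn.metrics`. The first call passes targets and predictions with the same number of outputs. The second passes a 4-column feature matrix where the targets should be, which triggers the same kind of `ValueError` as in the traceback.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Synthetic values for illustration only (not taken from the question).
y_true = np.array([[3.0], [5.0], [7.0]])  # shape (3, 1): one output
y_pred = np.array([2.0, 5.0, 9.0])        # shape (3,): also one output

# Same number of outputs on both sides, so this works.
# Squared errors are 1, 0 and 4, so the mean is 5/3.
error = mean_squared_error(y_true, y_pred)

# A feature matrix with 4 columns passed in place of the targets:
# the number of outputs no longer matches (4 vs 1).
features = np.ones((3, 4))
try:
    mean_squared_error(features, y_pred)
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
```

After running this, `mismatch_message` holds the text of the raised error, which should resemble the message at the end of the traceback in the question.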

So instead of passing x_test and y_test, you need to pass the true and predicted y-values:

y_pred = dt.predict(x_test)
mse(y_test, y_pred)

or

mse(y_test, dt.predict(x_test))
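For reference, a self-contained sketch of the whole workflow might look like the following. It makes two assumptions: `mse` is an alias for `sklearn.metrics.mean_squared_error` (consistent with the traceback), and the data is synthetic because the asker's `bat` DataFrame is not available. The model settings mirror those in the question.

```python
import numpy as np
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the TB_x / TB_y columns (assumption: one feature,
# one target; the real `bat` DataFrame is not reproduced here).
rng = np.random.default_rng(1)
x = rng.uniform(0, 50, size=(200, 1))            # features, shape (200, 1)
y = 0.8 * x[:, 0] + rng.normal(0, 2, size=200)   # targets, shape (200,)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1
)

# Same hyperparameters as in the question.
dt = DecisionTreeRegressor(max_depth=10, random_state=1, min_samples_leaf=0.1)
dt.fit(x_train, y_train)

# Compare true targets with predictions, not features with targets.
y_pred = dt.predict(x_test)
msee = mse(y_test, y_pred)

# Root mean squared error, in the same units as the target.
rmse = np.sqrt(msee)
```

Taking the square root by hand avoids relying on the `squared` argument of `mean_squared_error`, whose availability depends on the installed scikit-learn version.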

