Python statsmodels output and Excel/Google Sheets output don't match

I have a small dataset, and for some reason the statsmodels output doesn't match what Excel gives me.

Here is what I did. I have two columns:

Miles Traveled    Travel Time
89                7.0
66                5.4
78                6.6
111               7.4
44                4.8
77                6.4
80                7.0
66                5.6
109               7.3
76                6.4

This is the output I get in Google Sheets:

                               Slope             Intercept
Coefficient                    0.04025678079     3.185560249
Standard Error                 0.005706415564    0.4669507938
R Squared, Standard Error      0.8615153295      0.3423088398
F Stat                         49.76812677       8
Regression SS / Residual SS    5.831597265       0.9374027345

This output also matches the Excel output.

However, when I do the following in statsmodels:

import statsmodels.api as sm

milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]

model = sm.OLS(travelTime, milesTraveled).fit()
print(model.summary())

I get the following:

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:            Travel Time   R-squared (uncentered):                   0.985
Model:                            OLS   Adj. R-squared (uncentered):              0.983
Method:                 Least Squares   F-statistic:                              575.6
Date:                Mon, 01 Feb 2021   Prob (F-statistic):                    1.82e-09
Time:                        10:18:44   Log-Likelihood:                         -11.951
No. Observations:                  10   AIC:                                      25.90
Df Residuals:                       9   BIC:                                      26.20
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Miles Traveled     0.0781      0.003     23.991      0.000       0.071       0.085
==============================================================================
Omnibus:                        2.179   Durbin-Watson:                   2.654
Prob(Omnibus):                  0.336   Jarque-Bera (JB):                1.033
Skew:                          -0.777   Prob(JB):                        0.597
Kurtosis:                       2.741   Cond. No.                         1.00
==============================================================================

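As you can see, the standard error, R-squared, and other values don't match Google Sheets/Excel at all. What exactly am I doing wrong? What should I do to get a summary that agrees with Google Sheets/Excel?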

By default, the OLS class does not include a constant term in the linear model. You can use sm.add_constant to create the appropriate exog argument for OLS:

In [36]: milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]

In [37]: travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]

In [38]: X = sm.add_constant(milesTraveled)

In [39]: model = sm.OLS(travelTime, X).fit()

In [40]: print(model.summary())
/Users/warren/a2020.11/lib/python3.8/site-packages/scipy/stats/stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.862
Model:                            OLS   Adj. R-squared:                  0.844
Method:                 Least Squares   F-statistic:                     49.77
Date:                Mon, 01 Feb 2021   Prob (F-statistic):           0.000107
Time:                        13:04:53   Log-Likelihood:                -2.3532
No. Observations:                  10   AIC:                             8.706
Df Residuals:                       8   BIC:                             9.312
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.1856      0.467      6.822      0.000       2.109       4.262
x1             0.0403      0.006      7.055      0.000       0.027       0.053
==============================================================================
Omnibus:                        0.542   Durbin-Watson:                   2.608
Prob(Omnibus):                  0.763   Jarque-Bera (JB):                0.554
Skew:                           0.370   Prob(JB):                        0.758
Kurtosis:                       2.115   Cond. No.                         353.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
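
To see where the two sets of numbers come from, here is a minimal sketch in plain Python (no statsmodels) that fits the same data both ways using the textbook least-squares formulas. The function names fit_through_origin and fit_with_intercept are made up for this illustration and are not part of any library. Without a constant, the line is forced through the origin and R-squared is computed against the uncentered sum of squares, which is why the first summary reports "R-squared (uncentered)".

```python
miles = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
travel_time = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]


def fit_through_origin(x, y):
    """Least squares for y = b*x (no constant), as in sm.OLS(y, x)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    slope = sxy / sxx
    # Uncentered R-squared: explained sum of squares over sum(y**2)
    r2_uncentered = sxy ** 2 / (sxx * syy)
    return slope, r2_uncentered


def fit_with_intercept(x, y):
    """Least squares for y = a + b*x, as in Excel/Google Sheets LINEST."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Centered R-squared: regression SS over total SS about the mean
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


slope0, r2_0 = fit_through_origin(miles, travel_time)
print(round(slope0, 4), round(r2_0, 3))   # 0.0781 0.985

slope, intercept, r2 = fit_with_intercept(miles, travel_time)
print(round(slope, 4), round(intercept, 4), round(r2, 3))   # 0.0403 3.1856 0.862
```

The first pair of numbers reproduces the summary from the question, and the second reproduces the Google Sheets/Excel values and the summary obtained after sm.add_constant.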